LLM Deployment
Reducing model size for deployment
Pruning
Pruning identifies and removes unnecessary connections, weights, or even entire neurons from a trained deep learning model. Eliminating these redundant components makes the model more compact, faster, and more memory-efficient while largely preserving accuracy.
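As a minimal sketch of the idea, unstructured magnitude pruning zeroes out the fraction of weights with the smallest absolute values. The function below is illustrative, not tied to any particular framework:

```python
import numpy as np

def magnitude_prune(weights: np.ndarray, sparsity: float) -> np.ndarray:
    """Zero out the `sparsity` fraction of weights with smallest magnitude."""
    k = int(weights.size * sparsity)
    if k == 0:
        return weights.copy()
    # k-th smallest absolute value becomes the pruning threshold
    threshold = np.partition(np.abs(weights).ravel(), k - 1)[k - 1]
    mask = np.abs(weights) > threshold
    return weights * mask

rng = np.random.default_rng(0)
w = rng.normal(size=(4, 4))
pruned = magnitude_prune(w, sparsity=0.5)  # 8 of 16 entries survive
print(np.count_nonzero(pruned))
```

In practice the surviving weights are usually fine-tuned afterwards to recover the accuracy lost to pruning.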
Distillation
The key idea of distilling step-by-step is to extract informative natural-language rationales (i.e., intermediate reasoning steps) from LLMs, which are then used to train small models in a more data-efficient way.
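Concretely, the small model is trained with a multi-task objective, L = L_label + λ·L_rationale, combining a loss for predicting the label with one for generating the rationale. A toy sketch of that combination (the function and variable names are illustrative):

```python
import numpy as np

def cross_entropy(logits: np.ndarray, target: int) -> float:
    # numerically stable softmax cross-entropy for a single token
    z = logits - logits.max()
    return float(np.log(np.exp(z).sum()) - z[target])

def step_by_step_loss(label_logits, label_ids,
                      rationale_logits, rationale_ids, lam=1.0):
    # L = L_label + lam * L_rationale: the label task and the
    # rationale-generation task share the same student model
    l_label = np.mean([cross_entropy(l, t)
                       for l, t in zip(label_logits, label_ids)])
    l_rationale = np.mean([cross_entropy(l, t)
                           for l, t in zip(rationale_logits, rationale_ids)])
    return l_label + lam * l_rationale

# toy logits over a 3-token vocabulary
label_logits = np.array([[2.0, 0.1, -1.0]])
rationale_logits = np.array([[0.5, 1.5, 0.0], [1.0, 0.0, 0.0]])
loss = step_by_step_loss(label_logits, [0], rationale_logits, [1, 0], lam=0.5)
print(loss)
```

The rationale task acts as extra supervision at training time only; at inference the small model can predict labels directly, so no rationale generation cost is paid.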
PEFT
TBD
Resources
- How to Forget Jenny's Phone Number or: Model Pruning, Distillation, and Quantization
- Distilling step-by-step
Last update: 2024-10-23
Created: 2024-10-23