LLM Deployment
Reducing model size for deployment
Pruning
Pruning identifies and removes unnecessary connections, weights, or even entire neurons from a trained deep learning model. Eliminating these redundant components makes the model more compact, faster, and more memory-efficient while largely preserving accuracy.
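As a minimal sketch of the idea, unstructured magnitude pruning zeroes out the fraction of weights with the smallest absolute values. The function below is illustrative, not tied to any particular framework:

```python
import numpy as np

def magnitude_prune(weights: np.ndarray, sparsity: float) -> np.ndarray:
    """Zero out the `sparsity` fraction of weights with smallest magnitude."""
    k = int(weights.size * sparsity)
    if k == 0:
        return weights.copy()
    # k-th smallest absolute value becomes the pruning threshold
    threshold = np.partition(np.abs(weights).ravel(), k - 1)[k - 1]
    mask = np.abs(weights) > threshold
    return weights * mask

rng = np.random.default_rng(0)
w = rng.normal(size=(4, 4))
pruned = magnitude_prune(w, sparsity=0.5)  # 8 of 16 entries survive
print(np.count_nonzero(pruned))
```

In practice the surviving weights are usually fine-tuned afterwards to recover the accuracy lost to pruning.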
Distillation
The key idea of distilling step-by-step is to extract informative natural-language rationales (i.e., intermediate reasoning steps) from LLMs, which are then used to train small models in a more data-efficient way.
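Concretely, the small model is trained with a multi-task objective, L = L_label + λ·L_rationale, combining a loss for predicting the label with one for generating the rationale. A toy sketch of that combination (the function and variable names are illustrative):

```python
import numpy as np

def cross_entropy(logits: np.ndarray, target: int) -> float:
    # numerically stable softmax cross-entropy for a single token
    z = logits - logits.max()
    return float(np.log(np.exp(z).sum()) - z[target])

def step_by_step_loss(label_logits, label_ids,
                      rationale_logits, rationale_ids, lam=1.0):
    # L = L_label + lam * L_rationale: the label task and the
    # rationale-generation task share the same student model
    l_label = np.mean([cross_entropy(l, t)
                       for l, t in zip(label_logits, label_ids)])
    l_rationale = np.mean([cross_entropy(l, t)
                           for l, t in zip(rationale_logits, rationale_ids)])
    return l_label + lam * l_rationale

# toy logits over a 3-token vocabulary
label_logits = np.array([[2.0, 0.1, -1.0]])
rationale_logits = np.array([[0.5, 1.5, 0.0], [1.0, 0.0, 0.0]])
loss = step_by_step_loss(label_logits, [0], rationale_logits, [1, 0], lam=0.5)
print(loss)
```

The rationale task acts as extra supervision at training time only; at inference the small model can predict labels directly, so no rationale generation cost is paid.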
PEFT
TBD
Resources
- How to Forget Jenny's Phone Number or: Model Pruning, Distillation, and Quantization
- Distilling step-by-step
Last update: 2024-10-23
Created: 2024-10-23