
Deployment LLM⚑

Cheat sheet - Time and effort lifecycle

Reduce the size of the model in deployment⚑

Pruning⚑

Deep model pruning involves identifying and removing unnecessary connections, weights, or even entire neurons from a trained deep learning model. By eliminating these redundant components, the model can become more compact, faster, and more memory-efficient, while still maintaining a high level of accuracy.
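As a minimal sketch of the idea (not a production recipe), unstructured magnitude pruning zeroes out the weights with the smallest absolute values until a target sparsity is reached. The function name and threshold logic below are illustrative assumptions, implemented in plain NumPy:

```python
import numpy as np

def magnitude_prune(weights: np.ndarray, sparsity: float) -> np.ndarray:
    """Zero out the smallest-magnitude weights until `sparsity`
    fraction of the entries has been removed (unstructured pruning)."""
    k = int(weights.size * sparsity)
    if k == 0:
        return weights.copy()
    # Threshold = k-th smallest absolute value across the whole tensor.
    threshold = np.sort(np.abs(weights), axis=None)[k - 1]
    mask = np.abs(weights) > threshold
    return weights * mask

# Toy weight matrix: half of the entries are near zero and get pruned.
w = np.array([[0.9, -0.05, 0.4],
              [-0.01, 0.7, 0.02]])
pruned = magnitude_prune(w, sparsity=0.5)
```

In practice frameworks apply this as a mask during or after training (e.g. iterative prune-and-finetune cycles), and structured variants remove whole neurons or attention heads so the speedup is realized without sparse-kernel support.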

Distilling⚑

The key idea of distilling step-by-step is to extract informative natural language rationales (i.e., intermediate reasoning steps) from LLMs, which can in turn be used to train small models in a more data-efficient way.
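Distilling step-by-step builds on classic knowledge distillation, where the student is trained to match the teacher's temperature-softened output distribution. A minimal sketch of that underlying soft-label loss (the function names are illustrative; rationale extraction itself is not shown) in plain NumPy:

```python
import numpy as np

def softmax(logits, temperature: float = 1.0) -> np.ndarray:
    """Numerically stable softmax over temperature-scaled logits."""
    z = np.asarray(logits, dtype=float) / temperature
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def distillation_loss(student_logits, teacher_logits,
                      temperature: float = 2.0) -> float:
    """KL(teacher || student) on softened distributions.
    The T^2 factor keeps gradient magnitudes comparable across temperatures."""
    p = softmax(teacher_logits, temperature)  # teacher "soft labels"
    q = softmax(student_logits, temperature)  # student predictions
    return float(np.sum(p * (np.log(p) - np.log(q))) * temperature ** 2)

# Identical logits -> zero loss; disagreement -> positive loss.
zero = distillation_loss([1.0, 2.0], [1.0, 2.0])
gap = distillation_loss([0.0, 3.0], [3.0, 0.0])
```

Distilling step-by-step extends this by also training the small model to generate the teacher's natural language rationales as an auxiliary task, which is where the data-efficiency gain comes from.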

PEFT⚑

TBD

Resources⚑


Last update: 2024-02-14
Created: 2024-02-07