How to speed up the response time of large models

Serving the model with vLLM → https://www.youtube.com/watch?v=McLdlg5Gc9s
Simplifying the model through quantization, pruning, distillation, or binarization → https://www.youtube.com/watch?v=jW2cmZ-9hLk
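
To make the quantization idea concrete, here is a minimal sketch of symmetric per-tensor int8 quantization in plain Python. It is not taken from the linked video or from any particular library; the function names `quantize_int8` and `dequantize_int8` and the sample weights are made up for illustration. The point is only that each float weight is stored as a small integer plus one shared scale, which shrinks the model and lets faster integer kernels be used, at the cost of a small rounding error.

```python
from typing import List, Tuple


def quantize_int8(weights: List[float]) -> Tuple[List[int], float]:
    """Map float weights to integers in [-127, 127] with one shared scale.

    Hypothetical helper for illustration only.
    """
    max_abs = max((abs(w) for w in weights), default=0.0)
    # Fall back to scale 1.0 when every weight is zero to avoid dividing by zero.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    # Round to the nearest integer and clamp to the int8 range used here.
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize_int8(quantized: List[int], scale: float) -> List[float]:
    """Recover approximate float weights from the integers and the scale."""
    return [v * scale for v in quantized]


# Made-up example weights.
weights = [0.5, -1.0, 0.25, 0.0]
q, scale = quantize_int8(weights)
restored = dequantize_int8(q, scale)

# The reconstruction error per weight is at most half a quantization step.
max_error = max(abs(r - w) for r, w in zip(restored, weights))
print(q, scale, max_error)
```

Real toolchains usually quantize per channel or per group and calibrate on sample data, so treat this as a picture of the principle rather than a recipe.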