September 24 from 10am - 11am PT
GenAI teams working with small open-source models often struggle to get both model quality and throughput. Traditional fine-tuning methods address the quality problem, making it possible to achieve GPT-4-level accuracy with small LLMs, but they do nothing to improve throughput.
Introducing Turbo LoRA, a new parameter-efficient fine-tuning method that increases throughput by 2-3x while maintaining high model accuracy. With this new approach, you get the best of both worlds: lower latency, reduced inference costs, and higher accuracy, all from a single adapter that can be created with one line of code.
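As background on what creating an adapter "with one line of code" looks like in practice, here is a minimal sketch of attaching a standard LoRA adapter with the open-source Hugging Face PEFT library. The model name and hyperparameters are illustrative only, and Turbo LoRA's own API (covered in the webinar) is separate from PEFT:

```python
# Minimal sketch: attaching a standard LoRA adapter with Hugging Face PEFT.
# Illustrates the general adapter workflow, not Turbo LoRA's own API.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

# Illustrative base model; any small open-source LLM works the same way.
base = AutoModelForCausalLM.from_pretrained("mistralai/Mistral-7B-v0.1")

# LoRA trains small low-rank matrices instead of the full weight matrices.
config = LoraConfig(r=16, lora_alpha=32, target_modules=["q_proj", "v_proj"])

model = get_peft_model(base, config)  # the one line that adds the adapter
model.print_trainable_parameters()    # typically well under 1% of parameters
```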
Join us to learn:
- Key challenges that teams face with open-source LLMs
- How to increase throughput and accuracy with fine-tuning
- A look under the hood of Turbo LoRA, including speculative decoding (see the sketch after this list)
- Accuracy and throughput benchmarks before and after fine-tuning
- How to get started on your own
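For a feel of how speculative decoding delivers throughput gains, here is a minimal, self-contained sketch of the generic draft-and-verify loop in Python. It uses toy stand-in models and the simple greedy-acceptance variant, so it illustrates the general technique only, not Predibase's Turbo LoRA implementation:

```python
import random

# Toy stand-ins for the two models. In a real system both are LLM forward
# passes; pure functions keep the control flow of speculative decoding visible.
VOCAB = list("abcde")

def target_model(context):
    # The large, accurate model. Deterministic per context so repeated
    # verification calls agree. (One token per call here; real verification
    # scores all draft positions in a single batched forward pass.)
    rng = random.Random(hash(tuple(context)))
    return rng.choice(VOCAB)

def draft_model(context):
    # The small, fast model: agrees with the target ~80% of the time.
    if random.random() < 0.8:
        return target_model(context)
    return random.choice(VOCAB)

def speculative_decode(prompt, max_new_tokens=20, k=4):
    tokens = list(prompt)
    while len(tokens) - len(prompt) < max_new_tokens:
        # 1) Draft: the cheap model proposes k tokens autoregressively.
        draft = []
        for _ in range(k):
            draft.append(draft_model(tokens + draft))
        # 2) Verify: keep the longest prefix where the target model agrees;
        #    on the first mismatch, substitute the target model's own token
        #    and stop, so every round advances by at least one token.
        accepted = []
        for tok in draft:
            expected = target_model(tokens + accepted)
            if tok == expected:
                accepted.append(tok)
            else:
                accepted.append(expected)
                break
        tokens.extend(accepted)
    return "".join(tokens[:len(prompt) + max_new_tokens])

print(speculative_decode("ab"))
```

The win comes from verification: checking k drafted tokens costs roughly one target-model forward pass, so when the draft model's acceptance rate is high, several tokens are emitted per expensive pass instead of one.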
Featured Speakers:
Arnav Garg, ML Engineering Team Lead, Predibase