September 24 from 10am - 11am PT
GenAI teams working with small open-source models often struggle to get both model quality and throughput. Traditional fine-tuning methods address the quality problem, making it possible to achieve GPT-4-level accuracy with small LLMs, but they do nothing to improve throughput.
Introducing Turbo LoRA, a new parameter-efficient fine-tuning method that increases throughput by 2-3x while maintaining high model accuracy. With this new approach, you get the best of both worlds: lower latency, reduced inference costs, and higher accuracy, all from a single adapter that can be created with one line of code.
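As background on what creating an adapter "with one line of code" looks like in practice, here is a minimal sketch of attaching a standard LoRA adapter with the open-source Hugging Face PEFT library. The model name and hyperparameters are illustrative only, and Turbo LoRA's own API (covered in the webinar) is separate from PEFT:

```python
# Minimal sketch: attaching a standard LoRA adapter with Hugging Face PEFT.
# Illustrates the general adapter workflow, not Turbo LoRA's own API.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

# Illustrative base model; any small open-source LLM works the same way.
base = AutoModelForCausalLM.from_pretrained("mistralai/Mistral-7B-v0.1")

# LoRA trains small low-rank matrices instead of the full weight matrices.
config = LoraConfig(r=16, lora_alpha=32, target_modules=["q_proj", "v_proj"])

model = get_peft_model(base, config)  # the one line that adds the adapter
model.print_trainable_parameters()    # typically well under 1% of parameters
```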
Join us to learn:
- Key challenges that teams face with open-source LLMs
- How to increase throughput and accuracy with fine-tuning
- A look under the hood of Turbo LoRA, including speculative decoding (see the sketch after this list)
- Accuracy and throughput benchmarks before and after fine-tuning
- How to get started on your own
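For a feel of how speculative decoding delivers throughput gains, here is a minimal, self-contained sketch of the generic draft-and-verify loop in Python. It uses toy stand-in models and the simple greedy-acceptance variant, so it illustrates the general technique only, not Predibase's Turbo LoRA implementation:

```python
import random

# Toy stand-ins for the two models. In a real system both are LLM forward
# passes; pure functions keep the control flow of speculative decoding visible.
VOCAB = list("abcde")

def target_model(context):
    # The large, accurate model. Deterministic per context so repeated
    # verification calls agree. (One token per call here; real verification
    # scores all draft positions in a single batched forward pass.)
    rng = random.Random(hash(tuple(context)))
    return rng.choice(VOCAB)

def draft_model(context):
    # The small, fast model: agrees with the target ~80% of the time.
    if random.random() < 0.8:
        return target_model(context)
    return random.choice(VOCAB)

def speculative_decode(prompt, max_new_tokens=20, k=4):
    tokens = list(prompt)
    while len(tokens) - len(prompt) < max_new_tokens:
        # 1) Draft: the cheap model proposes k tokens autoregressively.
        draft = []
        for _ in range(k):
            draft.append(draft_model(tokens + draft))
        # 2) Verify: keep the longest prefix where the target model agrees;
        #    on the first mismatch, substitute the target model's own token
        #    and stop, so every round advances by at least one token.
        accepted = []
        for tok in draft:
            expected = target_model(tokens + accepted)
            if tok == expected:
                accepted.append(tok)
            else:
                accepted.append(expected)
                break
        tokens.extend(accepted)
    return "".join(tokens[:len(prompt) + max_new_tokens])

print(speculative_decode("ab"))
```

The win comes from verification: checking k drafted tokens costs roughly one target-model forward pass, so when the draft model's acceptance rate is high, several tokens are emitted per expensive pass instead of one.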
Featured Speakers:
Arnav Garg, ML Engineering Team Lead, Predibase