
Live Webinar
3x Faster Inference with Turbo LoRA

Higher throughput, lower inference cost, and better accuracy, all in a single fine-tuned adapter.

September 24, 10–11am PT

GenAI teams working with small open-source models often struggle with both model quality and throughput. Traditional fine-tuning methods address the quality problem, making it possible to reach GPT-4-level accuracy with small LLMs. However, these techniques do nothing for model throughput.

Introducing Turbo LoRA, a new parameter-efficient fine-tuning method that increases throughput by 2–3x while maintaining high model accuracy. With this new approach, you get the best of both worlds: lower latency, reduced inference costs, and higher accuracy, all in a single adapter that can be created with one line of code.

Join us to learn:
  • Key challenges that teams face with open-source LLMs
  • How to increase throughput and accuracy with fine-tuning
  • A look under the hood of Turbo LoRA, including speculative decoding
  • Accuracy and throughput benchmarks before and after fine-tuning
  • How to get started on your own
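For context on the speculative decoding mentioned in the agenda, here is a minimal toy sketch of the general draft-and-verify idea. All models here are hypothetical stand-ins (simple arithmetic rules, not LLMs): a cheap "draft" model proposes several tokens ahead, and the expensive "target" model verifies them in one pass, keeping the longest agreeing prefix plus one corrected token. This is an illustration of the concept only, not Predibase's implementation.

```python
def target_next(prefix):
    # Expensive "target" model: the ground-truth next token (toy rule: +1 mod 10).
    return (prefix[-1] + 1) % 10

def draft_next(prefix):
    # Cheap "draft" model: agrees with the target except after token 4,
    # where it guesses wrong (simulating imperfect speculation).
    return 0 if prefix[-1] == 4 else target_next(prefix)

def speculative_step(prefix, k=4):
    # 1) Draft k tokens autoregressively with the cheap model.
    drafted, cur = [], list(prefix)
    for _ in range(k):
        tok = draft_next(cur)
        drafted.append(tok)
        cur.append(tok)
    # 2) Verify the drafts against the target model (conceptually one
    #    batched forward pass): accept while draft and target agree,
    #    then append the target's own token at the first disagreement.
    accepted, cur = [], list(prefix)
    for tok in drafted:
        truth = target_next(cur)
        if tok == truth:
            accepted.append(tok)
            cur.append(tok)
        else:
            accepted.append(truth)  # target's correction ends the step
            cur.append(truth)
            break
    return prefix + accepted

seq = [0]
while len(seq) < 10:
    seq = speculative_step(seq, k=4)
print(seq[:10])  # -> [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
```

When the draft model agrees with the target, each verification pass accepts several tokens at once, which is where the multi-x throughput gain comes from; when it disagrees, the step still makes one token of guaranteed progress.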

Featured Speakers:

Arnav Garg

ML Engineering Team Lead, Predibase

https://www.linkedin.com/in/arnavgrg/

Save your spot