Recorded 15 October 2024. Sam Smith of Google DeepMind presents "How to train an LLM" at IPAM's Theory and Practice of Deep Learning Workshop.
Abstract: Drawing on the experience of designing and scaling Griffin (arxiv.org/abs/2402.19427) and RecurrentGemma, I will introduce some of the key practical concepts behind training large language models. Topics are likely to include: a brief introduction to Transformers, including why MLPs, not attention, usually dominate computation; a simple mental model of the computational bottlenecks on TPUs and GPUs; how to train models too large to fit in memory on a single device; scaling laws and hyperparameter tuning; and a detailed discussion of LLM inference. If time permits, I will discuss how to design recurrent models that are competitive with Transformers, along with their advantages and drawbacks.
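The abstract's claim that MLPs, not attention, usually dominate computation can be checked with a back-of-envelope FLOP count. The sketch below is illustrative only: the model width, sequence length, and 4x MLP expansion factor are assumptions for the sake of example, not figures from the talk.

```python
# Rough per-token FLOP counts for one Transformer block, counting a
# multiply-add as 2 FLOPs. Parameter choices are illustrative assumptions.

def mlp_flops(d_model, expansion=4):
    # Two matmuls: d_model -> expansion * d_model, then back down.
    return 2 * 2 * d_model * expansion * d_model

def attention_flops(d_model, seq_len):
    # Q, K, V, and output projections: four d_model x d_model matmuls.
    proj = 2 * 4 * d_model * d_model
    # Attention scores (query . key over seq_len keys) plus value mixing.
    mix = 2 * 2 * seq_len * d_model
    return proj + mix

d, L = 4096, 2048
print(mlp_flops(d))           # 268,435,456 FLOPs per token
print(attention_flops(d, L))  # 167,772,160 FLOPs per token
```

For these example sizes the MLP accounts for the majority of per-token compute; the sequence-length-dependent score/mix term only overtakes the matmul terms when the context grows much longer than the model width.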
Learn more online at: ipam.ucla.edu/programs/workshops/workshop-ii-theory-and-practice-of-deep-learning/?tab=overview