Sam Smith: How to Train an LLM (IPAM at UCLA)

Recorded 15 October 2024. Sam Smith of Google DeepMind presents "How to train an LLM" at IPAM's Theory and Practice of Deep Learning Workshop.
Abstract: Drawing on the experience of designing and scaling Griffin (arxiv.org/abs/2402.19427) and RecurrentGemma, I will introduce some of the key practical concepts behind training large language models. Topics will likely include: a brief introduction to Transformers, including why MLPs, not Attention, usually dominate computation; a simple mental model of the computational bottlenecks on TPUs and GPUs; how to train models too large to fit in memory on a single device; scaling laws and hyper-parameter tuning; and a detailed discussion of LLM inference. If time permits, I will discuss how to design recurrent models that are competitive with Transformers, along with their advantages and drawbacks.
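As a rough illustration of the abstract's point that MLPs, not Attention, usually dominate computation, the minimal Python sketch below counts per-token FLOPs for one Transformer block. The dimensions (d_model, d_ff, seq_len) are illustrative assumptions, not the configuration of Griffin or any model discussed in the talk.

```python
# Back-of-the-envelope per-token FLOP count for one Transformer block.
# All dimensions are illustrative assumptions.

d_model = 4096        # residual stream width
d_ff = 4 * d_model    # MLP hidden width (common 4x expansion)
seq_len = 2048        # context length seen by attention

# MLP: up-projection and down-projection matmuls; 2 FLOPs per multiply-add.
mlp_flops = 2 * 2 * d_model * d_ff

# Attention: Q/K/V/output projections, plus the score and value matmuls
# whose per-token cost grows linearly with sequence length.
attn_proj_flops = 2 * 4 * d_model * d_model
attn_score_flops = 2 * 2 * seq_len * d_model
attn_flops = attn_proj_flops + attn_score_flops

print(f"MLP FLOPs/token:       {mlp_flops:.3e}")
print(f"Attention FLOPs/token: {attn_flops:.3e}")
print(f"MLP share of block:    {mlp_flops / (mlp_flops + attn_flops):.0%}")
```

With these assumed dimensions the MLP accounts for the majority of the block's compute, and its share grows further for wider MLPs (e.g. gated variants with three weight matrices) or shorter contexts.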
Learn more online at: ipam.ucla.edu/programs/workshops/workshop-ii-theory-and-practice-of-deep-learning/?tab=overview