Scaling Laws and Chinchilla
In this post I explain what I learned from the Google DeepMind scaling-laws paper "Training Compute-Optimal Large Language Models" (the "Chinchilla" paper).
Setup
The paper sets out to answer: given a fixed compute budget, how should you train the most capable language model? Rather than measuring the budget in dollars, the authors measure it in floating-point operations (FLOPs). FLOPs are the natural unit here because the dollar cost per FLOP falls so quickly over time that any dollar-denominated conclusion would soon go stale, while a FLOP-denominated one stays comparable across hardware generations.
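To make the budget concrete: the paper works with the standard approximation that training a dense transformer costs about C ≈ 6ND FLOPs, where N is the parameter count and D is the number of training tokens. Here is a minimal sketch of that arithmetic (the function name and example numbers are my own; the 70B-parameter, 1.4T-token figures are Chinchilla's reported scale):

```python
def train_flops(n_params: float, n_tokens: float) -> float:
    """Approximate training compute via C ~= 6 * N * D:
    roughly 2 FLOPs per parameter per token for the forward pass
    and another ~4 for the backward pass."""
    return 6 * n_params * n_tokens

# Example: Chinchilla itself, 70B parameters trained on 1.4T tokens.
c = train_flops(70e9, 1.4e12)
print(f"{c:.2e} FLOPs")  # ~5.88e+23 FLOPs
```

With this approximation, fixing a FLOP budget C pins down the trade-off the paper studies: a bigger model (larger N) necessarily means fewer training tokens (smaller D), and vice versa.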