From Looking Glass Universe.
The difference between this video and my last GPT video is that that one was a toy model: it was trained on a very small dataset (the works of Shakespeare) and never gets very good. This one is much bigger and reaches the level of GPT-2, which required quite a lot more optimisation to train.
This is Karpathy’s tutorial: https://youtu.be/l8pRSuU81PU?si=IZKoAl-YpqEn9Tyj
The code: https://github.com/karpathy/build-nanogpt/blob/master/train_gpt2.py
My write-up as I went: https://colab.research.google.com/drive/1awhFM8oIGMVTQII-S7rsQGFeJDDxFoEV?usp=sharing