
nanoGPT: The Easiest and Fastest Way to Train/Finetune Medium-sized GPTs

nanoGPT is a simple, fast repository for training and finetuning medium-sized GPTs. The codebase is a rewrite of minGPT and is deliberately small and hackable: a boilerplate training loop (train.py) and a GPT model definition (model.py). Currently, train.py can reproduce GPT-2 (124M) on OpenWebText in about 4 days of training on a single 8XA100 40GB node. nanoGPT accommodates a range of computational budgets, supporting both training new models from scratch and finetuning pretrained GPT-2 checkpoints.

Installation requirements are minimal: PyTorch, NumPy, transformers, datasets, tiktoken, wandb, and tqdm. For a quick start, the config file train_shakespeare_char.py trains a character-level model on the tiny Shakespeare dataset, a single 1MB text file. Samples from this character-level model can be generated after just 3 minutes of GPU training, demonstrating the pipeline's efficiency.
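As a rough sketch of that quick-start path, the commands below follow the nanoGPT README (install the dependencies, prepare the 1MB Shakespeare file, train the character-level model, then sample from it); exact paths and flags may differ in the current version of the repository.

```bash
# Install the dependencies listed in the nanoGPT README
pip install torch numpy transformers datasets tiktoken wandb tqdm

# Download and tokenize the tiny Shakespeare dataset (~1MB of text, character-level)
python data/shakespeare_char/prepare.py

# Train a small character-level GPT; takes roughly 3 minutes on a single GPU
python train.py config/train_shakespeare_char.py

# Generate samples from the trained checkpoint
python sample.py --out_dir=out-shakespeare-char
```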