Train Your Own LLM – Tutorial

This course is designed to help beginners learn how to train a language model from start to finish. Imad Saddik will guide you through the whole process, using Moroccan Darija as an example.

In this course, you will learn:

– How to load text data
– How to train a tokenizer from scratch using the Byte Pair Encoding (BPE) method
– How to use the tokenizer to encode text data
– How the Transformer architecture works in language models
– How to pre-train a model
– How to create a supervised fine-tuning dataset
– How to fine-tune the model and build an AI assistant that you can chat with
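
To give a feel for the tokenizer step listed above, here is a minimal sketch of byte-level BPE training in plain Python. It is not the course's implementation: the function names (`train_bpe`, `merge`, `encode`, `decode`) and the toy input are assumptions made for illustration, and the code skips details a real tokenizer needs, such as pre-tokenization and special tokens.

```python
from collections import Counter


def merge(ids, pair, new_id):
    """Replace every non-overlapping occurrence of `pair` in `ids` with `new_id`."""
    out = []
    i = 0
    while i < len(ids):
        if i + 1 < len(ids) and (ids[i], ids[i + 1]) == pair:
            out.append(new_id)
            i += 2
        else:
            out.append(ids[i])
            i += 1
    return out


def train_bpe(text, num_merges):
    """Learn up to `num_merges` merges, starting from raw UTF-8 bytes (ids 0..255)."""
    ids = list(text.encode("utf-8"))
    merges = {}  # (left_id, right_id) -> new_id, kept in the order learned
    for step in range(num_merges):
        pairs = Counter(zip(ids, ids[1:]))
        if not pairs:
            break
        best, count = pairs.most_common(1)[0]
        if count < 2:
            break  # no pair repeats, so merging would not compress anything
        new_id = 256 + step
        ids = merge(ids, best, new_id)
        merges[best] = new_id
    return merges


def encode(text, merges):
    """Turn text into token ids by replaying the merges in the order learned."""
    ids = list(text.encode("utf-8"))
    for pair, new_id in merges.items():
        ids = merge(ids, pair, new_id)
    return ids


def decode(ids, merges):
    """Turn token ids back into text by expanding each id into its bytes."""
    vocab = {i: bytes([i]) for i in range(256)}
    for (left, right), new_id in merges.items():
        vocab[new_id] = vocab[left] + vocab[right]
    return b"".join(vocab[i] for i in ids).decode("utf-8", errors="replace")


if __name__ == "__main__":
    sample = "aaabdaaabac"  # toy input, chosen only for illustration
    learned = train_bpe(sample, num_merges=3)
    tokens = encode(sample, learned)
    print(len(sample.encode("utf-8")), "bytes ->", len(tokens), "tokens")
    print(decode(tokens, learned) == sample)
```

Starting from bytes rather than characters means any text, including Arabic-script Darija, can be encoded without an unknown-token fallback; the trade-off is that non-Latin text begins as several byte ids per character until merges are learned.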

You can find the slides, notebook, and scripts in this GitHub repository:
https://github.com/ImadSaddik/Train_Your_Language_Model_Course

The supervised fine-tuning dataset is available here:
https://github.com/ImadSaddik/BoDmaghDataset
https://huggingface.co/datasets/ImadSaddik/BoDmaghDataset

The tokenizers trained on AtlaSet can be found here:
https://github.com/ImadSaddik/DarijaTokenizers

You can access the AtlaSet dataset on Hugging Face here:
https://huggingface.co/datasets/atlasia/Atlaset

To connect with Imad Saddik, check out his social accounts:
– LinkedIn: https://www.linkedin.com/in/imadsaddik/
– YouTube: https://www.youtube.com/@3CodeCampers
– Discord: imad_saddik

❤️ Support for this channel comes from our friends at Scrimba – the coding platform that’s reinvented interactive learning: https://scrimba.com/freecodecamp

⭐️ Course Contents ⭐️
(0:00:00) About the Course
(0:03:03) Introduction
(0:07:24) Training Data
(0:15:33) Tokenization
(0:29:00) The Transformer Architecture
(0:52:21) Pre-training
(1:24:46) Fine-tuning Dataset
(1:33:05) Instruction Fine-tuning
(2:06:17) Fine-tuning with LoRA
(2:20:39) Let’s Scale Everything
(3:09:40) Bonus
(3:27:10) Conclusion

🎉 Thanks to our Champion and Sponsor supporters:
👾 Drake Milly
👾 Ulises Moralez
👾 Goddard Tan
👾 David MG
👾 Matthew Springman
👾 Claudio
👾 Oscar R.
👾 jedi-or-sith
👾 Nattira Maneerat
👾 Justin Hual

Learn to code for free and get a developer job: https://www.freecodecamp.org

Read hundreds of articles on programming: https://freecodecamp.org/news
