Tiny Language Model
Creating a small-scale language model to understand the fundamentals of large language models (LLMs) and their training processes.
View on GitHub
Overview
I have always had the urge to understand the underlying principles behind many technologies, and Large Language Models (LLMs) are no exception. These models have revolutionized natural language processing, yet their complexity and size often make them seem like a black box. To demystify LLMs for myself, I decided to build a tiny language model from scratch.
The repository above contains the crucial parts of the project. Since it was primarily a learning exercise, however, not all parts are fully polished or documented. There were many experiments and iterations, and the code reflects that exploratory nature; some parts may therefore seem less structured or even be missing, as I created the repository mainly to share the core results of my learning journey.
Results
For a quick look at the results, check out the example output below. The model was trained on a dataset of coding problems, which proved to be quite effective for training tiny language models, as such text tends to follow specific patterns and structures. Take a look at a problem generated by the model:
Given an array of integers where every integer appears randomly exists a single
integer in the array, except for one. Write a function
`findSingleUnique(missing_element)` that finds how many times a single integer
can be formed is missing and all other integers that are divisible by either
the integer or all of its positive.
Examples:
find_single([1, 2, 3, 6, 5], 3) == [4, 6, 7]) == 4, 7
find_majority_duplicate([1, 2, 4, 3, 4]) == None
find_missing_integer([7, 7, 1) == 7
find_integer([7, 8, 2, 8]) == 9
find_majority([1]) == 3
Project Details
The model has a transformer-based architecture, which is the backbone of most modern LLMs. The implementation was inspired by other repositories attempting to create tiny language models, such as this one, and I also drew some inspiration from the DeepSeek model architecture. The key ideas behind the model come from none other than the legendary Attention Is All You Need paper, and Andrej Karpathy's excellent YouTube series on building neural networks was another great contribution to my learning journey.
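To make the architecture a bit more concrete, here is a minimal sketch of a single decoder block in PyTorch. It is not the code from the repository: the layer sizes, names, and the use of `nn.MultiheadAttention` are illustrative assumptions; only the general structure (causal self-attention plus a feed-forward layer, each wrapped in a residual connection) follows the Attention Is All You Need recipe.

```python
import torch
import torch.nn as nn

class TinyTransformerBlock(nn.Module):
    """One decoder block: causal self-attention followed by a feed-forward layer."""

    def __init__(self, embed_dim: int = 128, num_heads: int = 4, block_size: int = 256):
        super().__init__()
        self.ln1 = nn.LayerNorm(embed_dim)
        self.attn = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)
        self.ln2 = nn.LayerNorm(embed_dim)
        self.ff = nn.Sequential(
            nn.Linear(embed_dim, 4 * embed_dim),
            nn.GELU(),
            nn.Linear(4 * embed_dim, embed_dim),
        )
        # Causal mask: True marks future positions a token must not attend to.
        mask = torch.triu(torch.ones(block_size, block_size, dtype=torch.bool), diagonal=1)
        self.register_buffer("causal_mask", mask)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x has shape (batch, seq_len, embed_dim)
        seq_len = x.size(1)
        h = self.ln1(x)
        attn_out, _ = self.attn(
            h, h, h,
            attn_mask=self.causal_mask[:seq_len, :seq_len],
            need_weights=False,
        )
        x = x + attn_out              # residual connection around attention
        x = x + self.ff(self.ln2(x))  # residual connection around the feed-forward layer
        return x


block = TinyTransformerBlock()
tokens = torch.randn(2, 64, 128)  # (batch=2, seq_len=64, embed_dim=128)
print(block(tokens).shape)        # torch.Size([2, 64, 128])
```

Stacking a handful of such blocks on top of token and position embeddings, with a final linear layer projecting back to the vocabulary, is essentially all a tiny GPT-style model needs.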
Technologies Used
- Python
- PyTorch
- NumPy
- Jupyter Notebooks
- HuggingFace (datasets)
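To give a rough idea of how the training data could be prepared with the `datasets` library, here is a minimal sketch. The file name `coding_problems.txt` and the character-level tokenization are placeholders for illustration only; the actual project may use a different dataset and tokenizer.

```python
from datasets import load_dataset
import torch

# "coding_problems.txt" is a placeholder file name -- the exact dataset used
# in the project is not specified here.
dataset = load_dataset("text", data_files={"train": "coding_problems.txt"})
text = "\n".join(dataset["train"]["text"])

# Character-level tokenization keeps the vocabulary tiny, which suits a small model.
vocab = sorted(set(text))
stoi = {ch: i for i, ch in enumerate(vocab)}
data = torch.tensor([stoi[ch] for ch in text], dtype=torch.long)

# Simple 90/10 train/validation split.
n = int(0.9 * len(data))
train_data, val_data = data[:n], data[n:]
```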