Tiny Language Model
Creating a small-scale language model to understand the fundamentals of large language models (LLMs) and their training processes.
View on GitHub
Overview
I have always had the urge to understand the underlying principles behind many technologies, and Large Language Models (LLMs) are no exception. These models have revolutionized natural language processing, yet their complexity and size often make them seem like a black box. To demystify LLMs for myself, I decided to build a tiny language model from scratch.
The repository above contains the crucial parts of the project. Since it was primarily a learning exercise, however, not all parts are fully polished or documented. There were many experiments and iterations, and the code reflects that exploratory nature; some parts may therefore seem less structured or even be missing, as I created the repository mainly to share the core results of my learning journey.
Results
For a quick look at the results, check out the example output below. The model was trained on a dataset of coding problems, which proved to be quite effective for training tiny language models, as such text tends to follow specific patterns and structures. Take a look at a problem generated by the model:
Given an array of integers where every integer appears randomly exists a single
integer in the array, except for one. Write a function
`findSingleUnique(missing_element)` that finds how many times a single integer
can be formed is missing and all other integers that are divisible by either
the integer or all of its positive.
Examples:
find_single([1, 2, 3, 6, 5], 3) == [4, 6, 7]) == 4, 7
find_majority_duplicate([1, 2, 4, 3, 4]) == None
find_missing_integer([7, 7, 1) == 7
find_integer([7, 8, 2, 8]) == 9
find_majority([1]) == 3
Project Details
The model has a transformer-based architecture, which is the backbone of most modern LLMs. The implementation was inspired by other repositories attempting to create tiny language models, such as this one, and I also drew some inspiration from the DeepSeek model architecture. The key ideas behind the model come from none other than the legendary Attention Is All You Need paper, and Andrej Karpathy's excellent YouTube series on building neural networks was another great contribution to my learning journey.
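To make the architecture a bit more concrete, here is a minimal sketch of a single decoder block in PyTorch. It is not the code from the repository: the layer sizes, names, and the use of `nn.MultiheadAttention` are illustrative assumptions; only the general structure (causal self-attention plus a feed-forward layer, each wrapped in a residual connection) follows the Attention Is All You Need recipe.

```python
import torch
import torch.nn as nn

class TinyTransformerBlock(nn.Module):
    """One decoder block: causal self-attention followed by a feed-forward layer."""

    def __init__(self, embed_dim: int = 128, num_heads: int = 4, block_size: int = 256):
        super().__init__()
        self.ln1 = nn.LayerNorm(embed_dim)
        self.attn = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)
        self.ln2 = nn.LayerNorm(embed_dim)
        self.ff = nn.Sequential(
            nn.Linear(embed_dim, 4 * embed_dim),
            nn.GELU(),
            nn.Linear(4 * embed_dim, embed_dim),
        )
        # Causal mask: True marks future positions a token must not attend to.
        mask = torch.triu(torch.ones(block_size, block_size, dtype=torch.bool), diagonal=1)
        self.register_buffer("causal_mask", mask)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x has shape (batch, seq_len, embed_dim)
        seq_len = x.size(1)
        h = self.ln1(x)
        attn_out, _ = self.attn(
            h, h, h,
            attn_mask=self.causal_mask[:seq_len, :seq_len],
            need_weights=False,
        )
        x = x + attn_out              # residual connection around attention
        x = x + self.ff(self.ln2(x))  # residual connection around the feed-forward layer
        return x


block = TinyTransformerBlock()
tokens = torch.randn(2, 64, 128)  # (batch=2, seq_len=64, embed_dim=128)
print(block(tokens).shape)        # torch.Size([2, 64, 128])
```

Stacking a handful of such blocks on top of token and position embeddings, with a final linear layer projecting back to the vocabulary, is essentially all a tiny GPT-style model needs.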
Technologies Used
- Python
- PyTorch
- NumPy
- Jupyter Notebooks
- HuggingFace (datasets)
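To give a rough idea of how the training data could be prepared with the `datasets` library, here is a minimal sketch. The file name `coding_problems.txt` and the character-level tokenization are placeholders for illustration only; the actual project may use a different dataset and tokenizer.

```python
from datasets import load_dataset
import torch

# "coding_problems.txt" is a placeholder file name -- the exact dataset used
# in the project is not specified here.
dataset = load_dataset("text", data_files={"train": "coding_problems.txt"})
text = "\n".join(dataset["train"]["text"])

# Character-level tokenization keeps the vocabulary tiny, which suits a small model.
vocab = sorted(set(text))
stoi = {ch: i for i, ch in enumerate(vocab)}
data = torch.tensor([stoi[ch] for ch in text], dtype=torch.long)

# Simple 90/10 train/validation split.
n = int(0.9 * len(data))
train_data, val_data = data[:n], data[n:]
```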