Tiny Language Classifier
A simple Naive Bayes classifier for language identification.
Overview
The Tiny Language Classifier project was born out of my curiosity to test out Naive Bayes classifiers for text classification tasks. Language identification is a fundamental problem in natural language processing, and I wanted to explore how well a simple model like Naive Bayes could perform on this task.
This is nothing BIG, but I feel like using overly complex models for simple tasks is a common pitfall in machine learning these days.
Project Details
The model was trained and tested on a language identification dataset available on Hugging Face. The text is vectorized and then fed into a multinomial Naive Bayes classifier (Scikit-Learn's MultinomialNB).
Despite its simplicity, the model achieved over 90% accuracy on the test set.
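To make the vectorize-then-classify flow concrete, here is a minimal sketch rather than the project's actual code. It assumes character n-gram counts as the vectorization step (the write-up does not say which vectorizer was used) and swaps the Hugging Face dataset for a tiny hand-written toy corpus, so the sentences, labels, and variable names below are all illustrative.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Toy stand-in for the real dataset (hypothetical sentences and labels).
train_texts = [
    "the quick brown fox jumps over the lazy dog",
    "where is the nearest train station",
    "this is a simple sentence in english",
    "i would like a cup of coffee please",
    "el rápido zorro marrón salta sobre el perro perezoso",
    "dónde está la estación de tren más cercana",
    "esta es una oración sencilla en español",
    "quisiera una taza de café por favor",
    "der schnelle braune fuchs springt über den faulen hund",
    "wo ist der nächste bahnhof",
    "dies ist ein einfacher satz auf deutsch",
    "ich hätte gerne eine tasse kaffee bitte",
]
train_labels = ["en"] * 4 + ["es"] * 4 + ["de"] * 4

# Assumption: character 1- to 3-grams as features. Short character
# sequences tend to be distinctive per language, but the original
# project may have used a different vectorizer or settings.
model = make_pipeline(
    CountVectorizer(analyzer="char_wb", ngram_range=(1, 3)),
    MultinomialNB(alpha=1.0),  # Laplace smoothing for unseen n-grams
)
model.fit(train_texts, train_labels)

# Classify a few unseen sentences.
samples = [
    "the dog is in the station",
    "el perro está en la estación",
    "der hund ist auf dem bahnhof",
]
predictions = model.predict(samples)
for text, label in zip(samples, predictions):
    print(f"{label}: {text}")
```

With the real dataset, the same pipeline would be fitted on the training split and scored on the held-out test split, for example with `sklearn.metrics.accuracy_score`.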
Technologies Used
- Python
- Scikit-Learn
- NumPy
- Pandas
- Hugging Face (datasets)