TeenyTinyLlama: open-source, tiny, and totally Brazilian

Hugging Face · Preprint · Paper · Demo

TeenyTinyLlama is a lovingly engineered family of tiny-but-mighty open-source language models trained natively in Brazilian Portuguese. Designed for researchers, tinkerers, and anyone exploring LLMs beyond the English-centric world, TeenyTinyLlama shows how far small models can go when optimized for a specific linguistic ecosystem.

Built entirely on open tooling, the project charts the whole journey of developing foundation models for low-resource languages—from tokenizer training to large-scale pre-training, evaluation, and fine-tuning—while keeping the entire pipeline transparent, reproducible, and beautifully lightweight. Despite their size, these models pack a surprising punch, offering a controlled research testbed for studying multilinguality, bias, hallucinations, and efficiency constraints in language modeling.
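Since the checkpoints are distributed through the Hugging Face Hub, getting a first generation out of them takes only a few lines of Transformers code. The sketch below is a minimal, illustrative example: the repository id and the generation settings are assumptions, not prescriptions from the project itself.

```python
# Minimal sketch: load a TeenyTinyLlama checkpoint and sample a continuation.
# The repository id below is an assumption; adjust it to the checkpoint you want.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "nicholasKluge/TeenyTinyLlama-160m"  # assumed Hub repository id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Prompt in Brazilian Portuguese, since the models were trained natively in it.
prompt = "A capital do Brasil é"
inputs = tokenizer(prompt, return_tensors="pt")

# Sample a short continuation; these generation settings are purely illustrative.
outputs = model.generate(
    **inputs,
    max_new_tokens=40,
    do_sample=True,
    top_p=0.9,
    temperature=0.7,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Because the models are small, this runs comfortably on CPU, which is part of what makes them handy as a controlled testbed for the experiments described above.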