Training CodeParrot 🦜 from Scratch

Hugging Face Blog · 2021-12-08

Open source

Related items

AI · Hugging Face Blog · 2025-06-04

KV Cache from scratch in nanoVLM

AI · Hugging Face Blog · 2020-02-14

How to train a new language model from scratch using Transformers and Tokenizers

AI · Hugging Face Blog · 2024-01-02

LoRA training scripts of the world, unite!

AI · Hugging Face Blog · 2026-02-03

Training Design for Text-to-Image Models: Lessons from Ablations

AI · Hugging Face Blog · 2025-09-23

Smol2Operator: Post-Training GUI Agents for Computer Use

AI · arXiv cs.AI · 2026-05-26

Guiding LLM Post-training Data Engineering with Model Internals from Sparse Autoencoders

Model internals encode rich information about how a large language model (LLM) processes its training data; however, post-training data engineering largely relies on external signals and ignores rich intrinsic signals lying in model internals. We propose SAERL, a data engineering framework for LLM reinforcement lear...