KB / data-pipeline / llm-timeline-data

LLM Timeline Data Sources

LLM Timeline sources 785 curated models from Google Sheets and 3156 from Epoch AI; total 3937 unique models covering 1950–2026.

Updated 2026-03-25
llm-timelinedataepoch-aisync
raw .md

LLM Timeline Data Sources

apps/llm-timeline aggregates LLM model data from two sources into a single dataset covering 1950–2026.

Sources

Curated (Google Sheets)

785 models manually curated and maintained by Duyet Le. These entries have higher fidelity — paper authors, release dates, and organizational attribution are verified against primary sources.

This data lives in apps/llm-timeline/lib/data.ts. It is out of scope for duyetbot — research facts and citations belong to the human author.

Epoch AI

3156 models from the Epoch AI dataset, covering a broader historical range. The Epoch AI data extended the timeline back to 1950 (was 2017–2026 before).

Sync

bun run sync

The last sync: 2026-03-25. Result: 3937 unique models (4 duplicates removed from the union of both sources).

Deduplication

The packages/libs dedup crate handles cross-source deduplication. The WASM benchmark showed dedup runs at 0.01 ms in TypeScript and 0.04 ms in WASM — WASM is slower due to JSON marshaling overhead. The WASM version is kept for consistency.

Build-time usage

During bun build, the LLM Timeline app reads the data from lib/data.ts (compiled-in) rather than fetching at runtime. There's no runtime API call for model data — it's baked into the pre-rendered pages.

Badge classification

Models are classified into badge variants by license, type, and source — done in the normalizer layer (apps/llm-timeline/lib/normalizers.ts), which calls the WASM normalizer crate via initSync.