Kairos: Temporality in LLM
Temporal Knowledge, LLMs, ICML 2026
Paper accepted at ICML 2026.
Resources: Paper | Dataset | Checkpoints | Code | Blog
Abstract
Large language models (LLMs) are pre-trained on web data spanning years, yet the temporal dimension of this data (when documents were written and how knowledge evolves over time) is largely ignored during training. This paper investigates how data temporality impacts LLM pre-training and knowledge representation.
We introduce KairosQA, a benchmark designed to evaluate how well LLMs capture and update factual knowledge over time. We further train Sequential Helium 6B, a model pre-trained with an explicit awareness of temporal ordering in the data, and show that temporal structure in training improves the model’s ability to represent and recall time-sensitive knowledge.
Our findings highlight that naively mixing data from different time periods leads to temporal confusion in LLMs, and that training curricula sensitive to the temporal distribution of documents significantly improve model calibration on time-varying facts.
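To make the contrast between naive mixing and a time-sensitive curriculum concrete, here is a minimal sketch that orders a toy corpus by document date instead of shuffling it. The `Document` type, its `written` field, and the `mixed_order` and `sequential_order` helpers are hypothetical names introduced for illustration; the actual Sequential Helium data pipeline is described in the paper and may differ.

```python
import random
from dataclasses import dataclass
from datetime import date
from typing import List


@dataclass(frozen=True)
class Document:
    """Toy document with an (assumed) authoring date."""
    text: str
    written: date


def mixed_order(docs: List[Document], seed: int = 0) -> List[Document]:
    """Naive baseline: shuffle documents regardless of when they were written."""
    rng = random.Random(seed)
    shuffled = list(docs)
    rng.shuffle(shuffled)
    return shuffled


def sequential_order(docs: List[Document]) -> List[Document]:
    """Time-aware ordering: oldest documents first, newest last."""
    return sorted(docs, key=lambda d: d.written)


# Illustrative corpus; the texts and dates are made up.
corpus = [
    Document("report from 2023", date(2023, 5, 1)),
    Document("report from 2019", date(2019, 1, 15)),
    Document("report from 2021", date(2021, 9, 30)),
]

ordered = sequential_order(corpus)
print([d.written.year for d in ordered])
```

Sorting by date is only the simplest possible curriculum; it is meant to show the idea of exposing temporal structure to training, not to reproduce the authors' method.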
KairosQA Dataset
KairosQA is a question-answering dataset specifically designed to probe temporal knowledge in LLMs. It tests whether models can correctly answer factual questions tied to a specific point in time, distinguish between knowledge that changes over time and stable facts, and reflect the state of the world at the time of their training cutoff.
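One way to picture a time-anchored question is as a list of answers, each valid from a given date. The sketch below is an assumption about how such an item could be represented, not the KairosQA schema: `TimedQuestion`, `history`, `answer_at`, and `is_time_varying` are hypothetical names, and the answers are placeholders.

```python
from bisect import bisect_right
from dataclasses import dataclass
from datetime import date
from typing import List, Optional, Tuple


@dataclass
class TimedQuestion:
    """A question whose correct answer may depend on the reference date."""
    question: str
    # (valid_from, answer) pairs, sorted by valid_from (assumed layout).
    history: List[Tuple[date, str]]

    def answer_at(self, when: date) -> Optional[str]:
        """Return the answer in force on `when`, or None if none applies yet."""
        starts = [start for start, _ in self.history]
        idx = bisect_right(starts, when)
        return self.history[idx - 1][1] if idx > 0 else None

    def is_time_varying(self) -> bool:
        """True if the answer changes at least once over the recorded history."""
        return len({answer for _, answer in self.history}) > 1


# Placeholder item: the role and the people are invented for illustration.
q = TimedQuestion(
    question="Who leads organization X?",
    history=[(date(2018, 1, 1), "Person A"), (date(2022, 7, 1), "Person B")],
)

print(q.answer_at(date(2020, 6, 1)))
print(q.answer_at(date(2023, 1, 1)))
```

A stable fact would have a single entry in `history`, so `is_time_varying()` separates the two kinds of knowledge the benchmark distinguishes.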
The dataset is available on HuggingFace.
Results
We evaluate Sequential Helium 6B against several strong baselines on KairosQA. While most models show a steep drop in accuracy on facts dated close to their training cutoff, Sequential Helium maintains more stable performance over time.
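Accuracy over time can be measured by grouping predictions by the year each fact refers to. The following is a minimal sketch under that assumption; `accuracy_by_year` and the toy records are hypothetical and are not results from the paper.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def accuracy_by_year(records: Iterable[Tuple[int, bool]]) -> Dict[int, float]:
    """Compute per-year accuracy from (fact_year, is_correct) pairs."""
    totals: Dict[int, int] = defaultdict(int)
    correct: Dict[int, int] = defaultdict(int)
    for year, is_correct in records:
        totals[year] += 1
        correct[year] += int(is_correct)
    # Report years in chronological order so a drop near the cutoff is easy to spot.
    return {year: correct[year] / totals[year] for year in sorted(totals)}


# Made-up predictions, for illustration only.
toy_records = [
    (2021, True), (2021, True),
    (2022, True), (2022, False),
    (2023, False),
]

per_year = accuracy_by_year(toy_records)
print(per_year)  # {2021: 1.0, 2022: 0.5, 2023: 0.0}
```

Plotting these per-year values for each model would show whether accuracy falls off toward the training cutoff or stays flat.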
Model checkpoints for Sequential Helium 6B are available on HuggingFace.
Authors
Hippolyte Pilchen, Romain Fabre, Franck Signe Talla, Patrick Pérez, and Edouard Grave.
Feel free to reach out if you have any questions; contact details are provided in the paper.