Kairos: Temporality in LLM

Temporal Knowledge, LLMs, ICML 2026

Paper accepted at ICML 2026.

Resources: Paper, Dataset, Checkpoints, Code, Blog

Abstract

Large language models (LLMs) are pre-trained on web data spanning years, yet the temporal dimension of this data—when documents were written and how knowledge evolves over time—is largely ignored during training. This paper investigates how data temporality impacts LLM pre-training and knowledge representation.

We introduce KairosQA, a benchmark designed to evaluate how well LLMs capture and update factual knowledge over time. We further train Sequential Helium 6B, a model pre-trained with an explicit awareness of temporal ordering in the data, and show that temporal structure in training improves the model’s ability to represent and recall time-sensitive knowledge.

Our findings highlight that naively mixing data from different time periods leads to temporal confusion in LLMs, and that training curricula sensitive to the temporal distribution of documents significantly improve model calibration on time-varying facts.
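To make the contrast with naive mixing concrete, here is a minimal sketch of one possible time-aware ordering: documents are grouped by year, the years are visited chronologically, and shuffling happens only within a year. This page does not describe the actual schedule used for Sequential Helium 6B, so the function name `chronological_order`, the `date` field, and the per-year bucketing are all illustrative assumptions.

```python
import random
from collections import defaultdict
from datetime import date


def chronological_order(docs, seed=0):
    """Order documents by year, shuffling only within each year.

    Hypothetical helper: it illustrates a time-ordered curriculum,
    not the schedule used in the paper.
    """
    rng = random.Random(seed)
    by_year = defaultdict(list)
    for doc in docs:
        by_year[doc["date"].year].append(doc)

    ordered = []
    for year in sorted(by_year):      # oldest year first
        bucket = by_year[year]
        rng.shuffle(bucket)           # local shuffling keeps batches varied
        ordered.extend(bucket)
    return ordered


# Toy documents with made-up dates.
docs = [
    {"id": "a", "date": date(2021, 5, 1)},
    {"id": "b", "date": date(2019, 3, 9)},
    {"id": "c", "date": date(2021, 1, 20)},
    {"id": "d", "date": date(2020, 7, 4)},
]
ordered = chronological_order(docs)
years = [d["date"].year for d in ordered]
print(years)
```

A fully shuffled stream would interleave 2019 and 2021 documents arbitrarily; the ordering above guarantees that the model never sees a later year before an earlier one.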


KairosQA Dataset

KairosQA is a question-answering dataset specifically designed to probe temporal knowledge in LLMs. It tests whether models can correctly answer factual questions tied to a specific point in time, distinguish between knowledge that changes over time and stable facts, and reflect the state of the world at the time of their training cutoff.

KairosQA construction pipeline: Wikidata triples with temporal annotations are used to synthesise time-aware multiple-choice questions, which are evaluated under both cloze (ranking) and free-text generation protocols.
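The step from temporally annotated triples to a multiple-choice question can be sketched as follows. This is a toy illustration under stated assumptions, not the KairosQA pipeline itself: the `TemporalFact` record, the `make_question` helper, the question template, and the fictional "Acme Corp" facts are all hypothetical. The validity interval stands in for Wikidata's start time and end time qualifiers, and distractors are drawn from values the same relation held at other times.

```python
import random
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TemporalFact:
    subject: str
    relation: str
    obj: str
    start_year: int
    end_year: Optional[int]  # None means the fact is still valid


def holds_in(fact, year):
    """True if the fact's validity interval (inclusive) covers the year."""
    return fact.start_year <= year and (fact.end_year is None or year <= fact.end_year)


def make_question(facts, subject, relation, year, seed=0):
    """Build one time-aware multiple-choice question, or None if impossible."""
    relevant = [f for f in facts if f.subject == subject and f.relation == relation]
    answers = [f.obj for f in relevant if holds_in(f, year)]
    if len(answers) != 1:
        return None  # skip unanswerable or ambiguous years
    answer = answers[0]
    # Distractors: values the relation took at other points in time.
    distractors = sorted({f.obj for f in relevant} - {answer})
    if not distractors:
        return None
    options = distractors + [answer]
    random.Random(seed).shuffle(options)
    return {
        "question": f"In {year}, what was the {relation} of {subject}?",
        "options": options,
        "answer_index": options.index(answer),
    }


# Fictional facts, used only to exercise the sketch.
FACTS = [
    TemporalFact("Acme Corp", "chief executive", "Alice", 2010, 2015),
    TemporalFact("Acme Corp", "chief executive", "Bob", 2016, 2019),
    TemporalFact("Acme Corp", "chief executive", "Carol", 2020, None),
]

q = make_question(FACTS, "Acme Corp", "chief executive", 2017)
print(q["question"], q["options"], q["answer_index"])
```

Using time-shifted values of the same relation as distractors makes each wrong option a fact that was true at some other time, so a model cannot pick the right answer by plausibility alone.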

The dataset is available on Hugging Face.


Results

We evaluate Sequential Helium 6B against several strong baselines on KairosQA. While most models show a steep accuracy drop for recent facts near their training cutoff, Sequential Helium maintains more stable performance over time.

KairosQA cloze accuracy across time for different models. Sequential Helium (Ours) maintains strong and stable accuracy on recent facts compared to Gemma3 4B, Gemma4 4B, and Qwen3 8B.
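An accuracy-over-time curve of this kind can be computed along the following lines. In a cloze (ranking) protocol, each answer option receives a score and the highest-scoring option is taken as the prediction; accuracy is then aggregated per year. The sketch below assumes the scores are already available (in practice they would typically be model log-likelihoods of each option). The field names and numbers are made up for illustration and are not results from the paper.

```python
from collections import defaultdict


def cloze_prediction(option_scores):
    """Return the index of the highest-scoring option."""
    return max(range(len(option_scores)), key=lambda i: option_scores[i])


def accuracy_by_year(examples):
    """Map each year to the fraction of questions answered correctly."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for ex in examples:
        total[ex["year"]] += 1
        if cloze_prediction(ex["option_scores"]) == ex["answer_index"]:
            correct[ex["year"]] += 1
    return {year: correct[year] / total[year] for year in sorted(total)}


# Made-up scores standing in for per-option log-likelihoods.
examples = [
    {"year": 2022, "option_scores": [-4.1, -2.3, -5.0], "answer_index": 1},
    {"year": 2022, "option_scores": [-1.2, -3.3, -2.8], "answer_index": 0},
    {"year": 2024, "option_scores": [-2.0, -2.5, -3.1], "answer_index": 1},
    {"year": 2024, "option_scores": [-3.0, -1.9, -2.2], "answer_index": 1},
]
result = accuracy_by_year(examples)
print(result)
```

Whether option scores should be length-normalised before ranking is a design choice this page does not specify; the sketch compares raw scores.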

Model checkpoints for Sequential Helium 6B are available on Hugging Face.


Authors

Hippolyte Pilchen, Romain Fabre, Franck Signe Talla, Patrick Pérez, and Edouard Grave.

Feel free to reach out if you have any questions — contact details are provided in the paper.