Context Window Management

Overview

A long context window is not a free resource. Attention follows a U-shaped curve, strong at the start and end, weak in the middle (“lost in the middle”). Adding more context often hurts quality past a certain point. Good prompt design treats the window as a scarce attention budget, not a dumping ground.

Key ideas

Lost in the middle, Liu et al. (2023) showed retrieval accuracy drops sharply for items placed in the middle of a long context. Put the load-bearing content at the front or end of a long block.
Context rot, Chroma’s research found degradation at every context-length increment, not just near the cap. Even well under the limit, more tokens = worse attention.
Compression and summarization, Instead of carrying every tool result verbatim, summarize older state and keep only decisions, open questions, and task markers.
Retrieval over stuffing, Past a corpus size, RAG beats long-context stuffing for accuracy and cost. See RAG.
Ordering matters, System prompt first (cache-friendly), then high-signal reference, then the task, with examples closest to the task.

Context Window Management

Overview

Key ideas

References