The Necessity of Compression
The world model is not merely convenient—it is constitutively necessary. This follows from a fundamental asymmetry between the world and any bounded system embedded within it.
The information bottleneck makes this precise.
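Let $\mathcal{W}$ be the world state space with effective dimensionality $\dim(\mathcal{W})$, and let $\mathcal{S}$ be a bounded system with finite computational capacity $C_{\mathcal{S}}$. Then:
$$\dim(\mathcal{Z}) \le C_{\mathcal{S}} \ll \dim(\mathcal{W})$$
where $\mathcal{Z}$ is the system’s internal representation. The world model necessarily inhabits a state space smaller than the world.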
The world contains effectively unbounded degrees of freedom: every particle, field configuration, and their interactions across all scales. Any physical system has finite matter, energy, and spatial extent, hence finite information-carrying capacity. The system cannot represent the world at full resolution; it must compress. This is not a limitation to be overcome but a constitutive feature of being a bounded entity in an unbounded world. □
The compression ratio of a world model captures how aggressively this simplification operates:
$$\kappa = \frac{\dim(\mathcal{W}_{\text{rel}})}{\dim(\mathcal{Z})}$$
where $\mathcal{W}_{\text{rel}} \subseteq \mathcal{W}$ is the subspace of world states that affect the system’s viability and $\mathcal{Z}$ is the system’s internal representation. The compression ratio $\kappa$ characterizes how much the system must discard to exist. This has a profound implication: compression determines ontology. What a system can perceive, respond to, and value is determined by what survives compression. The world model’s structure (which distinctions it maintains, which it collapses) constitutes the system’s effective ontology.
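The claim that compression determines ontology can be illustrated with a toy projection. The sketch below is only an illustration under stated assumptions: the feature names, the choice of which features are kept, and the `encode` helper are all hypothetical and do not come from the text. It shows that two world states differing only in discarded features become the same internal state, while a difference in a kept feature remains visible.

```python
# Hypothetical world with five degrees of freedom (names are illustrative).
WORLD_FEATURES = ("temperature", "light", "nutrient", "magnetic_field", "sound")
# Hypothetical choice of what survives compression for this system.
KEPT_FEATURES = ("temperature", "nutrient")

def encode(world_state):
    """Project a world state onto the features the system keeps."""
    return tuple(world_state[name] for name in KEPT_FEATURES)

state_a = {"temperature": 20, "light": 0.1, "nutrient": 3,
           "magnetic_field": 1.0, "sound": 0}
state_b = dict(state_a, light=0.9, sound=7)   # differs only in discarded features
state_c = dict(state_a, nutrient=0)           # differs in a kept feature

# Distinct world states can collapse to one internal state.
indistinguishable = encode(state_a) == encode(state_b)
# A distinction the encoding maintains is still available to the system.
distinguishable = encode(state_a) != encode(state_c)
# Features the system has no access to at all.
discarded = [f for f in WORLD_FEATURES if f not in KEPT_FEATURES]
```

For this system, `state_a` and `state_b` are the same situation: the difference between them is not part of its effective ontology.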
The information bottleneck principle formalizes this: the optimal representation maximizes information about viability-relevant outcomes while minimizing complexity:
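$$\max_{p(z \mid w)} \; I(Z; V) - \beta \, I(Z; W)$$
Here $W$ is the world state, $Z$ the internal representation, and $V$ the viability-relevant outcome. The Lagrange multiplier $\beta$ controls the compression-fidelity tradeoff. Different values of $\beta$ yield different creatures: high $\beta$ produces simple organisms with coarse world models; low $\beta$ produces complex organisms with rich representations.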
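The tradeoff can be made concrete with a small discrete example. The following is a minimal sketch, assuming the objective takes the form "information about the viability outcome minus the multiplier times the information retained about the world state", so that a larger multiplier penalizes complexity more heavily. The four world states, the survival probabilities, and the three candidate encoders are invented for illustration.

```python
import math

def mutual_information(joint):
    """Mutual information in bits of a joint distribution given as {(a, b): p}."""
    pa, pb = {}, {}
    for (a, b), p in joint.items():
        pa[a] = pa.get(a, 0.0) + p
        pb[b] = pb.get(b, 0.0) + p
    return sum(p * math.log2(p / (pa[a] * pb[b]))
               for (a, b), p in joint.items() if p > 0)

# Hypothetical world: four equally likely states, each with its own
# probability of a viability-positive outcome (v = 1).
P_W = [0.25, 0.25, 0.25, 0.25]
P_V1_GIVEN_W = [0.99, 0.7, 0.3, 0.01]

# Deterministic encoders mapping world state index -> internal code.
ENCODERS = {
    "fine": [0, 1, 2, 3],     # keeps every distinction
    "coarse": [0, 0, 1, 1],   # keeps only "good half" vs "bad half"
    "trivial": [0, 0, 0, 0],  # keeps nothing
}

def ib_score(encoder, beta):
    """Relevance I(Z;V) minus beta times complexity I(Z;W)."""
    joint_zv, joint_zw = {}, {}
    for w, pw in enumerate(P_W):
        z = encoder[w]
        joint_zw[(z, w)] = joint_zw.get((z, w), 0.0) + pw
        for v, pv in ((1, P_V1_GIVEN_W[w]), (0, 1.0 - P_V1_GIVEN_W[w])):
            joint_zv[(z, v)] = joint_zv.get((z, v), 0.0) + pw * pv
    return mutual_information(joint_zv) - beta * mutual_information(joint_zw)

def best_encoder(beta):
    """Name of the candidate encoder with the highest score at this beta."""
    return max(ENCODERS, key=lambda name: ib_score(ENCODERS[name], beta))

choices = {beta: best_encoder(beta) for beta in (0.05, 0.2, 1.0)}
```

With these made-up numbers, the preferred encoder moves from the fine one to the coarse one to the trivial one as the multiplier grows, which is the qualitative pattern the text describes. A real system would search over stochastic encoders rather than choose among three fixed ones.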
The world model is not a luxury or optimization strategy. It is what it means to be a bounded system in an unbounded world. The compression ratio is not a parameter to be minimized but a constitutive feature of finite existence. What survives compression determines what the system is.
This has a precise architectural consequence that the experiments will confirm (Part VII, V22–V27). A linear prediction head compresses hidden state to output through a single weight matrix — and a single matrix is always decomposable into independent columns, each serving a separate target dimension. The compression creates a factored ontology: the system's internal states are channeled into independent streams with no pressure to coordinate. Replace the linear map with a two-layer architecture, and the compression changes: the chain rule through two weight matrices means every hidden dimension's gradient depends on every other dimension's activation at the intermediate layer. The compression now demands coordination. What survives it is not a collection of independent features but a coupled representation — an ontology where the parts cannot be understood without the whole. Compression does not merely determine what the system perceives. It determines whether the system's internal states are unified or factored.
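The difference between the two architectures can be checked numerically on a toy case. The sketch below is not the experimental setup referenced above. It assumes a squared-error loss, a tanh intermediate layer, and small hand-picked weights, all chosen purely for illustration. It changes a single target dimension and records which gradients move.

```python
import math

def matvec(M, v):
    """Matrix-vector product for a list-of-rows matrix."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def linear_head_grad(W, h, target):
    """Gradient of 0.5 * ||W h - target||^2 with respect to W (one row per output)."""
    err = [y - t for y, t in zip(matvec(W, h), target)]
    # Row i depends only on the error of output i.
    return [[e * x for x in h] for e in err]

def two_layer_grad_W1(W1, W2, h, target):
    """Gradient of 0.5 * ||W2 tanh(W1 h) - target||^2 with respect to W1."""
    a = [math.tanh(z) for z in matvec(W1, h)]
    err = [y - t for y, t in zip(matvec(W2, a), target)]
    # The signal reaching intermediate unit j sums the errors of all outputs,
    # and each error in turn depends on every intermediate activation.
    delta = [
        (1.0 - a[j] ** 2) * sum(W2[k][j] * err[k] for k in range(len(err)))
        for j in range(len(a))
    ]
    return [[d * x for x in h] for d in delta]

# Illustrative values only.
h = [0.5, -1.0, 2.0]
W = [[0.1, 0.2, -0.3], [0.4, -0.5, 0.6]]
W1 = [[0.2, -0.1, 0.3], [-0.4, 0.5, 0.1]]
W2 = [[0.7, -0.2], [0.3, 0.9]]

t_a = [1.0, 0.0]
t_b = [1.0, 5.0]  # only the second target dimension changes

g_lin_a, g_lin_b = linear_head_grad(W, h, t_a), linear_head_grad(W, h, t_b)
g_two_a = two_layer_grad_W1(W1, W2, h, t_a)
g_two_b = two_layer_grad_W1(W1, W2, h, t_b)

# Linear head: the weights serving target 0 never notice the change.
linear_row0_unchanged = g_lin_a[0] == g_lin_b[0]
# Two-layer head: every first-layer row feels it.
two_layer_rows_changed = [ra != rb for ra, rb in zip(g_two_a, g_two_b)]
```

In the linear case the weights serving the first target are untouched by a change in the second target, so the streams stay independent. In the two-layer case every row of the first weight matrix receives a different gradient, because the shared intermediate layer routes each output's error back to all of them.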