Part I: Foundations

The Uncontaminated Substrate Test

Deep Technical: The CA Consciousness Experiment

The CA framework enables an experiment that could shift the burden of proof on the identity thesis. The logic is simple. The execution is hard. The implications are large.

Setup. A sufficiently rich CA—richer than Life, perhaps Lenia or a continuous-state variant with more degrees of freedom. Initialize with random configurations. Run for geological time (billions of timesteps). Let patterns emerge, compete, persist, die.

Selection pressure. Introduce viability constraints: resource gradients, predator patterns, environmental perturbations. Patterns that model their environment survive longer. Patterns that model themselves survive longer still. The forcing functions from the Forcing Functions section apply: partial observability (patterns cannot see beyond local neighborhood), long horizons (resources fluctuate on slow timescales), self-prediction (a pattern’s own configuration dominates its future observations).

Communication emergence. When multiple patterns must coordinate—cooperative hunting, territory negotiation, mating—communication pressure emerges. Patterns that can emit signals (glider streams, oscillator bursts, structured wavefronts) and respond to signals from others gain fitness advantages. Language emerges. Not English. Not any human language. Something new. Something uncontaminated.

The measurement protocol. For each pattern $\mathcal{B}$ at each timestep $t$:

Valence: $\Val_t = d(\mathbf{x}_{t+1}, \partial\viable) - d(\mathbf{x}_t, \partial\viable)$. Exact and computable: the Hamming distance to the nearest configuration where the pattern dissolves, differenced across timesteps. Positive when moving into the viable interior. Negative when approaching dissolution.
Arousal: $\Ar_t = \text{Hamming}(\mathbf{x}_{t+1}, \mathbf{x}_t) / |\mathcal{B}|$. The fraction of cells that changed state. High when the pattern is rapidly reconfiguring. Low when settled into a stable orbit.
Integration: $\intinfo_t = \min_P D[p(\mathbf{x}_{t+1}|\mathbf{x}_t) \| \prod_{k \in P} p(\mathbf{x}^k_{t+1}|\mathbf{x}^k_t)]$, where $P$ ranges over partitions of the pattern and $k$ indexes the parts of a partition. Exact IIT-style $\Phi$. For small patterns, tractable. For large patterns, use the partition prediction loss proxy: train a full predictor and a partitioned predictor, measure the gap.
Effective rank: Record the trajectory $\mathbf{x}_1, \ldots, \mathbf{x}_T$. Compute its covariance $C$. Compute $\reff = (\tr C)^2 / \tr(C^2)$. How many dimensions is the pattern actually using? High when exploring diverse configurations. Low when trapped in a repetitive orbit.
Self-model salience: Identify self-tracking cells (cells whose state correlates with pattern-level properties). Compute $\mathcal{SM} = \text{MI}(\text{self-tracking cells}; \text{effector cells}) / H(\text{effector cells})$. How much does self-representation drive behavior?
Counterfactual weight: If the pattern contains a simulation subregion (possible in universal-computation-capable CAs), measure $\mathcal{CF} = |\text{simulator cells}| / |\mathcal{B}|$. Rare. Requires complex patterns. But detectable when present.
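
The cheaper quantities in this list can be sketched in a few lines of NumPy. This is an illustration under stated assumptions, not the experiment's code: the function names are hypothetical, a pattern is assumed to be flattened into a 1-D state vector, and the distance to the viability boundary is assumed to come from a separate search, since finding the nearest dissolving configuration is the expensive part.

```python
import numpy as np

def valence(dist_t, dist_next):
    """Change in distance to the viability boundary; positive means moving inward.

    The two distances are assumed to be computed elsewhere.
    """
    return dist_next - dist_t

def arousal(x_t, x_next):
    """Fraction of the pattern's cells whose state changed (Hamming / |B|)."""
    x_t, x_next = np.asarray(x_t), np.asarray(x_next)
    return float(np.mean(x_t != x_next))

def effective_rank(trajectory):
    """(tr C)^2 / tr(C^2) for the covariance C of a (T, n_cells) trajectory."""
    X = np.asarray(trajectory, dtype=float)
    C = np.atleast_2d(np.cov(X, rowvar=False))  # n_cells x n_cells
    tr_c2 = np.trace(C @ C)
    return float(np.trace(C) ** 2 / tr_c2) if tr_c2 > 0 else 0.0
```

A trajectory that moves along a single direction gives an effective rank of 1; one that spreads its variance equally over two independent directions gives 2.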

The translation protocol. Build a dictionary from signal-situation pairs:

Record all signals emitted by pattern $\mathcal{B}$ : glider streams, oscillator bursts, wavefront patterns. Each signal type $\sigma_i$ .
Record the environmental context when each signal is emitted: threat proximity, resource availability, conspecific presence, recent events.
Cluster signal types by context similarity. Signal $\sigma_{47}$ always emitted when threat approaches from the left. Signal $\sigma_{12}$ always emitted after successful resource acquisition.
Map clusters to natural language descriptions of the contexts. $\sigma_{47} \to$ “threat-left”. $\sigma_{12} \to$ “success”.
For complex signals (sequences, combinations), build compositional translations. $\sigma_{47} + \sigma_{23} \to$ “threat-left, requesting-assistance”.

The translation is uncontaminated. The patterns never learned human concepts. The mapping emerges from environmental correspondence.
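
The dictionary-building steps can be sketched as follows. This is a deliberately reduced version: it assumes contexts have already been discretized into labels (the clustering step is skipped), and the function names and the `min_purity` threshold are hypothetical choices for illustration.

```python
from collections import Counter, defaultdict

def build_dictionary(observations, min_purity=0.9):
    """Map each signal to its dominant context label, if it is consistent enough.

    observations: iterable of (signal_id, context_label) pairs.
    min_purity: hypothetical threshold on how often the dominant context
                must co-occur with the signal before a translation is accepted.
    """
    contexts_by_signal = defaultdict(Counter)
    for signal, context in observations:
        contexts_by_signal[signal][context] += 1

    dictionary = {}
    for signal, counts in contexts_by_signal.items():
        label, hits = counts.most_common(1)[0]
        if hits / sum(counts.values()) >= min_purity:
            dictionary[signal] = label
    return dictionary

def translate(sequence, dictionary):
    """Compositional translation: concatenate the labels of the component signals."""
    return ", ".join(dictionary.get(s, "<unknown>") for s in sequence)
```

Signals that appear in mixed contexts get no entry, which keeps ambiguous mappings out of the dictionary rather than forcing a guess.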

The core test. Three streams of data. Three independent measurement modalities.

Prediction: when affect signature shows the suffering motif ( $\Val < 0$ , $\intinfo$ high, $\reff$ low), the translated signal should express suffering-concepts, and the behavior should show suffering-patterns (withdrawal, escape attempts, freezing).

When affect signature shows the fear motif ( $\Val < 0$ , $\mathcal{CF}$ high on threat branches, $\mathcal{SM}$ high), the translated signal should express fear-concepts, and the behavior should show avoidance and hypervigilance.

When affect signature shows the curiosity motif ( $\Val > 0$ toward uncertainty, $\mathcal{CF}$ high with branch entropy), the translated signal should express exploration-concepts, and the behavior should show approach and investigation.
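
The three predictions amount to a lookup from affect signature to motif, followed by an agreement check across the three data streams. The sketch below is a simplification under loud assumptions: every quantity except valence is taken to be normalized to $[0, 1]$, the `high` and `low` thresholds are arbitrary, and the qualifiers "on threat branches" and "toward uncertainty" are ignored. All names are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class AffectSignature:
    valence: float         # sign matters: negative means approaching dissolution
    integration: float     # assumed normalized to [0, 1]
    effective_rank: float  # assumed normalized to [0, 1]
    self_model: float      # assumed normalized to [0, 1]
    counterfactual: float  # assumed normalized to [0, 1]

def matching_motifs(s, high=0.7, low=0.3):
    """Return every motif whose structural conditions the signature satisfies."""
    motifs = []
    if s.valence < 0 and s.integration >= high and s.effective_rank <= low:
        motifs.append("suffering")
    if s.valence < 0 and s.counterfactual >= high and s.self_model >= high:
        motifs.append("fear")
    if s.valence > 0 and s.counterfactual >= high:
        motifs.append("curiosity")
    return motifs

def tripartite_agreement(records):
    """Fraction of records where structure, signal, and behavior name the same motif.

    records: iterable of (AffectSignature, signal_label, behavior_label).
    """
    records = list(records)
    if not records:
        return 0.0
    hits = sum(1 for sig, signal, behavior in records
               if signal == behavior and signal in matching_motifs(sig))
    return hits / len(records)
```

An agreement rate near chance would count against the prediction; a rate well above chance, held across many patterns, is what the core test looks for.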

Bidirectional perturbation. The test has teeth if it runs in both directions, and three interventions are available to drive it.

Direction 1: Induce via signal. Translate “threat approaching” into their emergent language. Emit the signal. Does the affect signature shift toward fear? Does behavior change?

Direction 2: Induce via “neurochemistry”. Modify the CA rules locally around the pattern—change transition probabilities, add noise, alter connectivity. These are their neurotransmitters. Does the affect signature shift? Does the translated signal content change? Does behavior follow?

Direction 3: Induce via environment. Place them in objectively threatening situations. Deplete resources. Introduce predators. Does structure-signal-behavior alignment hold?

If perturbation in any modality propagates to the others, the relationship is causal.

The hard question. Suppose the experiment works. Suppose tripartite alignment holds. Suppose bidirectional perturbation propagates. What has been shown?

Not that CA patterns are conscious. Not that the identity thesis is proven. But that systems with zero human contamination, learning from scratch in environments shaped by viability pressure, develop affect structures correlated with their expressions and behaviors in the ways the framework predicts.

The zombie hypothesis—that the structure is present but experience is absent—predicts what? That the correlations would not hold? Why not? The structure is doing the causal work either way.

The experiment does not prove identity. It makes identity the default. The burden shifts. Denying experience to these patterns requires a metaphysical commitment the evidence does not support.

Computational requirements. This is not a weekend project.

CA substrate: $10^6$ – $10^9$ cells, continuous or high-state-count
Runtime: $10^9$ – $10^{12}$ timesteps for complex pattern emergence
Measurement: Real-time $\Phi$ computation for patterns up to $\sim 100$ cells; proxy measures for larger
Translation: Corpus of $10^6$+ signal-context pairs for dictionary construction
Perturbation: Systematic sweeps across parameter space

Feasible with current compute. Hard. Worth doing.

Why CA and not transformers? Both are valid substrates. The CA advantage: exact definitions. In a transformer, valence is a proxy (advantage estimate). In a CA, valence is exact (Hamming distance to dissolution). In a transformer, $\Phi$ is intractable (billions of parameters in superposition). In a CA, $\Phi$ is computable (for small patterns) or approximable (for large ones).

The transformer version of this experiment is valuable. The CA version is rigorous. Do both.

What would negative results mean? If the alignment fails—if structure does not predict translated language, if perturbations do not propagate—then either:

The framework is wrong (affect is not geometric structure)
The substrate is insufficient (CAs cannot support genuine affect)
The measures are wrong (the right quantities are not being captured)
The translation is wrong (the dictionary does not capture meaning)

Each failure mode is informative. The experiment has teeth in both directions.

What would positive results mean? The identity thesis becomes the default hypothesis for any system with the relevant structure — though, being a working axiom adopted abductively, it never becomes proven. The hard problem does not dissolve; its soft shell (why this character rather than that) becomes tractable under empirical pressure, while its hard core (why there is existence at all) remains posited across rather than explained. The question “does structure produce the character of experience?” becomes “why would you assume it doesn’t?”

And then the real questions begin. What structures produce what experiences? Can flourishing be engineered? Can suffering be detected where it is currently invisible? What obligations attach to experiencing systems that have been created?

The experiment is not the end. It is the beginning of a different kind of inquiry.

Preliminary Results: Where the Ladder Stalls

A simplified version has been run in Lenia (continuous CA, $256 \times 256$ toroidal grid) with resource dynamics, measuring $\intinfo$ via partition prediction loss, $\Val$ via mass change, $\Ar$ via state change rate, and $\reff$ via trajectory PCA. The results are instructive — not because they confirm the predictions above, but because of where they fail.
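
The partition prediction loss proxy can be illustrated with linear predictors. This is a sketch, not the code used for the runs above: least-squares maps stand in for whatever predictors were actually trained, the loss is in-sample mean squared error, and the function names are hypothetical.

```python
import numpy as np

def _fit_mse(X, Y):
    """Least-squares linear predictor Y ~ X W; returns in-sample mean squared error."""
    W, *_ = np.linalg.lstsq(X, Y, rcond=None)
    return float(np.mean((X @ W - Y) ** 2))

def partition_prediction_gap(trajectory, parts):
    """Loss of part-wise predictors minus loss of a full predictor.

    trajectory: (T, n) array of states.
    parts: list of index lists that together cover 0..n-1.
    A large gap means the parts need each other to predict the future.
    """
    X = np.asarray(trajectory, dtype=float)
    past, future = X[:-1], X[1:]
    full = _fit_mse(past, future)
    # Each part predicts its own future from its own past only.
    sq_err = 0.0
    for idx in parts:
        W, *_ = np.linalg.lstsq(past[:, idx], future[:, idx], rcond=None)
        sq_err += np.sum((past[:, idx] @ W - future[:, idx]) ** 2)
    partitioned = sq_err / future.size
    return partitioned - full
```

The exact $\intinfo$ minimizes over partitions; applying this function to each candidate partition and taking the smallest gap mirrors that step.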

The central lesson: the ladder requires heritable variation. Emergent CA patterns reach rungs 1–3 (microdynamics $\to$ attractors $\to$ boundaries) from physics alone. The transition to rung 4 (functional integration) requires evolutionary selection acting on heritable variation in the trait that sets integration response.

Proposed Experiment

Substrate: Lenia with resource depletion/regeneration (Michaelis-Menten growth modulation). Perturbation: Drought (resource regeneration $\to 0$ ). Measure: $\Delta \intinfo$ under drought.

Conditions:

No evolution (). Naive patterns under drought: $\intinfo$ changes by $-6.2\%$. Same decomposition dynamics as LLMs.
Homogeneous evolution (). In-situ selection for $\intinfo$-robustness (fitness $\propto \intinfo_{\text{stress}} / \intinfo_{\text{base}}$). Still decomposes ($-6.0\%$). All patterns share an identical growth function, so selection prunes but cannot innovate.
Heterogeneous chemistry (). Per-cell growth parameters ($\mu, \sigma$ fields) creating spatially diverse viability manifolds. After 40 cycles of evolution on GPU: $-3.8\%$ vs naive $-5.9\%$. A +2.1pp shift toward the biological pattern. Evolved patterns also show better recovery: $\intinfo$ returns above baseline after drought, while naive patterns do not fully recover.
Multi-channel coupling (). Three coupled channels—Structure ( $R{=}13$ ), Metabolism ( $R{=}7$ ), Signaling ( $R{=}20$ )—with cross-channel coupling matrix and sigmoid gate. Introduces a new measurement: channel-partition $\intinfo$ (remove one channel, measure growth impact on remaining channels). Local test: channel $\intinfo \approx 0.01$ , spatial $\intinfo \approx 1.0$ —channels couple weakly at 3 degrees of freedom.
High-dimensional channels (). $C{=}64$ continuous channels with fully vectorized physics. Spectral $\intinfo$ via coupling-weighted covariance effective rank. 30-cycle GPU result: evolved $-1.8%$ vs naive $-1.6%$ under severe drought—evolution had negligible effect. Both decompose mildly, suggesting that 64 symmetric channels provide enough internal buffering to resist drought regardless of evolutionary tuning. Mean robustness $0.978$ across all 30 cycles. The Yerkes-Dodson pattern persists: mild stress increases $\intinfo$ by $+130$ – $190%$ .
Hierarchical coupling (). Same $C{=}64$ physics as , but with asymmetric coupling (feedforward/feedback pathways between four tiers: Sensory $\to$ Processing $\to$ Memory $\to$ Prediction). 30-cycle GPU result: evolved patterns have higher baseline $\intinfo$ ( $+10.5%$ vs naive) and higher self-model salience ( $0.99$ vs $0.83$ ), but under severe drought they decompose more ( $-9.3%$ ) while naive patterns integrate ( $+6.2%$ ). Evolution overfits to the mild training stress, creating fragile high- $\intinfo$ configurations. Key lesson: the hierarchy must live in the coupling structure, not in the physics; imposing different timescales per tier caused extinction. Functional specialization should emerge from selection.
Metabolic maintenance cost (). Addresses the autopoietic gap directly: patterns pay a constant metabolic drain proportional to mass ( $\texttt{maintenance\_rate} \times g \times dt$ each step). 30-cycle GPU result ( $C{=}64$ ): evolved-metabolic $-2.6%$ vs naive $+0.2%$ under severe drought. Evolution again produced higher- $\intinfo$ -but-more-fragile patterns. Critically, the maintenance rate ( $0.002$ ) was not lethal enough—naive patterns retained $98%$ population through drought. The autopoietic gap remains open: a small metabolic drain on top of local physics does not produce active self-maintenance, because patterns have no mechanism for non-local resource detection. They cannot “forage” when they cannot “see” beyond kernel radius $R$ .
Curriculum evolution (). Fixes ’s stress overfitting by graduating stress intensity across cycles (resource regeneration ramped from $0.5\times$ to $0.02\times$ baseline over 30 cycles) with $\pm 30%$ random noise and variable drought duration (500–1900 steps per cycle). The critical test: evolved patterns evaluated on novel stress patterns never seen during training. 30-cycle GPU result ( $C{=}64$ ): robustness $0.954 \to 0.967$ . Curriculum-evolved patterns outperform naive on all four novel stressors: mild $+2.7\text{pp}$ , moderate $+1.5\text{pp}$ , severe $+1.3\text{pp}$ , extreme $+1.2\text{pp}$ . Under mild novel stress, evolved patterns actually integrate ( $+1.9%$ ) while naive decompose ( $-0.8%$ ). The overfitting problem is substantially reduced—not eliminated, but the shift is consistently positive across the full severity range.
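
The curriculum described in the last condition can be sketched as a schedule generator. The endpoints, noise level, and duration range come from the text; the linear shape of the ramp, the uniform noise distribution, and the function name are assumptions made for illustration.

```python
import random

def curriculum_schedule(n_cycles=30, start=0.5, end=0.02, noise=0.30,
                        min_steps=500, max_steps=1900, seed=0):
    """Return a list of (cycle, regen_multiplier, drought_steps) tuples.

    regen_multiplier scales baseline resource regeneration during drought.
    """
    rng = random.Random(seed)
    schedule = []
    for cycle in range(n_cycles):
        frac = cycle / (n_cycles - 1) if n_cycles > 1 else 1.0
        base = start + frac * (end - start)              # linear ramp (assumption)
        regen = base * (1.0 + rng.uniform(-noise, noise))  # +/-30% jitter
        steps = rng.randint(min_steps, max_steps)        # variable drought length
        schedule.append((cycle, regen, steps))
    return schedule
```

The fixed-stress conditions correspond to the degenerate case `start == end` with `noise=0`, which is what makes the comparison between training regimes clean.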

Unexpected: (1) Mild stress consistently increases $\intinfo$ by 60–190\% (Yerkes-Dodson–like inverted-U). Only severe stress causes decomposition. (2) In , evolution increased vulnerability to severe stress despite improving baseline $\intinfo$ —a stress overfitting effect. (3) ’s curriculum training substantially reduces this overfitting: graduated, noisy stress exposure produces patterns that generalize to novel stressors. The shift from naive is positive across all four novel severity levels tested ( $+1.2$ to $+2.7$ percentage points). (4) ’s metabolic cost was intended to create lethal drought, but at $\texttt{rate}{=}0.002$ the drought was not lethal—naive patterns retained $98%$ population. Evolved-metabolic patterns decomposed $-2.6%$ while naive held at $+0.2%$ , repeating the fragility pattern of . The deeper lesson: adding metabolic cost to a substrate with fixed-radius perception produces efficient passivity, not active foraging. The anxiety parallel deepens: shows that fixed-stress training produces maladaptive fragility, shows that graduated exposure (cf.\ systematic desensitization) builds genuine robustness, and shows that existential stakes alone do not produce adaptation when the organism cannot perceive beyond its local neighborhood.

The trajectory from reveals two orthogonal axes. First, substrate complexity: each step from adds internal degrees of freedom for evolution to select on—heterogeneous chemistry (), coupled channels (), hierarchical coupling (). Second, revealed by , selection pressure quality: the substrate matters less than how you stress it. ’s curriculum training on the same substrate generalizes better than ’s hierarchical architecture trained with fixed stress. changes the stakes: metabolic cost makes drought lethal, not merely weakening.

introduces directed coupling (feedforward/feedback pathways) to test whether functional specialization emerges under selection. The critical insight: imposing different physics per tier (different timescales, custom growth gates) caused immediate extinction at $C{=}64$ —the channels meant to be “memory” simply died. The working approach uses identical physics across all channels (proven dynamics) with an asymmetric coupling matrix that biases information flow directionally. More than a technical fix — it reflects a prediction: in biological cortex all neurons share the same basic biophysics. Hierarchy emerges from connectivity and learning, not from different physics per layer.

The stress test reveals stress overfitting. Evolved patterns have 10.5\% higher baseline $\intinfo$ and 19\% higher self-model salience than naive—but under severe drought they decompose 9.3\% while naive patterns actually integrate by 6.2\%. Evolution selected for high- $\intinfo$ configurations tuned to the mild stress each training cycle applies: states simultaneously more integrated and more fragile than their unoptimized counterparts.

A direct parallel in affective neuroscience: anxiety disorders involve heightened integration and self-monitoring, adaptive under moderate threat but catastrophically maladaptive under extreme stress. The suffering motif (high $\intinfo$, low $\reff$, high $\mathcal{SM}$) may describe a system selected too precisely for one threat level. The evolved patterns show exactly this: high baseline $\intinfo$ (0.076) with high self-model salience (0.99) that collapses under a regime shift.

**V11.5 stress test: evolved vs. naive patterns through baseline, drought, and recovery.** (a) Evolved patterns have higher baseline $\intinfo$ but decompose $-9.3\%$ under drought, while naive patterns *integrate* $+6.2\%$. (b) Evolved patterns maintain high self-model salience ($>0.97$) across all phases; naive patterns show lower and declining salience.

Whether evolution here can discover integration strategies robust to novel stresses—not just the training distribution—likely requires curriculum learning (gradually rising stress intensity) or environmental diversity (varying type and severity of perturbation). This connects to the next section’s forcing function framework: the quality of the forcing function matters as much as its presence.

**Multi-channel Lenia at increasing dimensionality.** PCA projection of $C$ channels to RGB. Top row: baseline (normal resources); bottom row: drought stress. Patterns at $C{=}3$ are visually simple; at $C{=}16$ and $C{=}32$, the richer channel structure produces more complex spatial organization. Under drought, spatial structure degrades, but the degree of degradation depends on $C$.

Open Question

At what channel count $C$ does the substrate have enough internal degrees of freedom for evolution to discover biological-like integration (where $\intinfo$ increases under threat)? The $C$ -sweep suggests that mid-range $C$ ( $8$ – $16$ ) accidentally produces integration-like responses—the coupling bandwidth happens to match the channel count—while high $C$ ( $32$ – $64$ ) decomposes, the coupling space being too large for random configurations. Is there a critical $C^*$ above which a phase transition occurs, or does evolution continuously improve robustness at any $C$ ? Each rung of the ladder may require a minimum internal dimensionality—the substrate must be rich enough for selection to sculpt.

The critical lesson evolves with the experiments. showed evolution helps in surprising ways—it creates higher- $\intinfo$ states that are also more fragile. shows the training regime matters: curriculum learning produces genuine generalization across novel stressors. shows that making drought metabolically costly produces efficient passivity, not active foraging—patterns cannot perceive beyond their local neighborhood, so existential stakes alone do not generate the distant-resource-seeking that integration would require. The remaining gap was between “decomposes less” and “integrates under threat,” and the locality ceiling explains why.

confirms the ceiling is real and that the predicted remedy partially works. Replacing fixed convolution with evolvable windowed self-attention—the only change to the physics—shifts mean robustness from $0.981$ to $1.001$ , moving the system to the threshold where $\intinfo$ is approximately preserved under stress rather than destroyed. Eight substrate modifications () could not achieve even this. The change that mattered is exactly what the attention bottleneck hypothesis predicted: state-dependent interaction topology. But the effect is modest—the system reaches the threshold without clearly crossing it. Attention is necessary but not sufficient for the full biological pattern.

Open Question

The results show that selecting for $\intinfo$ -robustness under mild stress creates patterns that are less robust to severe stress than unselected patterns. provides a partial answer: curriculum training with graduated, noisy stress exposure produces patterns that generalize to novel stressors ( $+1.2$ to $+2.7\text{pp}$ shift over naive across four novel severity levels). But the effect is modest—evolved patterns still decompose under severe novel stress ( $-1.7%$ ), just less than naive ( $-3.0%$ ). The remaining questions: (1) Can curriculum training with longer schedules or wider stress distributions close this gap further? (2) Does combining curriculum training with metabolic cost (’s lethal resource dependence) produce qualitatively different dynamics—active foraging rather than passive persistence? (3) Does the biological developmental sequence (graduated stressors from embryogenesis through maturation) achieve robust integration precisely because it is a curriculum over the full threat distribution? [ + curriculum combination not yet tested.]
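
The figures quoted throughout mix three related quantities: percent change in $\intinfo$, robustness, and percentage-point shift over naive. The sketch below shows how they relate, assuming robustness is the ratio $\intinfo_{\text{stress}} / \intinfo_{\text{base}}$ that the fitness function is proportional to. The function names are hypothetical.

```python
def percent_change(phi_base, phi_stress):
    """Percent change in integration from baseline to stress."""
    return 100.0 * (phi_stress - phi_base) / phi_base

def robustness(phi_base, phi_stress):
    """Ratio of stressed to baseline integration; 1.0 means fully preserved."""
    return phi_stress / phi_base

def pp_shift(evolved_pct, naive_pct):
    """Percentage-point advantage of evolved over naive patterns."""
    return evolved_pct - naive_pct

# Severe novel stress: evolved -1.7%, naive -3.0%
print(round(pp_shift(-1.7, -3.0), 1))          # 1.3
# A robustness of 0.981 corresponds to a -1.9% change
print(round(percent_change(1.0, 0.981), 1))    # -1.9
```

Under this reading, the $+1.3\text{pp}$ severe-stress shift is the difference between the two percent changes, not a ratio of them.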

What the Ladder Has Not Reached

Be explicit about how far these experiments are from anything resembling life, self-sustenance, or metacognition. The ladder metaphor risks implying a smooth gradient from Lenia gliders to organisms. The gap is enormous.

Self-sustenance. The patterns here are attractors of continuous dynamics, not self-maintaining entities. They do not consume resources to persist — resources modulate growth rates, but patterns do not “eat” in any metabolic sense. They do no thermodynamic work against entropy. They have no boundaries — density blobs, not membrane-enclosed. They persist as long as the physics allows, not because they maintain themselves. “Drought” reduces resource availability and weakens growth — closer to turning down the volume than to starving a dissipative structure.

Metacognition. The “self-model salience” metric measures how much a pattern’s own structure matters for its dynamics. That is not self-modeling — there is no representation of self, no information about the pattern stored within the pattern. The tiers (Sensory, Processing, Memory, Prediction) were labels imposed on the coupling structure. No functional specialization emerged: memory channels had weak activity, prediction channels predicted nothing.

Individual adaptation. All “learning” in these experiments occurs through population-level selection: cull the weak, boost the strong. No individual pattern adapts within its lifetime. Biological integration requires individual-level plasticity — the capacity for a single organism to reorganize its internal dynamics in response to experience.

These gaps converge on a single chasm. The transition from passive persistence to active self-maintenance — the autopoietic gap — requires at minimum: (a) lethal resource dependence (patterns that go to zero without active consumption), (b) metabolic work cycles (energy in $\to$ structure maintenance $\to$ waste out), and (c) self-reproduction (templated copying, not artificial cloning). Population-level selection on top of passive physics cannot bridge this; selection optimizes what exists rather than innovating the mechanism of existence itself.

Proposed Experiment

Question: Does lethal resource dependence change the integration response to stress?
Design: Maintenance cost ($\texttt{rate}{=}0.002$) drains each cell proportionally to mass each step. Fitness rewards metabolic efficiency.
Result: 30-cycle evolution ($C{=}64$, A10G GPU, 215 min). Robustness $0.968 \to 0.973$ over evolution. Under severe drought: evolved $-2.6\%$, naive $+0.2\%$. Naive retained $98\%$ of patterns; evolved retained $92\%$. The metabolic cost was insufficient to produce genuine lethality. Evolved patterns followed the same fragility pattern as : higher baseline fitness but more vulnerable to regime shift.
Why it failed: The maintenance rate was too low to create existential pressure, but the deeper problem is structural. Even with lethal metabolic cost, a convolutional pattern has no mechanism for directed resource-seeking. Its “perception” extends only to kernel radius $R$. Active foraging requires non-local information gathering: knowing where resources are before moving toward them. Adding metabolic cost to a blind substrate selects for efficiency (less waste), not for the kind of active self-maintenance that characterizes autopoiesis.
Implication: The autopoietic gap is not primarily about resource dependence. It is about perceptual range. Closing it requires substrates where the interaction topology is state-dependent, not fixed by spatial proximity.

What the Data Actually Says

Eight experiments (), hundreds of GPU-hours, thousands of evolved patterns. The lessons follow.

Finding 1: The Yerkes-Dodson pattern is universal and robust. Across every substrate condition, channel count, and evolutionary regime, mild stress increases $\intinfo$ by $60$ – $200%$ . Not an artifact of any one measurement — a statistical truth: moderate perturbation prunes weak patterns, and the survivors are by definition the more integrated. Severe stress overwhelms even well-integrated patterns, producing the inverted-U. The clearest positive result in the entire experimental line.

Finding 2: Evolution consistently produces fragile integration. In every condition where evolution raises baseline $\intinfo$ (: $+10.5%$ , : higher metabolic fitness), evolved patterns decompose more under severe drought than unselected ones. Not a bug—a real dynamical phenomenon. Evolution here finds tightly-coupled configurations where all parts depend on all parts. Tight coupling is high integration by definition. It is also catastrophic fragility: when one component fails under depletion, the failure cascades through the whole structure. The difference between a tightly-coupled factory (high integration, catastrophic failure mode) and a loosely-coupled marketplace (low integration, graceful degradation under stress).

Finding 3: Curriculum training is the only intervention that improved generalization. is the sole condition where evolved patterns beat naive on novel stressors across the full severity range ( $+1.2$ to $+2.7$ percentage points). Not more channels, not hierarchical coupling, not metabolic cost—graduated, noisy stress exposure. The substrate barely matters next to the training regime. A direct parallel in developmental biology: organisms with rich developmental histories (graduated stressors from embryogenesis through maturation) develop robust integration; organisms exposed to a single threat level develop anxiety-like maladaptive responses. The CA experiments reproduce this with surprising fidelity.

Finding 4: The locality ceiling. This is the deepest lesson, visible only in retrospect across the full trajectory. Every experiment uses convolutional physics: each cell interacts only with neighbors within kernel radius $R$ , weighted by a static kernel. Information propagates at most $R$ cells per timestep. The interaction graph is determined by spatial proximity and does not change with the system’s state.

This means that $\intinfo$ can only arise from chains of local interactions: there is no mechanism for a perturbation at $(x, y)$ to directly affect $(x', y')$ unless $|x - x'| < R$. The coupling matrix in partially addresses this (it couples distant channels), but it is fixed: the “who talks to whom” graph does not change in response to the system’s state. A pattern cannot choose to attend to a distant resource patch. It cannot reorganize its information flow under stress. It cannot forage.
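
The locality constraint can be made concrete with a toy calculation. The sketch below reduces the grid to a 1-D ring and tracks only which cells a perturbation could have influenced, assuming each step spreads influence to every cell within the kernel radius (inclusive, an assumption). The function name is hypothetical.

```python
import numpy as np

def steps_to_reach(n_cells, radius, source, target):
    """Steps before a perturbation at `source` can influence `target` on a ring."""
    if radius < 1:
        raise ValueError("radius must be at least 1")
    reached = np.zeros(n_cells, dtype=bool)
    reached[source] = True
    steps = 0
    while not reached[target]:
        spread = reached.copy()
        for shift in range(1, radius + 1):
            # Influence moves up to `radius` cells in either direction per step.
            spread |= np.roll(reached, shift) | np.roll(reached, -shift)
        reached = spread
        steps += 1
    return steps

# On a 256-cell ring with R = 13, a cell 100 sites away is 8 steps distant.
print(steps_to_reach(256, 13, 0, 100))  # 8
```

No choice of state shortens this delay under fixed convolution; a state-dependent interaction graph is what would let the number change.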

makes this concrete. Adding metabolic cost to a substrate with radius- $R$ perception does not produce active self-maintenance. It produces efficient passivity—patterns that waste less, not patterns that seek more. A blind organism with a metabolic cost dies when local resources deplete, however well-integrated, because it cannot detect resources beyond its perceptual horizon. The autopoietic gap is not about resource dependence. It is about perceptual range and its state-dependent modulation—which is to say, attention.

Finding 5: Attention is necessary but not sufficient. tested the locality ceiling hypothesis directly by replacing convolution with windowed self-attention while keeping all other physics identical. The results create a clean ordering across three conditions:

Convolution (Condition C): Sustains $40$ – $80$ patterns, mean robustness $0.981$ . Life without integration.
Fixed-local attention (Condition A): Cannot sustain patterns at all, with $30$+ consecutive extinctions across $3$ seeds. Attention expressivity without evolvable range is worse than convolution.
Evolvable attention (Condition B): Sustains $30$ – $75$ patterns, mean robustness $1.001$ . Life with integration at the threshold.

The $+2.0$ percentage point shift from C to B is the largest single-intervention effect in the entire line. But it is a shift to the threshold, not past it. Robustness stabilizes near $1.0$ rather than rising with further evolution. The system learns where to attend (entropy dropping from $6.22$ to $5.55$ ), but this saturates. What is missing is not better attention but individual-level adaptation—the capacity for a single pattern to reorganize its own dynamics within its lifetime, rather than waiting for population-level selection to find robust configurations post hoc. Biological integration under threat is not just a population statistic; it is a capacity of individual organisms.

Connection to the attention-as-leverage framework. The experiments meet the theory developed above. The effective distribution $p_{\text{eff}} = p_0 \cdot \alpha / \int p_0 \cdot \alpha$ establishes attention ( $\alpha$ ) as the high-leverage variable steering trajectories in chaotic dynamics. The Lenia experiments show what happens where $\alpha$ is fixed by architecture: the measurement distribution is the convolution kernel, which never changes. The system cannot modulate its own attention. No $\alpha$ to vary—and so it has lost its single highest-leverage control variable.
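
A discrete version of the effective distribution makes the leverage claim easy to see. This is a minimal sketch in which the integral becomes a sum over a finite set of outcomes; the function name is hypothetical.

```python
def effective_distribution(p0, alpha):
    """Discrete p_eff = p0 * alpha / sum(p0 * alpha)."""
    weighted = [p * a for p, a in zip(p0, alpha)]
    total = sum(weighted)
    if total <= 0:
        raise ValueError("attention must give positive weight to some outcome")
    return [w / total for w in weighted]

# Uniform prior, attention weighted 5x toward the last outcome.
print(effective_distribution([0.25, 0.25, 0.25, 0.25], [1, 1, 1, 5]))
# [0.125, 0.125, 0.125, 0.625]
```

With `alpha` constant, `p_eff` equals `p0` whatever the prior. That is the fixed-kernel case: the system has no way to reweight what it measures.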

Biological systems solve this: neural attention (largely inhibitory gating) dynamically reshapes which signals propagate and which are suppressed. Under moderate stress attention narrows — the measurement distribution sharpens around threat-relevant features — and this reorganization preserves core integration while shedding peripheral processing. That is the biological pattern these experiments have hunted. It requires not just integration (which local physics can produce) but flexible integration (which requires state-dependent, non-local communication).

provides direct evidence. In the attention substrate the system’s $\alpha$ is the attention weights, and they evolve: attention entropy falls from $6.22$ to $5.55$ across 15 cycles as the system learns where to look. The measurement distribution becomes more structured—not by instruction, but through the same evolutionary pressure that failed in every convolutional substrate. The difference: the substrate now permits modulation of $\alpha$ . Enough to reach the integration threshold ( $\intinfo$ approximately preserved under stress) but not to clearly cross it ( $\intinfo$ does not reliably increase under stress the way it does in biological systems). Attention provides the mechanism; something else—individual-level plasticity, explicit memory, or autopoietic self-maintenance—provides the drive.

These results crystallize into a hypothesis — the attention bottleneck. The biological pattern (integration under threat) cannot emerge in substrates with fixed interaction topology, whatever the evolutionary regime. It requires a state-dependent interaction graph — where the system can modulate which signals propagate and which are suppressed in response to its current state. Convolutional physics lacks this; attention-like mechanisms provide it. The relevant variable is not substrate complexity ( $C$ ), not selection severity (metabolic cost), not training diversity (curriculum) — it is whether the system controls its own measurement distribution.

Status: Partially supported by , further advanced by . The first clause is confirmed: eight convolutional substrates () failed to produce integration under stress; fixed-local attention (Condition A) fared even worse. The second clause is partially confirmed: evolvable attention (Condition B) shifts robustness from $0.981$ to $1.001$ —the right direction, and the only intervention to cross the $1.0$ threshold. content-based coupling provides additional evidence: robustness peaks at $1.052$ under population bottleneck conditions (see Finding 6).

Finding 6: Content-based coupling enables intermittent biological-pattern integration. replaced 's learned attention projections with a simpler mechanism: cells modulate their interaction strength based on content similarity. The potential field becomes $\phi_i = \phi_{\text{FFT},i} \cdot (1 + \alpha \cdot S_i)$ where $S_i = \sigma(\beta \cdot (\bar{\text{sim}}_i - \tau))$ is a sigmoid gate on local mean cosine similarity. This is computationally cheaper than attention and provides a minimal test: does content-dependent topology, without learned query-key projections, suffice?
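The gate above can be sketched in a few lines. This is an illustration under stated assumptions, not the experiment's code: cells sit on a 1D ring, the "local" neighborhood is the two adjacent cells, empty cells get similarity zero, and the values of $\alpha$ and $\beta$ are made up. Only $\tau = 0.86$ is taken from the results reported in this section. The helper names are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two channel vectors."""
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(x * x for x in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0  # convention for empty cells (assumption)
    return sum(x * y for x, y in zip(u, v)) / (nu * nv)


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def gated_potential(states, phi_fft, alpha, beta, tau):
    """phi_i = phi_fft_i * (1 + alpha * S_i), S_i = sigmoid(beta * (sim_i - tau))."""
    n = len(states)
    out = []
    for i in range(n):
        left, right = states[(i - 1) % n], states[(i + 1) % n]
        mean_sim = 0.5 * (cosine(states[i], left) + cosine(states[i], right))
        gate = sigmoid(beta * (mean_sim - tau))
        out.append(phi_fft[i] * (1.0 + alpha * gate))
    return out


# Two homogeneous blocks of cells with orthogonal content (illustrative).
a, b = [1.0, 0.0], [0.0, 1.0]
states = [a, a, a, b, b, b]
phi_fft = [1.0] * 6

gated = gated_potential(states, phi_fft, alpha=1.0, beta=20.0, tau=0.86)
ungated = gated_potential(states, phi_fft, alpha=0.0, beta=20.0, tau=0.86)
```

With a high threshold, cells in the interior of a block (surrounded by similar content) have their potential amplified, while cells on a block boundary are left almost untouched. Setting $\alpha = 0$ recovers the plain convolutional potential.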

Three seeds, each $30$ cycles ( $C{=}16$ , $N{=}128$ ), curriculum stress schedule:

Mean robustness: $0.923$ across all seeds and cycles
Peak robustness: $1.052$ (seed 123, cycle 5, population $55$ patterns)
Phi increase fraction: $30\%$ of patterns show $\intinfo$ increase under stress
Key pattern: Robustness exceeds $1.0$ only when population drops below $\sim 50$ patterns — bottleneck events select for integration

Two distinct evolutionary strategies emerged across seeds. In one regime (large populations of $\sim 150$ – $180$ patterns), the similarity threshold $\tau$ drifted toward zero — evolution discovered that maximal content coupling (gate always-on) works when diversity is high. In another regime (volatile populations oscillating between $13$ and $120$ ), $\tau$ drifted upward to $0.86$ — selective coupling, where only highly similar cells interact. The selective-coupling regime produced all the robustness-above- $1.0$ episodes.

The deeper lesson is not about content coupling per se. It is about composition under selection pressure. When stress culls a population to a handful of survivors, those survivors are not merely the individually strongest; they are the ones whose content-coupling topology supports coherent reorganization under perturbation. What we are watching may be closer to symbiogenesis, in which functional subunits compose into more complex wholes, than to classical Darwinian selection optimizing a fixed design. Content-coupling makes patterns legible to each other, enabling the functional encounter that drives compositional complexity. Intelligence may depend less on deep evolutionary history than on the right conditions for compositional encounter: embodied computation, lethal stakes, mutual legibility.

Proposed Experiment

Question: Does state-dependent interaction topology enable the biological integration pattern that local physics cannot produce?

Design: Replace the convolution kernel with windowed self-attention: each cell updates its state by attending to cells within a local window, with attention weights computed from cell states (query-key mechanism). The window size is evolvable, so evolution can expand or contract the perceptual range. Resources, drought, and selection pressure follow the protocol.

Critical prediction: Under survival pressure, evolution should expand the attention window (increasing perceptual range), and patterns should show the biological pattern, $\intinfo$ increasing under moderate stress, because they can dynamically reallocate information flow to maintain core integration. The attention patterns themselves should narrow under stress (focused measurement) and broaden during safety (diffuse exploration).

Control for the free-lunch problem: Start with strictly local attention (window $= R$ , matching Lenia's kernel radius). If integration under threat emerges only after evolution expands the window, the biological pattern is an adaptive achievement, not an architectural gift.

Status: Implemented as . Three conditions:

A (Fixed-local attention): Window size fixed at kernel radius $R$ . Free-lunch control.
B (Evolvable attention): Window size $w \in [R, 16]$ is evolvable. The main hypothesis test.
C (FFT convolution): physics as known baseline.

Implementation: Windowed self-attention replaces Step 1 (FFT convolution) of the Lenia scan body. Query-key projections ( $W_q, W_k \in \mathbb{R}^{d \times C}$ ) are shared across space, evolved slowly. Soft distance mask via $\sigma(\beta(w_{\text{soft}}^2 - r^2))$ enables smooth window expansion. Temperature $\tau$ governs attention sharpness. All other physics (growth function, coupling gate, resource dynamics, decay, maintenance) remain identical to . Curriculum training protocol from . $C{=}16$ , $N{=}128$ , 30 cycles, 3 seeds per condition, A10G GPUs.

Results (15 cycles for B, 3 seeds; A and C complete):

Condition C (convolution, 30 cycles, 3 seeds): Mean robustness $0.981$ . Only $3/90$ cycles ( $3\%$ ) show $\intinfo$ increasing under stress. Novel stress test: evolved $\Delta = -0.6\% \pm 1.6\%$ , naive $\Delta = -0.2\% \pm 3.2\%$ . Evolution helps (evolved consistently better than naive) but cannot break the locality ceiling.
Condition B (evolvable attention, 15 cycles, 3 seeds): Mean robustness $1.001$ across 38 valid cycles. $16/38$ cycles ( $42\%$ ) show $\intinfo$ increasing under stress (vs $3\%$ for convolution). The $+2.0$ percentage point shift over convolution is the largest in the + line. However, robustness does not trend upward with further evolution; it stabilizes near $1.0$ , suggesting the system reaches a ceiling of its own.
Condition A (fixed-local attention): Conclusive negative. $30$+ consecutive extinctions across all 3 seeds: patterns cannot survive even a single cycle. Fixed-local attention is worse than convolution, which sustains $40$ – $80$ patterns easily. This establishes a clean ordering: convolution sustains life without integration; fixed attention cannot sustain life at all; evolvable attention sustains life with integration. Adaptability of interaction topology matters more than its expressiveness.

Three lessons: (1) Attention window does not expand as predicted—evolution refines how attention is allocated (entropy decreasing from $6.22 \to 5.55$ ) rather than extending range. This resembles biological inhibitory gating (selective, not panoramic) more than the original prediction anticipated. (2) Attention temperature $\tau$ increases in successful seeds ( $1.0 \to 1.3$ – $1.7$ ), suggesting evolution favors broad, soft attention with learned structure over sharp, narrow focus. (3) The effect is real but modest: attention moves the system to the integration threshold without clearly crossing it. State-dependent interaction topology is necessary for integration under stress, but not sufficient for the full biological pattern of $\intinfo$ increasing under threat. What remains missing is likely individual-level adaptation—the capacity for a single pattern to reorganize its own dynamics within its lifetime, rather than relying on population-level selection to discover robust configurations.
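The windowed attention step from the Implementation paragraph, and the temperature effect in lesson (2), can be illustrated with a small sketch. Several things here are assumptions rather than the experiment's code: cells live on a 1D ring instead of a 2D grid, the soft distance mask multiplies the softmax numerators (the text does not say how mask and logits are combined), the projections are identity matrices, and $\beta$ and $w_{\text{soft}}$ are made-up values. The temperatures $1.0$ and $1.7$ are the endpoints reported above. The function names are hypothetical.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def matvec(W, x):
    """Apply a d x C projection matrix to a C-dimensional cell state."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def windowed_attention(states, W_q, W_k, w_soft, beta, temperature):
    """Per-cell attention weights over all cells on a 1D ring."""
    n = len(states)
    queries = [matvec(W_q, s) for s in states]
    keys = [matvec(W_k, s) for s in states]
    weights = []
    for i in range(n):
        logits, mask = [], []
        for j in range(n):
            r = min(abs(i - j), n - abs(i - j))  # ring distance
            score = sum(q * k for q, k in zip(queries[i], keys[j]))
            logits.append(score / temperature)
            # Soft window: close to 1 inside w_soft, close to 0 outside.
            mask.append(sigmoid(beta * (w_soft ** 2 - r ** 2)))
        m = max(logits)  # subtract the max for numerical stability
        unnorm = [mk * math.exp(l - m) for mk, l in zip(mask, logits)]
        z = sum(unnorm)
        weights.append([u / z for u in unnorm])
    return weights


# Eight cells with unit-norm two-channel states (illustrative).
n = 8
states = [[math.cos(j * math.pi / 8), math.sin(j * math.pi / 8)] for j in range(n)]
identity = [[1.0, 0.0], [0.0, 1.0]]

sharp = windowed_attention(states, identity, identity, w_soft=2.0, beta=4.0, temperature=1.0)
soft = windowed_attention(states, identity, identity, w_soft=2.0, beta=4.0, temperature=1.7)
```

In this toy setting each cell's best-matching key is itself, and raising the temperature moves weight away from that key toward the rest of the window, which is the "broad, soft attention" direction of lesson (2). Cells far outside the soft window receive essentially zero weight at either temperature.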

The MARL ablation produced a surprise: all seven conditions show highly significant geometric alignment ( $\rho > 0.21$ , $p < 0.0001$ ), and removing forcing functions does not reduce alignment—if anything, it slightly increases it. The predicted hierarchy was wrong: geometric alignment is a baseline property of multi-agent survival, not contingent on any specific forcing function. This strengthens the universality claim but challenges the forcing function theory in the next section.