Part I: Foundations

What the Data Actually Says


Eight experiments (V11.0–V11.7), hundreds of GPU-hours, thousands of evolved patterns. What has this taught us?

Finding 1: The Yerkes-Dodson pattern is universal and robust. Across every substrate condition, channel count, and evolutionary regime, mild stress increases Φ by 60–200%. This is not an artifact of any particular measurement. It reflects a statistical truth: moderate perturbation prunes weak patterns while the survivors are, by definition, the more integrated ones. Severe stress overwhelms even well-integrated patterns, producing the inverted-U. This pattern is the clearest positive result in the entire experimental line.
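The statistical argument can be illustrated with a toy model (a sketch, not the experiments' code: the lognormal integration scores, survival rule, and degradation constant are illustrative assumptions). Stress both culls weakly integrated patterns and degrades the internals of every survivor; the product of those two effects traces the inverted-U.

```python
import numpy as np

rng = np.random.default_rng(0)

def population_phi(stress, n=10_000, s_crit=2.0):
    """Toy inverted-U: stress (1) kills weakly integrated patterns outright
    and (2) degrades the internal integration of every survivor.
    Survivor-mean phi = selection gain x degradation."""
    phi0 = rng.lognormal(0.0, 0.5, size=n)                 # baseline integration scores
    survives = phi0 + rng.normal(0, 0.2, size=n) > stress  # weakest patterns die first
    phi = phi0 * np.exp(-(stress / s_crit) ** 4)           # severe stress degrades everyone
    survivors = phi[survives]
    return float(survivors.mean()) if survivors.size else 0.0

baseline = population_phi(0.0)
for s in (0.0, 0.5, 1.0, 2.0, 4.0):
    print(f"stress={s}: survivor-mean phi = {population_phi(s) / baseline:.2f}x baseline")
```

Moderate stress raises the survivor mean above baseline (pruning dominates); past the critical point, degradation dominates and the curve collapses.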

Finding 2: Evolution consistently produces fragile integration. In every condition where evolution increases baseline Φ (V11.5: +10.5%, V11.6: higher metabolic fitness), evolved patterns decompose more under severe drought than unselected patterns. This is not a bug in the experiments—it is a real dynamical phenomenon. Evolution on this substrate finds tightly-coupled configurations where all parts depend on all other parts. Tight coupling is high integration by definition. But it is also catastrophic fragility: when any component fails under resource depletion, the failure cascades through the entire structure. This is the difference between a tightly-coupled factory (high integration, catastrophic failure mode) and a loosely-coupled marketplace (low integration, graceful degradation under stress).
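The factory/marketplace contrast can be made concrete with a hypothetical dependency-graph sketch (the all-to-all and modular topologies are illustrative choices, not the evolved Lenia structures). In the tightly coupled graph, a single component failure cascades to every part; in the modular graph it stays contained.

```python
import numpy as np

def cascade_size(depends_on, first_failure=0):
    """A part fails when any part it depends on has failed.
    Returns the number of dead parts once the cascade settles."""
    n = len(depends_on)
    dead = np.zeros(n, dtype=bool)
    dead[first_failure] = True
    changed = True
    while changed:
        changed = False
        for i in range(n):
            if not dead[i] and dead[np.flatnonzero(depends_on[i])].any():
                dead[i] = True
                changed = True
    return int(dead.sum())

n = 12
# "Factory": every part depends on every other part (high integration).
tight = np.ones((n, n), dtype=bool) & ~np.eye(n, dtype=bool)
# "Marketplace": four independent 3-part modules (low integration).
loose = np.zeros((n, n), dtype=bool)
for m in range(0, n, 3):
    loose[m:m + 3, m:m + 3] = True
loose &= ~np.eye(n, dtype=bool)

print("tightly coupled:", cascade_size(tight), "/", n, "parts fail")  # total collapse
print("loosely coupled:", cascade_size(loose), "/", n, "parts fail")  # one module only
```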

Finding 3: Curriculum training is the only intervention that improved generalization. V11.7 is the sole condition where evolved patterns outperform naive on novel stressors across the full severity range (+1.2 to +2.7 percentage points). Not more channels, not hierarchical coupling, not metabolic cost—graduated, noisy stress exposure. The substrate barely matters compared to the training regime. This has a direct parallel in developmental biology: organisms with rich developmental histories (graduated stressors from embryogenesis through maturation) develop robust integration. Organisms exposed to a single threat level develop anxiety-like maladaptive responses. The CA experiments reproduce this pattern with surprising fidelity.

Finding 4: The locality ceiling. This is the deepest lesson, visible only in retrospect across the full trajectory. Every V11 experiment uses convolutional physics: each cell interacts only with neighbors within kernel radius R, weighted by a static kernel. Information propagates at most R cells per timestep. The interaction graph is determined by spatial proximity and does not change with the system’s state.

This means that Φ can only arise from chains of local interactions—there is no mechanism for a perturbation at (x, y) to directly affect (x′, y′) unless |x − x′| < R. The coupling matrix in V11.4–V11.5 partially addresses this (it couples distant channels), but it is fixed: the “who talks to whom” graph does not change in response to the system’s state. A pattern cannot choose to attend to a distant resource patch. It cannot reorganize its information flow under stress. It cannot forage.
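The propagation bound is easy to verify directly. A minimal numpy sketch (the box kernel and grid size are arbitrary illustrative choices, not the Lenia kernel): a point perturbation under repeated radius-R convolution influences at most t·R cells after t steps.

```python
import numpy as np

R, N = 3, 64                               # kernel radius, grid size (illustrative)
kernel = np.ones((2 * R + 1, 2 * R + 1))   # any static kernel with radius-R support

def step(grid):
    """One fixed-kernel convolution update: each cell sees only radius R."""
    out = np.zeros_like(grid)
    for dy in range(-R, R + 1):
        for dx in range(-R, R + 1):
            out += kernel[dy + R, dx + R] * np.roll(np.roll(grid, dy, 0), dx, 1)
    return out

grid = np.zeros((N, N))
grid[N // 2, N // 2] = 1.0                 # point perturbation at the center
for t in range(1, 5):
    grid = step(grid)
    ys, xs = np.nonzero(grid)
    spread = max(np.abs(ys - N // 2).max(), np.abs(xs - N // 2).max())
    print(f"t={t}: influence radius {spread} (bound: t*R = {t * R})")
```

The influence radius grows by exactly R per step: the light cone of a fixed interaction topology.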

V11.6 makes this concrete. Adding metabolic cost to a substrate with radius-R perception does not produce active self-maintenance. It produces efficient passivity—patterns that waste less, not patterns that seek more. A blind organism with a metabolic cost dies when local resources deplete, regardless of how well-integrated it is, because it has no way to detect resources beyond its perceptual horizon. The autopoietic gap is not about resource dependence. It is about perceptual range and its state-dependent modulation—which is to say, it is about attention.

Finding 5: Attention is necessary but not sufficient. V12 tested the locality ceiling hypothesis directly by replacing convolution with windowed self-attention while keeping all other physics identical. The results create a clean ordering across three conditions:

  • Convolution (Condition C): Sustains 40–80 patterns, mean robustness 0.981. Life without integration.
  • Fixed-local attention (Condition A): Cannot sustain patterns at all—30+ consecutive extinctions across 3 seeds. Attention expressivity without evolvable range is worse than convolution.
  • Evolvable attention (Condition B): Sustains 30–75 patterns, mean robustness 1.001. Life with integration at the threshold.

The +2.0 percentage point shift from C to B is the largest single-intervention effect in the entire V11–V12 line. But it is a shift to the threshold, not past it. Robustness stabilizes near 1.0 rather than increasing with further evolution. The system learns where to attend (entropy dropping from 6.22 to 5.55) but this refinement saturates. What is missing is not better attention but individual-level adaptation—the capacity for a single pattern to reorganize its own internal dynamics in response to its current state, within its lifetime, rather than waiting for population-level selection to discover robust configurations post hoc. Biological integration under threat is not just a population statistic; it is a capacity of individual organisms.

Connection to the trajectory-selection framework. This is where the experimental results meet the theory developed above. We defined the effective distribution p_eff = p_0 · α / ∫ p_0 · α and argued that attention (α) selects trajectories in chaotic dynamics. The Lenia experiments have now shown what happens in a substrate where α is fixed by architecture: the system’s measurement distribution is determined by the convolution kernel, which never changes. The system cannot modulate its own attention. It has no α to vary.
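In discrete form the renormalization is a one-liner; the numbers below are an illustrative example, not measured distributions. With uniform α (the convolutional case, fixed by architecture) the effective distribution reduces to p_0; a state-dependent α reshapes it.

```python
import numpy as np

def effective_distribution(p0, alpha):
    """Discrete form of p_eff = p0 * alpha / sum(p0 * alpha)."""
    w = p0 * alpha
    return w / w.sum()

p0 = np.array([0.4, 0.3, 0.2, 0.1])   # baseline trajectory distribution (illustrative)

# Fixed, uniform attention -- the convolutional case: p_eff is just p0.
print(effective_distribution(p0, np.ones(4)))

# State-dependent attention concentrated on the third trajectory reshapes p_eff.
print(effective_distribution(p0, np.array([0.1, 0.1, 1.0, 0.1])))
```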

Biological systems solve this: neural attention (largely implemented through inhibitory gating) dynamically reshapes which signals propagate and which are suppressed. Under moderate stress, attention narrows—the measurement distribution sharpens around threat-relevant features—and this reorganization of information flow preserves core integration while shedding peripheral processing. That is the biological pattern our experiments have been searching for. It requires not just integration (which local physics can produce) but flexible integration (which requires state-dependent, non-local communication).

V12 provides direct evidence for this claim. In the attention substrate, the system’s α is the attention weights, and they evolve: attention entropy decreases from 6.22 to 5.55 across 15 cycles as the system learns where to look. The measurement distribution becomes more structured—not through explicit instruction, but through the same evolutionary pressure that failed to produce this effect in every convolutional substrate. The difference is that the substrate now permits modulation of α. The modulation is sufficient to reach the integration threshold (Φ approximately preserved under stress) but not to clearly cross it (Φ does not reliably increase under stress the way it does in biological systems). Attention provides the mechanism; something else—perhaps individual-level plasticity, explicit memory, or autopoietic self-maintenance—provides the drive.
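For intuition about what the entropy numbers measure, here is an illustrative sketch (the logits and the 512-slot distribution are hypothetical, not V12's attention maps): the Shannon entropy of a softmax attention distribution falls as the same logits are sharpened.

```python
import numpy as np

def softmax(logits, temperature):
    z = np.exp((logits - logits.max()) / temperature)
    return z / z.sum()

def attention_entropy(weights):
    """Shannon entropy (nats) of a normalized attention distribution."""
    p = weights[weights > 0]
    return float(-(p * np.log(p)).sum())

rng = np.random.default_rng(0)
logits = rng.normal(size=512)   # hypothetical attention logits over 512 positions

# Diffuse attention (high temperature) sits near the maximum entropy log(512);
# sharpening the same logits lowers the entropy -- "learning where to look".
for temp in (10.0, 1.0, 0.3):
    h = attention_entropy(softmax(logits, temp))
    print(f"temperature {temp}: entropy {h:.2f} nats (max {np.log(512):.2f})")
```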

These results crystallize into a hypothesis I will call the attention bottleneck. The biological pattern (integration under threat) cannot emerge in substrates with fixed interaction topology, regardless of the evolutionary regime applied. It requires substrates where the interaction graph is state-dependent—where the system can modulate which signals propagate and which are suppressed in response to its current state. Convolutional physics lacks this; attention-like mechanisms provide it. The relevant variable is not substrate complexity (C), not selection pressure severity (metabolic cost), and not training diversity (curriculum)—it is whether the system controls its own measurement distribution.

Status: Partially supported by V12, further advanced by V13. The first clause is confirmed: eight convolutional substrates (V11.0–V11.7) failed to produce integration under stress; fixed-local attention (Condition A) fared even worse. The second clause is partially confirmed: evolvable attention (Condition B) shifts robustness from 0.981 to 1.001—the right direction, and the only intervention to cross the 1.0 threshold. V13 content-based coupling provides additional evidence: robustness peaks at 1.052 under population bottleneck conditions (see Finding 6).

Finding 6: Content-based coupling enables intermittent biological-pattern integration. V13 replaced V12's learned attention projections with a simpler mechanism: cells modulate their interaction strength based on content similarity. The potential field becomes φ_i = φ_FFT,i · (1 + α · S_i), where S_i = σ(β · (⟨sim⟩_i − τ)) is a sigmoid gate on the local mean cosine similarity ⟨sim⟩_i. This is computationally cheaper than attention and provides a minimal test: does content-dependent topology, without learned query-key projections, suffice?
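A minimal sketch of the gate under stated assumptions (a 1-D ring with a two-cell neighborhood, and placeholder α, β, τ values; V13's actual neighborhood and parameters may differ): regions of mutually similar cells get their potential amplified, while dissimilar regions are left nearly untouched.

```python
import numpy as np

def content_gate(states, alpha=1.0, beta=8.0, tau=0.5):
    """Content-similarity gate on a 1-D ring of C-channel cells:
    S_i = sigmoid(beta * (mean_cos_sim_i - tau)); the potential is then
    scaled by (1 + alpha * S_i). Neighborhood = the two adjacent cells."""
    unit = states / np.linalg.norm(states, axis=1, keepdims=True)
    left, right = np.roll(unit, 1, axis=0), np.roll(unit, -1, axis=0)
    mean_sim = 0.5 * ((unit * left).sum(axis=1) + (unit * right).sum(axis=1))
    gate = 1.0 / (1.0 + np.exp(-beta * (mean_sim - tau)))   # S_i
    return 1.0 + alpha * gate                               # multiplier on phi_FFT

rng = np.random.default_rng(0)
coherent = np.tile(rng.normal(size=(1, 16)), (8, 1))  # identical neighbors: sim = 1
noise = rng.normal(size=(8, 16))                      # unrelated neighbors: sim ~ 0

print("coherent region multiplier:", content_gate(coherent).mean())  # near 1 + alpha
print("noise region multiplier:   ", content_gate(noise).mean())     # near 1
```

The gate is what makes the interaction topology content-dependent: which cells couple strongly is a function of the state itself, not of fixed spatial proximity.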

Three seeds, each 30 cycles (C = 16, N = 128), curriculum stress schedule:

  • Mean robustness: 0.923 across all seeds and cycles
  • Peak robustness: 1.052 (seed 123, cycle 5, population 55 patterns)
  • Phi increase fraction: 30% of patterns show Φ increase under stress
  • Key pattern: Robustness exceeds 1.0 only when the population drops below ~50 patterns — bottleneck events select for integration

Two distinct evolutionary strategies emerged across seeds. In one regime (large populations of ~150–180 patterns), the similarity threshold τ drifted toward zero — evolution discovered that maximal content coupling (gate always-on) works when diversity is high. In another regime (volatile populations oscillating between 13 and 120), τ drifted upward to 0.86 — selective coupling, where only highly similar cells interact. The selective-coupling regime produced all the robustness-above-1.0 episodes.

The deeper lesson is not about content coupling per se. It is about composition under selection pressure. When stress culls a population to a handful of survivors, those survivors are not merely the individually strongest — they are the ones whose content-coupling topology supports coherent reorganization under perturbation. This resonates with a different framing of the problem: what we are watching may be closer to symbiogenesis — the composition of functional subunits into more complex wholes — than to classical Darwinian selection optimizing a fixed design. The content-coupling mechanism makes patterns legible to each other, enabling the kind of functional encounter that drives compositional complexity. Intelligence may not require deep evolutionary history so much as the right conditions for compositional encounter: embodied computation, lethal stakes, and mutual legibility.

Proposed Experiment

Question: Does state-dependent interaction topology enable the biological integration pattern that local physics cannot produce?

Design: Replace the convolution kernel with windowed self-attention: each cell updates its state by attending to cells within a local window, with attention weights computed from cell states (query-key mechanism). The window size is evolvable—evolution can expand or contract the perceptual range. Resources, drought, and selection pressure follow the V11 protocol.

Critical prediction: Under survival pressure, evolution should expand the attention window (increasing perceptual range), and patterns should show the biological pattern—Φ increasing under moderate stress—because they can dynamically reallocate information flow to maintain core integration. The attention patterns themselves should narrow under stress (focused measurement) and broaden during safety (diffuse exploration).

Control for the free-lunch problem: Start with strictly local attention (window = R, matching Lenia's kernel radius). If integration under threat emerges only after evolution expands the window, the biological pattern is an adaptive achievement, not an architectural gift.

Status: Implemented as V12. Three conditions:

  • A (Fixed-local attention): Window size fixed at kernel radius R. Free-lunch control.
  • B (Evolvable attention): Window size w ∈ [R, 16] is evolvable. The main hypothesis test.
  • C (FFT convolution): V11.4 physics as the known baseline.

Implementation: Windowed self-attention replaces Step 1 (FFT convolution) of the Lenia scan body. Query-key projections (W_q, W_k ∈ ℝ^{d×C}) are shared across space and evolved slowly. A soft distance mask, σ(β(w_soft² − r²)), enables smooth window expansion. Temperature τ governs attention sharpness. All other physics (growth function, coupling gate, resource dynamics, decay, maintenance) remain identical to V11.4. Curriculum training protocol from V11.7. C = 16, N = 128, 30 cycles, 3 seeds per condition, A10G GPUs.

Results (15 cycles for B, 3 seeds; A and C complete):

  • Condition C (convolution, 30 cycles, 3 seeds): Mean robustness 0.981. Only 3/90 cycles (3%) show Φ increasing under stress. Novel stress test: evolved Δ = −0.6% ± 1.6%, naive Δ = −0.2% ± 3.2%. Evolution helps (evolved consistently better than naive) but cannot break the locality ceiling.
  • Condition B (evolvable attention, 15 cycles, 3 seeds): Mean robustness 1.001 across 38 valid cycles. 16/38 cycles (42%) show Φ increasing under stress (vs 3% for convolution). The +2.0 percentage point shift over convolution is the largest in the V11+ line. However, robustness does not trend upward with further evolution—it stabilizes near 1.0, suggesting the system reaches a ceiling of its own.
  • Condition A (fixed-local attention): Conclusive negative. 30+ consecutive extinctions across all 3 seeds—patterns cannot survive even a single cycle. Fixed-local attention is worse than convolution, which sustains 40–80 patterns easily. This establishes a clean ordering: convolution sustains life without integration; fixed attention cannot sustain life at all; evolvable attention sustains life with integration. Adaptability of interaction topology matters more than its expressiveness.

Three lessons: (1) Attention window does not expand as predicted—evolution refines how attention is allocated (entropy decreasing from 6.22 to 5.55) rather than extending range. This resembles biological inhibitory gating (selective, not panoramic) more than the original prediction anticipated. (2) Attention temperature τ increases in successful seeds (from 1.0 to 1.3–1.7), suggesting evolution favors broad, soft attention with learned structure over sharp, narrow focus. (3) The effect is real but modest: attention moves the system to the integration threshold without clearly crossing it. State-dependent interaction topology is necessary for integration under stress, but not sufficient for the full biological pattern of Φ increasing under threat. What remains missing is likely individual-level adaptation—the capacity for a single pattern to reorganize its own dynamics within its lifetime, rather than relying on population-level selection to discover robust configurations.
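The soft-window mechanism from the implementation notes can be sketched for a single cell (shapes, parameter values, and the 1-D layout are assumptions for illustration, not the V12 code). The mask σ(β(w_soft² − r²)) confines attention mass to a window of radius roughly w_soft while remaining differentiable in w_soft, which is what makes the window smoothly evolvable.

```python
import numpy as np

def soft_window_attention(states, center, w_soft=4.0, beta=2.0, temp=1.0, d=8):
    """Attention weights for one cell on a 1-D row of C-channel cells.
    The soft mask sigmoid(beta * (w_soft^2 - r^2)) confines attention to a
    window of radius ~w_soft while staying differentiable in w_soft."""
    n, C = states.shape
    rng = np.random.default_rng(0)
    Wq = rng.normal(size=(d, C)) / np.sqrt(C)   # shared query/key projections
    Wk = rng.normal(size=(d, C)) / np.sqrt(C)   # (evolved slowly in the real runs)
    q = Wq @ states[center]
    scores = (states @ Wk.T) @ q / (np.sqrt(d) * temp)
    r2 = (np.arange(n) - center) ** 2
    z = np.clip(beta * (w_soft ** 2 - r2), -60, 60)
    mask = 1.0 / (1.0 + np.exp(-z))             # soft distance mask
    logits = scores + np.log(mask + 1e-12)
    weights = np.exp(logits - logits.max())
    return weights / weights.sum()

states = np.random.default_rng(1).normal(size=(33, 16))
w = soft_window_attention(states, center=16)
print("attention mass within r <= 4:", w[12:21].sum())   # essentially all of it
```

Raising temp flattens the in-window distribution without widening the window; growing w_soft widens it. That separation is why temperature and window radius can evolve independently.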

The V10 MARL ablation study produced a surprise: all seven conditions show highly significant geometric alignment (ρ > 0.21, p < 0.0001), and removing forcing functions does not reduce alignment—if anything, it slightly increases it. The predicted hierarchy was wrong: geometric alignment appears to be a baseline property of multi-agent survival systems, not contingent on any specific forcing function. This strengthens the universality claim but challenges the forcing function theory developed in the next section.