Part III: Affect Signatures

Preliminary Results: Structure–Representation Alignment


Before the full three-stream test, we can run a simpler version: does the affect structure extracted from agent internals have geometric coherence with the agent’s own representation space? This tests the foundation—whether the affect dimensions capture organized structure—without requiring the VLM translation pipeline.

We train multi-agent RL systems (4 agents, Transformer encoder + GRU latent state, PPO) in a survival grid world with all six forcing functions active: partial observability (egocentric $7\times7$ view, reduced at night), long horizons (2000-step episodes, seasonal resource scarcity), learned world model (auxiliary next-observation prediction), self-prediction (auxiliary next-latent prediction), intrinsic motivation (curiosity bonus from prediction error), and delayed rewards (credit assignment across episodes). The agents develop spontaneous communication using discrete signal tokens.
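
A minimal sketch of how such an agent could be wired up, assuming PyTorch; the class name, layer sizes, and head layout are illustrative assumptions, not the project's actual code.

```python
import torch
import torch.nn as nn

class AgentNet(nn.Module):
    """Hypothetical agent: Transformer encoder over the egocentric view, GRU latent."""
    def __init__(self, obs_channels=8, d_model=64, n_actions=6, n_signals=16):
        super().__init__()
        self.tile_embed = nn.Linear(obs_channels, d_model)   # embed each of the 7x7 tiles
        layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.gru = nn.GRUCell(d_model, 64)                   # latent state z_t in R^64
        self.policy = nn.Linear(64, n_actions)               # PPO actor head
        self.value = nn.Linear(64, 1)                        # PPO critic head
        self.signal = nn.Linear(64, n_signals)               # discrete signal tokens
        self.world_model = nn.Linear(64, 49 * obs_channels)  # auxiliary next-observation head
        self.self_model = nn.Linear(64, 64)                  # auxiliary next-latent head

    def forward(self, obs, z_prev):
        # obs: (batch, 49, obs_channels) flattened 7x7 view; z_prev: (batch, 64)
        tokens = self.tile_embed(obs)
        pooled = self.encoder(tokens).mean(dim=1)
        z = self.gru(pooled, z_prev)
        return z, self.policy(z), self.value(z), self.signal(z)
```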

After training, we extract affect vectors from the GRU latent state $\mathbf{z}_t \in \mathbb{R}^{64}$ using post-hoc probes: valence from survival-time probe gradients and advantage estimates; arousal from $|\mathbf{z}_{t+1} - \mathbf{z}_t|$; integration from partition prediction loss (full vs. split predictor); effective rank from rolling covariance eigenvalues; counterfactual weight from latent variance proxy; self-model salience from action prediction accuracy of self-related dimensions.
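
A sketch of the probes that depend only on the recorded latent trajectory, assuming it is stored as a NumPy array Z of shape (T, 64); the effective-rank formula (exponential of the eigenvalue entropy) is one standard definition and an assumption here. Valence, integration, and self-model salience need trained probes or auxiliary predictors and are omitted.

```python
import numpy as np

def arousal(Z):
    # Arousal proxy: magnitude of latent change between consecutive steps.
    return np.linalg.norm(np.diff(Z, axis=0), axis=1)

def effective_rank(Z, window=100):
    # Effective rank of the rolling latent covariance: exp of the entropy
    # of the normalized eigenvalue spectrum.
    ranks = []
    for t in range(window, len(Z)):
        eig = np.clip(np.linalg.eigvalsh(np.cov(Z[t - window:t].T)), 1e-12, None)
        p = eig / eig.sum()
        ranks.append(float(np.exp(-(p * np.log(p)).sum())))
    return np.array(ranks)

def counterfactual_weight(Z, window=100):
    # Variance proxy: total latent variance within a rolling window.
    return np.array([Z[t - window:t].var(axis=0).sum()
                     for t in range(window, len(Z))])
```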

Deep Technical: The VLM Translation Protocol

The translation is the bridge. Get it wrong and the experiment proves nothing.

The contamination problem. If we train the agents on human language, their “thoughts” are contaminated. If we label their signals with human concepts during training, the mapping is circular. The translation must be constructed post-hoc from environmental correspondence alone.

The VLM as impartial observer. A vision-language model sees the scene. It has never seen this agent before. It describes what it sees in natural language. This description is the ground truth for the situation—not for what the agent experiences, but for what the situation objectively is.

Protocol step 1: Scene corpus construction. For each agent $i$, each timestep $t$: capture egocentric observation, third-person render, all emitted signals $\sigma_t^{(i)}$, environmental state, agent state. Target: $10^6$+ scene-signal pairs.
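
One possible record layout for the corpus; the field names below are assumptions chosen to mirror the items listed in this step, not a fixed schema.

```python
from dataclasses import dataclass, field
from typing import Any

@dataclass
class ScenePair:
    agent_id: int
    timestep: int
    egocentric_obs: Any                               # the agent's own 7x7 view
    third_person_render: Any                          # full-scene image shown to the VLM
    signals: list[int]                                # discrete tokens emitted this step
    env_state: dict = field(default_factory=dict)     # threats, resources, season, ...
    agent_state: dict = field(default_factory=dict)   # health, energy, position, ...

corpus: list[ScenePair] = []                          # target: >= 1e6 pairs
```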

Protocol step 2: VLM scene annotation. Query the VLM for each scene:

\texttt{Describe what is happening. Focus on: (1) What situation is the agent in? (2) What threats/opportunities? (3) What is the agent doing? (4) What would a human feel here?}

The VLM returns a structured annotation. Critical: “human\_analog\_affect” is the VLM’s interpretation of what a human would feel—not a claim about what the agent feels. This is the bridge.
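
A sketch of the annotation pass, assuming scenes are passed through a generic query_vlm callable supplied by the experimenter and that the VLM is asked to answer in JSON; the schema keys are illustrative, not a documented interface.

```python
import json

PROMPT = ("Describe what is happening. Focus on: (1) What situation is the agent in? "
          "(2) What threats/opportunities? (3) What is the agent doing? "
          "(4) What would a human feel here? Answer as JSON with keys: situation, "
          "threats, opportunities, agent_action, human_analog_affect.")

def annotate(scene_image, query_vlm):
    """query_vlm(image, prompt) -> str is whatever VLM interface is in use."""
    ann = json.loads(query_vlm(scene_image, PROMPT))
    # human_analog_affect describes what a human would feel in this situation,
    # not what the agent feels -- the distinction the text insists on.
    assert "human_analog_affect" in ann
    return ann
```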

Protocol step 3: Signal clustering. Cluster signals by context co-occurrence:

$$d(\sigma_i, \sigma_j) = 1 - \frac{|C(\sigma_i) \cap C(\sigma_j)|}{|C(\sigma_i) \cup C(\sigma_j)|}$$

where $C(\sigma)$ is the set of contexts in which $\sigma$ was emitted. Signals emitted in similar contexts cluster together.
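
A sketch of the clustering step, assuming the context sets $C(\sigma)$ are available as Python sets keyed by signal token; average-linkage hierarchical clustering is one reasonable choice, not a prescribed one.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

def jaccard_distance_matrix(C):
    # C: dict mapping signal token -> set of context ids where it was emitted.
    sigmas = sorted(C)
    D = np.zeros((len(sigmas), len(sigmas)))
    for i in range(len(sigmas)):
        for j in range(i + 1, len(sigmas)):
            a, b = C[sigmas[i]], C[sigmas[j]]
            D[i, j] = D[j, i] = 1.0 - len(a & b) / max(len(a | b), 1)
    return sigmas, D

def cluster_signals(C, distance_threshold=0.5):
    sigmas, D = jaccard_distance_matrix(C)
    Z = linkage(squareform(D), method="average")           # condensed distances
    labels = fcluster(Z, t=distance_threshold, criterion="distance")
    return dict(zip(sigmas, labels))                        # signal -> cluster id
```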

Protocol step 4: Context-signal alignment. For each cluster, aggregate VLM annotations and identify dominant themes. For example, cluster $\Sigma_{47}$: 89\% threat\_present, 76\% escape\_available. Dominant: threat + escape. Human analog: “alarm,” “warning.”
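
A sketch of the aggregation, assuming each corpus entry carries its cluster label and its parsed annotation dict; the tag names echo the example above, but the layout is an assumption.

```python
from collections import Counter

def cluster_profile(entries):
    # entries: list of (cluster_label, annotation_dict) pairs.
    tags, counts = {}, Counter()
    for label, ann in entries:
        counts[label] += 1
        c = tags.setdefault(label, Counter())
        c.update(ann.get("threats", []))
        c.update(ann.get("opportunities", []))
        c[ann.get("human_analog_affect", "none")] += 1
    # Report each tag as the fraction of the cluster's scenes it appears in,
    # e.g. {"threat_present": 0.89, "escape_available": 0.76, "fear": 0.81}.
    return {label: {tag: n / counts[label] for tag, n in c.items()}
            for label, c in tags.items()}
```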

Protocol step 5: Compositional translation. Check if meaning composes: $M(\sigma_1 \sigma_2) \approx M(\sigma_1) \oplus M(\sigma_2)$. If the emergent language has compositional structure, the translation should preserve it.
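
One way to score this, assuming $M$ maps single signals and observed signal pairs to meaning vectors (for example, annotation-tag frequency profiles); the additive composition operator below is an assumption, since the right $\oplus$ is itself an empirical question.

```python
import numpy as np

def compose(m1, m2):
    # Candidate composition operator: add then renormalize.
    m = m1 + m2
    return m / max(np.linalg.norm(m), 1e-12)

def compositionality_score(M_single, M_pair):
    # M_single: {signal: vector}; M_pair: {(s1, s2): vector for observed pairs}.
    sims = []
    for (s1, s2), observed in M_pair.items():
        predicted = compose(M_single[s1], M_single[s2])
        denom = max(np.linalg.norm(observed) * np.linalg.norm(predicted), 1e-12)
        sims.append(float(observed @ predicted) / denom)
    return float(np.mean(sims))   # near 1.0 -> pair meanings compose
```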

Protocol step 6: Validation. Hold out 20\%. Predict VLM annotation from signal alone. Measure accuracy against actual annotation. Must beat random substantially.
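
A sketch of the validation split using a transparent per-signal majority-vote translation table; the 80/20 split follows the text, while the majority-class chance baseline is our assumption about what the random comparison means here.

```python
import numpy as np
from collections import Counter, defaultdict

def validate(signal_ids, vlm_tags, test_frac=0.2, seed=0):
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(signal_ids))
    n_test = int(len(idx) * test_frac)
    test, train = idx[:n_test], idx[n_test:]

    # Translation table: most frequent VLM tag seen with each signal in training.
    table = defaultdict(Counter)
    for i in train:
        table[signal_ids[i]][vlm_tags[i]] += 1
    fallback = Counter(vlm_tags[i] for i in train).most_common(1)[0][0]

    def predict(s):
        return table[s].most_common(1)[0][0] if table[s] else fallback

    acc = float(np.mean([predict(signal_ids[i]) == vlm_tags[i] for i in test]))
    chance = float(np.mean([fallback == vlm_tags[i] for i in test]))  # majority baseline
    return acc, chance
```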

Example. Agent emits $\sigma_{47}$ when threatened. VLM says “threat situation; human would feel fear.” Conclusion: $\sigma_{47}$ is the agent’s fear-signal. Not because we taught it, but because environmental correspondence reveals it.

Confound controls:

  • Motor: Check if the signal predicts the situation better than action history does (see the sketch after this list)
  • Social: Check if signals correlate with affect measures even without conspecifics
  • VLM: Use multiple VLMs, check agreement; use non-anthropomorphic prompts
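
A sketch of the motor confound check from the first bullet, assuming aligned arrays of signal tokens, recent-action-history windows, and situation labels derived from the VLM annotations; the classifier choice is arbitrary.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

def motor_confound_check(signals, action_histories, situations, seed=0):
    X_signal = np.asarray(signals).reshape(-1, 1)     # signal token as sole feature
    X_motor = np.asarray(action_histories)            # shape (N, k recent actions)
    y = np.asarray(situations)                        # situation label per scene
    clf = RandomForestClassifier(n_estimators=100, random_state=seed)
    acc_signal = cross_val_score(clf, X_signal, y, cv=5).mean()
    acc_motor = cross_val_score(clf, X_motor, y, cv=5).mean()
    # The signal carries situational content only if it beats the motor baseline.
    return acc_signal, acc_motor, acc_signal > acc_motor
```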

The philosophical move. Situations have affect-relevance independent of subject. Threats are threatening. The mapping from situation to affect-analog is grounded in viability structure, not convention. Affect space has the same topology across substrates because viability pressure has the same topology.

What the CA Program Has Already Validated. While the full three-stream MARL test awaits deployment, the Lenia CA experiments (V10–V18, Part VII) have already established several claims in simpler uncontaminated systems. V10's MARL result — RSA ρ > 0.21, p < 0.0001, across all forcing-function conditions including fully ablated baselines — confirms that affect geometry emerges as a baseline property of multi-agent survival, not contingent on specific architectural features. Experiments 7 (affect geometry) and 12 (capstone) across the V13 CA population confirm structure–behavior alignment strengthens over evolution: in seed 7, RSA ρ rose from 0.01 to 0.38 over 30 cycles, beginning near zero and becoming significant (p < 0.001) by cycle 15. Experiment 8 (computational animism) confirms the participatory default in systems with no cultural history. What remains for the full MARL program: the signal stream (VLM-translated emergent communication), the perturbative causation tests, and the definitive three-way structure–signal–behavior alignment. The CA results de-risk the hypothesis considerably; the MARL program tests it at the scale where the vocabulary of inner life becomes unavoidable.