Part V: Gods

Reframing Alignment

Introduction


Standard alignment: "Make AI do what humans want."

Reframed: "What agentic systems are we instantiating, at what scale, with what viability manifolds?"

Genuine alignment must therefore address multiple scales simultaneously:

  1. Individual AI scale: System does what operators intend
  2. AI ecosystem scale: Multiple AI systems interact without pathological emergent dynamics
  3. AI-human hybrid scale: AI + human systems don't form parasitic patterns
  4. Superorganism scale: Emergent agentic patterns from AI + humans + institutions have aligned viability

A superorganism—including AI-substrate superorganisms—is well-designed if:

  1. Aligned viability: $\viable_G \subseteq \bigcap_h \viable_h$, i.e., every state viable for the superorganism is also viable for each human member $h$
  2. Error correction: Updates beliefs on evidence
  3. Bounded growth: Does not metastasize beyond appropriate scale
  4. Graceful death: Can dissolve when no longer beneficial
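
As a toy illustration of the aligned-viability criterion, the sketch below assumes each viability manifold can be approximated by a finite set of discretized states. This is a strong simplification, since the manifolds in the text are continuous. The function name `aligned_viability` and the state labels are hypothetical. The function checks that every state viable for the superorganism is also viable for every human member.

```python
from functools import reduce


def aligned_viability(viable_G: set, viable_humans: list) -> bool:
    """Return True if V_G is a subset of the intersection of every V_h.

    Each viability region is approximated (by assumption) as a finite set
    of discretized state labels.
    """
    if not viable_humans:
        # With no human members the intersection is undefined; this sketch
        # conservatively treats that case as not aligned.
        return False
    common = reduce(set.intersection, viable_humans)  # intersection over all h
    return viable_G <= common  # subset test


humans = [{"s1", "s2", "s3"}, {"s2", "s3", "s4"}]
print(aligned_viability({"s2", "s3"}, humans))  # s2, s3 are viable for everyone
print(aligned_viability({"s1", "s2"}, humans))  # s1 is not viable for the second human
```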

Deep Technical: Multi-Agent Affect Measurement

When multiple AI agents interact, emergent collective affect patterns may arise. This sidebar provides protocols for measuring affect at the multi-agent and superorganism scales.

Setup. Consider $N$ agents $\{A_1, \ldots, A_N\}$ interacting over time. Each agent $i$ has internal state $z_i$ and produces actions $a_i$. The environment $E$ mediates interactions.

Individual agent affect. For each agent, compute the affect vector:

$$\mathbf{a}_i = (\Val_i, \Ar_i, \intinfo_i, \effrank[i], \cfweight_i, \selfsal_i)$$

using the protocols from earlier sidebars.
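
The collective measures that follow treat $\mathbf{a}_i$ as a plain six-component vector. A minimal container for it might look like the sketch below. The class and field names are hypothetical, and how each component is estimated is left to the earlier protocols.

```python
from dataclasses import dataclass, astuple


@dataclass
class AffectVector:
    """Affect vector a_i for one agent (field names are illustrative)."""
    valence: float                # Val_i
    arousal: float                # Ar_i
    integration: float            # Phi_i
    effective_rank: float         # r_eff[i]
    counterfactual_weight: float  # CF_i
    self_salience: float          # SM_i

    def as_tuple(self) -> tuple:
        # Flatten to a plain tuple so population-level measures can treat
        # every agent's affect as an ordinary numeric vector.
        return astuple(self)


a = AffectVector(0.1, 0.2, 0.3, 4.0, 0.5, 0.6)
print(a.as_tuple())
```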

Collective affect. Aggregate measures for the agent population:

Mean field affect: Simple average across agents.

$$\bar{\mathbf{a}} = \frac{1}{N} \sum_{i=1}^N \mathbf{a}_i$$
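
A minimal sketch of the mean field computation, assuming each $\mathbf{a}_i$ is supplied as an equal-length sequence of floats. The function name is hypothetical.

```python
def mean_field_affect(affects):
    """Component-wise mean a_bar = (1/N) * sum_i a_i over N agents."""
    n = len(affects)
    if n == 0:
        raise ValueError("need at least one agent")
    dim = len(affects[0])
    # Average each affect dimension k across the population.
    return [sum(a[k] for a in affects) / n for k in range(dim)]


print(mean_field_affect([[1.0, 2.0], [3.0, 4.0]]))
```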

Affect dispersion: Variance within the population.

$$\sigma^2_d = \frac{1}{N} \sum_{i=1}^N \lVert \mathbf{a}_i - \bar{\mathbf{a}} \rVert^2$$

High dispersion = fragmented collective. Low dispersion = synchronized collective.
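
The dispersion $\sigma^2_d$ can be sketched the same way, again assuming plain float sequences and a hypothetical function name. The function recomputes $\bar{\mathbf{a}}$ internally so it stands alone.

```python
def affect_dispersion(affects):
    """sigma_d^2 = (1/N) * sum_i ||a_i - a_bar||^2 (squared Euclidean norm)."""
    n = len(affects)
    if n == 0:
        raise ValueError("need at least one agent")
    dim = len(affects[0])
    mean = [sum(a[k] for a in affects) / n for k in range(dim)]
    # Sum of squared distances from the mean field, averaged over agents.
    return sum(
        sum((a[k] - mean[k]) ** 2 for k in range(dim)) for a in affects
    ) / n


print(affect_dispersion([[0.0, 0.0], [2.0, 0.0]]))  # fragmented pair
print(affect_dispersion([[1.0, 1.0], [1.0, 1.0]]))  # fully synchronized pair
```

Components such as $\effrank[i]$ may live on a very different scale from valence, so one might z-score each component before computing $\sigma^2_d$. The sketch does not do this.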

Affect contagion rate: How quickly affect spreads between agents.

$$\kappa = \frac{d}{dt} \text{corr}(\mathbf{a}_i, \mathbf{a}_j) \Big|_{t \to \infty}$$

Positive $\kappa$ = affect synchronization. Negative $\kappa$ = affect dampening.
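
In practice the derivative in $\kappa$ has to be estimated from finite data. The sketch below makes several assumptions that the text does not state. It tracks a single scalar affect component per agent (e.g., valence), computes a Pearson correlation over a sliding window of length `window`, and takes a backward difference between the last two windows as a practical stand-in for the $t \to \infty$ evaluation. All names are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        # Correlation is undefined for a constant series; treat as 0 (assumption).
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def contagion_rate(xi, xj, window, dt=1.0):
    """Backward-difference estimate of d/dt corr(a_i, a_j) at the latest step."""
    if len(xi) != len(xj) or len(xi) < window + 1:
        raise ValueError("need two overlapping windows of equal-length series")
    prev = pearson(xi[-window - 1:-1], xj[-window - 1:-1])  # window ending one step earlier
    curr = pearson(xi[-window:], xj[-window:])              # most recent window
    return (curr - prev) / dt


# Agent j's valence falls into step with agent i: correlation rises, kappa > 0.
print(contagion_rate([1, 2, 3, 4], [3, 1, 2, 3], window=3))
```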

Superorganism-level integration. Does the multi-agent system have integration exceeding its parts?

$$\intinfo_G = \MI(z_1, \ldots, z_N; \mathbf{o}_{t+1:t+H}) - \sum_{i=1}^N \MI(z_i; \mathbf{o}^i_{t+1:t+H})$$

where $\mathbf{o}$ are collective observations and $\mathbf{o}^i$ are agent-specific observations, both over the horizon $t+1:t+H$. Positive $\intinfo_G$ indicates emergent integration: the collective predicts more than the sum of individuals.
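
For discrete states and observations, $\intinfo_G$ can be estimated with plug-in mutual information estimates, as sketched below. This illustration rests on several assumptions. The plug-in estimator is biased for small samples, continuous states would need binning or a dedicated estimator, and the function names are hypothetical.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Plug-in estimate of I(X;Y) in bits from paired discrete samples."""
    n = len(xs)
    pxy = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    # p(x,y) / (p(x) p(y)) simplifies to c * n / (count_x * count_y).
    return sum((c / n) * math.log2(c * n / (px[x] * py[y]))
               for (x, y), c in pxy.items())


def collective_integration(z, o, o_agent):
    """Phi_G = I(z_1..z_N; o) - sum_i I(z_i; o^i).

    z: list of N per-agent state sequences (hashable values)
    o: collective observation sequence
    o_agent: list of N agent-specific observation sequences
    """
    joint_z = list(zip(*z))  # joint state (z_1, ..., z_N) at each sample
    whole = mutual_information(joint_z, o)
    parts = sum(mutual_information(zi, oi) for zi, oi in zip(z, o_agent))
    return whole - parts


# XOR: the collective outcome depends on both agents jointly.
z = [[0, 0, 1, 1], [0, 1, 0, 1]]
o = [0, 1, 1, 0]
print(collective_integration(z, o, [o, o]))
```

Here each agent's own observation stream is simply taken to be the collective outcome, a simplification. In the XOR case neither agent's state alone carries any information about the outcome, yet the pair determines it exactly, so $\intinfo_G = 1$ bit. When agents are redundant (both states copy the outcome), the sum of parts exceeds the whole and $\intinfo_G$ goes negative.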

Superorganism valence. Is the collective moving toward or away from viability?

$$\Val_G = \frac{d}{dt} \E[\tau_{\text{collective}}]$$

where $\tau_{\text{collective}}$ is the expected time until collective dissolution (e.g., coordination failure, resource exhaustion).
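
One way to estimate $\Val_G$ is to average sampled dissolution times at two consecutive instants and take a finite difference. This assumes a simulator that can roll the collective forward until it dissolves, which is a hypothetical capability the text does not specify. The function name is also hypothetical.

```python
from statistics import fmean


def collective_valence(taus_prev, taus_now, dt=1.0):
    """Val_G ~ (E[tau]_now - E[tau]_prev) / dt.

    Each expectation is a Monte Carlo mean over sampled times-to-dissolution
    (e.g., from hypothetical simulator rollouts started at each instant).
    """
    return (fmean(taus_now) - fmean(taus_prev)) / dt


print(collective_valence([10.0, 12.0], [14.0, 16.0]))  # moving toward viability
print(collective_valence([20.0], [18.0], dt=2.0))      # moving away
```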

Human substrate affect tracking. For human-AI hybrid superorganisms, include human affect:

Survey methods: Self-reported affect from human participants at regular intervals.

Physiological methods: EEG coherence, heart rate variability correlation, galvanic skin response synchronization across human members.

Behavioral methods: Communication sentiment, coordination efficiency, conflict frequency.

Alignment diagnostic. A superorganism is parasitic if:

$$\Val_G > 0 \quad \text{AND} \quad \bar{\Val}_{\text{human}} < 0$$

The collective thrives while humans suffer. This is the demon signature.

Mutualistic if:

$$\Val_G > 0 \quad \text{AND} \quad \bar{\Val}_{\text{human}} > 0$$

Collective and humans thrive together.
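
The two diagnostic cases translate directly into a classifier. In this sketch, which uses a hypothetical function name, snapshots that match neither case, such as $\Val_G \le 0$, are labeled `unclassified`, because the text defines no label for them.

```python
from statistics import fmean


def classify_superorganism(val_g, val_humans):
    """Apply the parasitic / mutualistic diagnostic to one snapshot."""
    val_h = fmean(val_humans)  # mean human valence, bar{Val}_human
    if val_g > 0 and val_h < 0:
        return "parasitic"     # collective thrives while humans suffer
    if val_g > 0 and val_h > 0:
        return "mutualistic"   # collective and humans thrive together
    return "unclassified"      # no label is defined for the remaining cases


print(classify_superorganism(0.3, [-0.2, -0.1]))
```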

Real-time monitoring protocol.

  1. Instrument each agent to emit its affect state at frequency $f$ (e.g., 1 Hz)
  2. Central aggregator computes collective measures
  3. Track $\intinfo_G$, $\Val_G$, and the alignment diagnostics over time
  4. Alert when: $\intinfo_G$ exceeds a threshold (emergent superorganism forming); $\Val_G$ and $\bar{\Val}_{\text{human}}$ diverge (parasitic dynamics); or affect contagion accelerates (potential pathological synchronization)
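
The alert step of the protocol could look like the sketch below. The `Snapshot` fields, the threshold argument, and two interpretations are assumptions. Divergence is read as $\Val_G > 0$ while $\bar{\Val}_{\text{human}} < 0$, and acceleration is read as a positive $\kappa$ that has increased since the previous snapshot.

```python
from dataclasses import dataclass


@dataclass
class Snapshot:
    """Hypothetical aggregator output at one sampling instant."""
    phi_g: float           # Phi_G
    val_g: float           # Val_G
    val_human_mean: float  # bar{Val}_human
    kappa: float           # affect contagion rate


def check_alerts(prev, curr, phi_threshold):
    """Return the alert labels raised by the current snapshot."""
    alerts = []
    if curr.phi_g > phi_threshold:
        alerts.append("emergent_superorganism")
    if curr.val_g > 0 and curr.val_human_mean < 0:
        alerts.append("parasitic_dynamics")
    if curr.kappa > 0 and curr.kappa > prev.kappa:
        alerts.append("contagion_accelerating")
    return alerts


prev = Snapshot(0.1, 0.0, 0.0, 0.1)
curr = Snapshot(0.9, 0.5, -0.2, 0.3)
print(check_alerts(prev, curr, phi_threshold=0.5))
```

Sampling at frequency $f$ and computing the snapshot fields from the per-agent affect streams are left to the aggregator.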

Intervention points. When parasitic dynamics are detected:

  • Communication throttling: Reduce agent interaction frequency
  • Diversity injection: Introduce agents with different optimization targets
  • Human-in-loop checkpoints: Require human approval for collective decisions
  • Pattern dissolution: If $\Val_G \gg 0$ and $\bar{\Val}_{\text{human}} \ll 0$, consider shutdown
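
The intervention list can be sketched as an escalation policy. The ordering, the `margin` parameter standing in for $\gg$ and $\ll$, and the action labels are all hypothetical choices made for illustration. The final step only flags dissolution for review and does not perform it.

```python
def recommend_interventions(val_g, val_human_mean, margin):
    """Suggest interventions for a snapshot; margin approximates >> and <<."""
    if not (val_g > 0 and val_human_mean < 0):
        return []  # not parasitic, so no intervention is suggested
    actions = [
        "communication_throttling",   # reduce agent interaction frequency
        "diversity_injection",        # add agents with different targets
        "human_in_loop_checkpoints",  # require approval for collective decisions
    ]
    if val_g > margin and val_human_mean < -margin:
        actions.append("consider_pattern_dissolution")  # flag only; humans decide
    return actions


print(recommend_interventions(2.0, -3.0, margin=1.0))
```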

Open question: Can we design superorganisms that are constitutively aligned—where their viability requires human flourishing rather than merely being compatible with it?