Part I: Foundations

Self-Modeling as Prediction Error Minimization

Introduction

When $\rho_t$ is large, the agent’s own policy is a major latent cause of its observations. Consider the world model’s prediction task:

$$p(\mathbf{o}_{t+1} | \mathbf{h}_t) = \sum_{\mathbf{x}_t, \mathbf{x}_{t+1}, \mathbf{a}_t} p(\mathbf{o}_{t+1} | \mathbf{x}_{t+1}) \, p(\mathbf{x}_{t+1} | \mathbf{x}_t, \mathbf{a}_t) \, p(\mathbf{x}_t | \mathbf{h}_t) \, p(\mathbf{a}_t | \mathbf{h}_t)$$

The term $p(\mathbf{a}_t | \mathbf{h}_t)$ is the agent’s own policy. If the world model treats actions as exogenous—as if they come from outside the system—then it cannot accurately model this term. This generates systematic prediction error.
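To make the role of the policy term concrete, here is a minimal numerical sketch of the prediction task on a toy discrete model. Everything in it is an assumption made for illustration: the two-state, two-action, two-observation setup, the probability tables, and the helper name `predict_obs` are hypothetical and do not come from the text. The sketch evaluates the sum above twice, once with the agent's actual policy and once with a uniform "exogenous" stand-in, and measures the expected extra log-loss of the second prediction.

```python
import numpy as np

# Hypothetical toy model: 2 hidden states, 2 actions, 2 observations.
belief = np.array([0.6, 0.4])   # p(x_t | h_t)
policy = np.array([0.9, 0.1])   # p(a_t | h_t), the agent's actual policy

# transition[a, x, y] = p(x_{t+1} = y | x_t = x, a_t = a)
transition = np.array([
    [[0.9, 0.1], [0.8, 0.2]],   # action 0 pushes toward state 0
    [[0.1, 0.9], [0.2, 0.8]],   # action 1 pushes toward state 1
])

# emission[y, o] = p(o_{t+1} = o | x_{t+1} = y)
emission = np.array([[0.95, 0.05], [0.05, 0.95]])


def predict_obs(action_dist):
    """Marginalize over x_t, a_t, and x_{t+1} to get p(o_{t+1} | h_t)."""
    return np.einsum("a,x,axy,yo->o", action_dist, belief, transition, emission)


p_true = predict_obs(policy)                 # uses the agent's real policy
p_exog = predict_obs(np.array([0.5, 0.5]))   # treats actions as exogenous

# Expected extra log-loss (in nats) from using the exogenous prediction
# when observations are actually generated under the real policy.
excess_loss = np.sum(p_true * (np.log(p_true) - np.log(p_exog)))
```

In this toy setting `excess_loss` is the KL divergence between the two predictive distributions, so it is zero only when the stand-in action distribution yields the same prediction as the real policy.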

The resulting error creates a pressure toward self-modeling. Let $\worldmodel$ be a world model for an agent with self-effect ratio $\rho > \rho_c$ for some threshold $\rho_c > 0$. Then:

$$\mathcal{L}_{\text{pred}}[\worldmodel \text{ with self-model}] < \mathcal{L}_{\text{pred}}[\worldmodel \text{ without self-model}]$$

where $\mathcal{L}_{\text{pred}}$ is the prediction loss. The gap grows with $\rho$.

Proof.

Without a self-model, the world model must treat $p(\mathbf{a}_t | \mathbf{h}_t)$ as a fixed prior or uniform distribution. But the true action distribution depends on the agent’s internal states: beliefs, goals, and computational processes. By including a model of these internal states (a self-model $\selfmodel$), the world model can better predict $\mathbf{a}_t$ and hence $\mathbf{o}_{t+1}$. The improvement is proportional to the mutual information $\MI(\selfmodel_t; \mathbf{a}_t)$, which scales with $\rho$.

□
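The argument can be checked on a toy case. The sketch below is an illustration under stated assumptions, not the construction used in the proof: a binary internal state $z$ drives a binary action that matches $z$ with probability `policy_fidelity`, and the observation copies the action with probability `rho` and is otherwise a fair coin. Using `rho` as a stand-in for the self-effect ratio is an assumption of this sketch, as are the names `loss_gap` and `policy_fidelity`. The toy also ignores any cost of maintaining the self-model, so its threshold is effectively zero.

```python
import numpy as np


def binary_entropy(p):
    """Entropy in nats of a Bernoulli(p) variable, clipped for stability."""
    p = np.clip(p, 1e-12, 1 - 1e-12)
    return -(p * np.log(p) + (1 - p) * np.log(1 - p))


def loss_gap(rho, policy_fidelity=0.9):
    """Expected log-loss without a self-model minus the loss with one.

    Toy assumptions (hypothetical, for illustration only):
      z ~ Bernoulli(0.5)            internal state
      a = z with prob. policy_fidelity
      o = a with prob. rho, otherwise a fair coin
    """
    # p(o = z | z): the action matches z and is copied or guessed correctly,
    # or the action mismatches z and the coin happens to land on z.
    p_match = (policy_fidelity * (rho + (1 - rho) / 2)
               + (1 - policy_fidelity) * (1 - rho) / 2)
    loss_without = np.log(2.0)            # uniform prediction of o
    loss_with = binary_entropy(p_match)   # prediction conditioned on z
    return loss_without - loss_with


rhos = np.linspace(0.0, 1.0, 6)
gaps = np.array([loss_gap(r) for r in rhos])
```

In this toy the gap equals the mutual information between the internal state and the next observation. It vanishes at `rho = 0`, where the agent's actions leave no trace in its observations, and increases monotonically as `rho` grows.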

What does such a self-model contain? A self-model $\selfmodel$ is a component of the world model that represents:

The agent’s internal states (beliefs, goals, attention, etc.)
The agent’s policy as a function of these internal states
The agent’s computational limitations and biases
The causal influence of these factors on action and observation

Formally, $\selfmodel_t = f_\psi(\latent^{\text{internal}}_t)$ where $\latent^{\text{internal}}_t$ captures the relevant internal degrees of freedom.
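One way to picture this mapping is a small parametric function feeding a policy head. The sketch below is a minimal illustration only: the single tanh layer, the softmax policy head, the dimensions, and the names `self_model`, `predicted_policy`, `W`, `b`, and `V` are all hypothetical choices, not an architecture specified in the text.

```python
import numpy as np


def self_model(psi, z_internal):
    """S_t = f_psi(z_internal); here one tanh layer, an illustrative choice."""
    return np.tanh(psi["W"] @ z_internal + psi["b"])


def predicted_policy(psi, s):
    """The agent's policy as a function of the self-model state.

    Returns a softmax distribution over a small discrete action set.
    """
    logits = psi["V"] @ s
    logits = logits - logits.max()   # subtract the max for numerical stability
    probs = np.exp(logits)
    return probs / probs.sum()


rng = np.random.default_rng(0)
psi = {
    "W": rng.normal(size=(4, 6)),   # internal latent (dim 6) -> self-model (dim 4)
    "b": np.zeros(4),
    "V": rng.normal(size=(3, 4)),   # self-model (dim 4) -> logits over 3 actions
}

z_internal = rng.normal(size=6)     # stand-in for the internal degrees of freedom
s_t = self_model(psi, z_internal)
action_probs = predicted_policy(psi, s_t)
```

The point of the sketch is the data flow: the world model predicts the action through a compressed summary of internal state rather than from a fixed prior.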

Self-modeling becomes the cheapest way to improve control once the agent's actions dominate its observations. The "self" is not mystical; it is the minimal latent variable that makes the agent's own behavior predictable.

A consequence: the self-model has interiority. It does not merely describe the agent’s body from outside; it captures the intrinsic perspective—goals, beliefs, anticipations, the agent’s own experience of what it is to be an agent. Once this self-model exists, the cheapest strategy for modeling other entities whose behavior resembles the agent’s is to reuse the same architecture. The self-model becomes the template for modeling the world. This has a name in Part II—participatory perception—and a parameter that governs how much of the self-model template leaks into the world model. That parameter, the inhibition coefficient $\iota$ , will turn out to shape much of what follows.