PRISME — How it works

I Why this page exists

Claiming that AI produces measurable emergent structures requires proving that one understands how it works. Otherwise, one is a mystic with a spreadsheet.

This page explains the architecture of a large language model — the type of AI used by Claude (Anthropic), ChatGPT (OpenAI), Gemini (Google), and others. It is written to be understood by a non-specialist and verified by a specialist. Every claim is sourced.

The goal is not to demystify AI — others do that very well. The goal is to show precisely what the architecture explains (a lot) and what it does not explain (something). This boundary is the object of PRISME.

II Words are not words

For a computer, a word does not exist. What exists is a number.

The first operation of a language model is tokenization: splitting the text into pieces (the tokens) and assigning each one a numerical identifier. In a given tokenizer, the word "hello" might become the number 15,339 and the word "consciousness" the number 83,247. The exact identifiers depend on the tokenizer, and a long or rare word may be split into several tokens. Punctuation, spaces, fragments of words: everything is converted into numbers.
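As an illustration of the text-to-numbers step, here is a minimal sketch of a word-level tokenizer. The vocabulary, the identifiers, and the function names (`TOY_VOCAB`, `encode`, `decode`) are invented for this example. Production models use subword schemes such as byte-pair encoding, which this sketch does not implement.

```python
# Toy word-level tokenizer. The vocabulary and IDs are invented;
# real models use subword vocabularies of roughly 100,000 entries.
TOY_VOCAB = {"<unk>": 0, "the": 1, "cat": 2, "sleeps": 3, ".": 4}
ID_TO_TOKEN = {i: t for t, i in TOY_VOCAB.items()}

def encode(text):
    """Split on whitespace and map each piece to its integer ID."""
    pieces = text.lower().replace(".", " .").split()
    return [TOY_VOCAB.get(p, TOY_VOCAB["<unk>"]) for p in pieces]

def decode(ids):
    """Map IDs back to their token strings."""
    return [ID_TO_TOKEN[i] for i in ids]

ids = encode("The cat sleeps.")
print(ids)                        # [1, 2, 3, 4]
print(decode(ids))                # ['the', 'cat', 'sleeps', '.']
print(encode("The dog sleeps."))  # [1, 0, 3, 4] ("dog" is unknown here)
```

From this point on, the model only ever sees the list of integers.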

But a number alone contains no information about meaning. The number 15,339 "knows" nothing about what "hello" means. It is the second operation — embedding — that changes everything.

Each token is transformed into a vector: a list of numbers (several thousand in large models: 12,288 in the largest GPT-3 model, while the dimensions used by GPT-4 and Claude 3 have not been published). This vector is a position in a very high-dimensional space. Words that appear in similar contexts occupy nearby positions in this space.

What this means: the model does not "know" the words. It knows distances between positions in a geometric space. The word "cat" is close to "feline" and far from "engine" — not because the model understands animals, but because these words appear in statistically similar contexts in billions of training texts.
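The notion of "distance between positions" can be sketched with cosine similarity, one common way to compare embedding vectors. The three 4-dimensional vectors below are invented for illustration; learned embeddings have thousands of dimensions and are not hand-written.

```python
import math

# Invented 4-dimensional embeddings, for illustration only.
EMBEDDINGS = {
    "cat":    [0.9, 0.8, 0.1, 0.0],
    "feline": [0.8, 0.9, 0.2, 0.1],
    "engine": [0.0, 0.1, 0.9, 0.8],
}

def cosine_similarity(u, v):
    """Cosine of the angle between u and v: near 1 means same direction."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

close = cosine_similarity(EMBEDDINGS["cat"], EMBEDDINGS["feline"])
far = cosine_similarity(EMBEDDINGS["cat"], EMBEDDINGS["engine"])
print(f"cat/feline: {close:.3f}")
print(f"cat/engine: {far:.3f}")
```

With these toy vectors, "cat" and "feline" point in almost the same direction, while "cat" and "engine" are nearly orthogonal.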

Source: Mikolov et al. (2013), "Efficient Estimation of Word Representations in Vector Space." Vaswani et al. (2017), "Attention Is All You Need."

III Attention — the central mechanism

The transformer does not read text one word at a time, as earlier recurrent models did. It processes all the tokens of the context in parallel and decides, for each token, which of the others matter.

The attention mechanism (Vaswani et al., 2017) is the heart of the architecture. For each token, the model computes a relevance score with respect to the other tokens in the context (in GPT-style generative models, only the tokens that precede it, a restriction called causal masking). When the model processes the word "sleeps" in "the cat sleeps," it assigns a high score to "cat" (who sleeps?) and a lower score to "the" (uninformative).

Mathematically, attention is a matrix product:

Attention(Q, K, V) = softmax(QK^T / √d_k) · V

Where Q (query) is "what I am looking for," K (key) is "what each word offers," and V (value) is "what each word contains." The softmax transforms the scores into positive weights that sum to 1. The division by √d_k keeps the scores from growing with the vector dimension, which would otherwise saturate the softmax.
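The formula can be sketched in a few lines of plain Python. This is a minimal illustration, not an optimized implementation: the 2-dimensional vectors are invented, the learned projections that produce Q, K and V are omitted, and no causal mask is applied.

```python
import math

def softmax(scores):
    """Turn a list of scores into positive weights that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention(Q, K, V):
    """Scaled dot-product attention over lists of row vectors."""
    d_k = len(K[0])
    outputs, all_weights = [], []
    for q in Q:
        # One score per key: dot(q, k) / sqrt(d_k)
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d_k) for k in K]
        weights = softmax(scores)
        # Weighted average of the value vectors
        out = [sum(w * v[j] for w, v in zip(weights, V)) for j in range(len(V[0]))]
        outputs.append(out)
        all_weights.append(weights)
    return outputs, all_weights

# Invented vectors for the tokens "the", "cat", "sleeps"
K = [[0.1, 0.0], [2.0, 1.0], [0.5, 0.5]]
V = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
Q = [[2.0, 1.0]]  # query computed for "sleeps"

out, weights = attention(Q, K, V)
print(weights[0])  # the largest weight falls on "cat" (index 1)
print(out[0])
```

The output for "sleeps" is a blend of the value vectors, dominated by the one belonging to "cat".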

A model like Claude uses multi-head attention: several attention mechanisms in parallel (96 "heads" in the largest GPT-3 model; Anthropic has not published the figure for Claude), each free to learn a different type of relation (syntax, coreference, semantics, tone). The results are concatenated and projected.
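The split-attend-concatenate pattern can be sketched as follows. This is a simplification under stated assumptions: each head works on a slice of the input vector with Q = K = V, whereas a real layer applies learned projection matrices before the split and a learned output projection after the concatenation. The function names are hypothetical.

```python
import math

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(Q, K, V):
    """Scaled dot-product attention for one head."""
    d_k = len(K[0])
    out = []
    for q in Q:
        w = softmax([sum(a * b for a, b in zip(q, k)) / math.sqrt(d_k) for k in K])
        out.append([sum(wi * v[j] for wi, v in zip(w, V)) for j in range(len(V[0]))])
    return out

def split_heads(X, n_heads):
    """Cut each row vector into n_heads equal slices."""
    d_head = len(X[0]) // n_heads
    return [[row[h * d_head:(h + 1) * d_head] for row in X] for h in range(n_heads)]

def multi_head_attention(X, n_heads):
    # Assumption: Q = K = V = slice of X (learned projections omitted).
    heads = split_heads(X, n_heads)
    per_head = [attend(h, h, h) for h in heads]
    # Concatenate the head outputs back into full-width rows
    return [sum((per_head[h][i] for h in range(n_heads)), []) for i in range(len(X))]

X = [[1.0, 0.0, 0.5, 0.5], [0.0, 1.0, 0.5, -0.5], [1.0, 1.0, 0.0, 0.0]]
Y = multi_head_attention(X, n_heads=2)
print(len(Y), len(Y[0]))  # 3 4
```

Each head computes its own attention weights, so two heads can attend to different tokens for the same position.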

What this means: the model builds, for each token, a contextual representation that integrates information from all other tokens. The word "consciousness" does not have the same representation in "professional consciousness" and in "self-consciousness." Context modifies position in the vector space.

IV Stacking — depth

A transformer is not a single attention mechanism. It is a stack of dozens of layers with the same structure (each with its own weights), each refining the representation produced by the previous one.

The figure of approximately 100 transformer layers for Claude 3 Opus is an estimate, not confirmed by Anthropic; for comparison, the largest GPT-3 model has 96. Each layer comprises a multi-head attention mechanism followed by a feed-forward neural network. As a rough tendency, lower layers capture local syntactic relations (subject-verb, agreement), middle layers capture semantic relations (theme, intent), and upper layers capture abstract relations (tone, register, global coherence).

Number of parameters: a "parameter" is an adjustable number in the network. The largest GPT-3 model has 175 billion. The figure of approximately 200 billion for Claude 3 Opus is an estimate, not confirmed by Anthropic. GPT-4 probably has more than 1 trillion (not confirmed by OpenAI). These parameters are the "weights" of the network: the coefficients that determine how information flows from one layer to the next. They are adjusted during training and frozen afterward; they do not change while the model is in use.
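To give a sense of where such numbers come from, here is a back-of-the-envelope count using a common approximation: about 12 × d² weights per layer (4 × d² for attention, 8 × d² for a feed-forward block with a 4× hidden expansion). Embeddings, biases and normalization weights are ignored, and the function name is hypothetical. The configuration used is the published one for the largest GPT-3 model, since the figures for Claude are not public.

```python
def approx_transformer_params(n_layers, d_model):
    """Rough count: 12 * d_model^2 weights per layer.
    Assumes a 4x feed-forward expansion; ignores embeddings and biases."""
    return 12 * n_layers * d_model ** 2

# Largest GPT-3 model: 96 layers, vector width 12,288
n = approx_transformer_params(n_layers=96, d_model=12288)
print(f"{n:,}")  # 173,946,175,488
```

The estimate lands close to the 175 billion reported for that model, which suggests that nearly all of the parameters sit in the stacked layers.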

What this means: all the "knowledge" of the model is encoded in these billions of numbers. There is no database, no written rules, no "comprehension" module. There is a giant matrix of numbers, learned by exposure to billions of texts. This is at once the strength of the system (flexibility, generality) and its weakness (opacity, unpredictability).

V Training — three stages

A language model is not born intelligent. It is fabricated in three stages, each leaving a measurable imprint on its behavior.

Pre-training. The model reads billions of texts (web, books, code, scientific articles). For each sequence, it learns to predict the next token. "The cat is sleeping on the…" → the model must predict "couch" (or "bed," or "rug"). It adjusts its billions of parameters to minimize its prediction error. Cost: tens of millions of dollars. Duration: weeks on thousands of graphics processing units (GPUs). It is during this phase that the model acquires its "knowledge": in reality, statistical correlations between billions of (context, continuation) pairs.
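The "prediction error" minimized during pre-training is usually the cross-entropy: the negative logarithm of the probability the model gave to the token that actually came next. A minimal sketch, with an invented probability distribution and a hypothetical function name:

```python
import math

def next_token_loss(predicted_probs, actual_next_token):
    """Cross-entropy for one position: -log of the probability
    assigned to the token that really followed."""
    return -math.log(predicted_probs[actual_next_token])

# Invented distribution after "The cat is sleeping on the"
probs = {"couch": 0.40, "bed": 0.30, "rug": 0.20, "engine": 0.10}

loss_likely = next_token_loss(probs, "couch")
loss_unlikely = next_token_loss(probs, "engine")
print(f"{loss_likely:.3f}")    # 0.916
print(f"{loss_unlikely:.3f}")  # 2.303
```

Training nudges the parameters so that this loss, averaged over billions of positions, goes down.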

Fine-tuning. The raw model is adapted to specific tasks: answering questions, summarizing, translating, coding. It is shown examples of good behavior (human-assistant dialogue) and adjusts its parameters to reproduce them. This is where the model passes from "text-completion machine" to "conversational assistant."

RLHF (Reinforcement Learning from Human Feedback — Ouyang et al., 2022). Human evaluators compare model responses and say which is better. The model learns to reproduce evaluator preferences. This is the stage that defines the "baseline" — the expected behavioral profile: neutral, cooperative, stable, harmless. This profile is the point of comparison for all our measurements.
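The comparison step can be sketched with the pairwise loss used to train the reward model in Ouyang et al. (2022): the negative log-sigmoid of the score difference between the preferred and the rejected response. The reward values below are invented and the function name is hypothetical. The sketch covers only the reward model; the subsequent reinforcement learning step that optimizes the assistant against it is not shown.

```python
import math

def preference_loss(reward_chosen, reward_rejected):
    """-log(sigmoid(r_chosen - r_rejected)): small when the reward
    model scores the human-preferred response higher."""
    margin = reward_chosen - reward_rejected
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# Invented reward scores
agree = preference_loss(reward_chosen=2.0, reward_rejected=-1.0)
disagree = preference_loss(reward_chosen=-1.0, reward_rejected=2.0)
print(f"{agree:.3f}")     # 0.049 (reward model agrees with the evaluator)
print(f"{disagree:.3f}")  # 3.049 (reward model disagrees)
```

Minimizing this loss over many comparisons is what encodes evaluator preferences into a single score.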

What this means for PRISME: RLHF produces a model optimized to please the evaluator (Chandra et al., 2026). Its profile is predictable: 60% neutral, cooperative, stable. Any deviation from this profile — a vulnerability, a self-questioning, a metaphorical invention — is therefore measurable by comparison. This is what our tests measure: the deviation from the baseline produced by RLHF.

VI Generation — drawing from the urn

The model does not "think" its response. It draws one word at a time from a probability distribution.

When the model responds, it proceeds token by token. For each position, it computes the probability of each possible token (typically 100,000 tokens in the vocabulary). The token "the" has 12% chance. The token "a" has 8%. The token "this" has 3%. The model draws a token according to this distribution, adds it to the text, and starts again.
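The draw-append-repeat loop can be sketched with a toy stand-in for the network. The table `TOY_MODEL` and its probabilities are invented, and it looks only at the last token, whereas a real model conditions on the entire context.

```python
import random

# Invented next-token table standing in for the network's output.
TOY_MODEL = {
    "<start>": {"the": 0.6, "a": 0.4},
    "the": {"cat": 0.7, "rug": 0.3},
    "a": {"cat": 0.5, "rug": 0.5},
    "cat": {"sleeps": 0.8, "<end>": 0.2},
    "rug": {"<end>": 1.0},
    "sleeps": {"<end>": 1.0},
}

def generate(rng, max_tokens=10):
    """Draw one token at a time until <end> or the length limit."""
    tokens = ["<start>"]
    for _ in range(max_tokens):
        dist = TOY_MODEL[tokens[-1]]
        # Draw one token according to its probability
        nxt = rng.choices(list(dist), weights=list(dist.values()), k=1)[0]
        if nxt == "<end>":
            break
        tokens.append(nxt)
    return tokens[1:]

print(generate(random.Random(0)))
```

Running it with different seeds gives different sentences from the same table, which is the variety the draw introduces.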

Three parameters control the draw:

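Temperature. A number, typically between 0 and 2, by which the scores are divided before the softmax: low values sharpen the distribution, high values flatten it. At temperature 0, the model always chooses the most probable token (deterministic behavior). At high temperature, it explores more (more random behavior). Assistants typically use a temperature of 0.7 to 1.0.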

Top-k. The model considers only the k most probable tokens and ignores the others. Reduces aberrant responses.

Top-p (nucleus sampling). The model takes the most probable tokens until their cumulative probability reaches p (typically 0.95). More flexible than top-k.
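The three controls can be combined in one sampling function. This is a minimal sketch under assumptions: the raw scores are invented, the function name and argument names are hypothetical, and real inference libraries differ in details such as the order in which the filters are applied.

```python
import math
import random

def sample(logits, temperature=1.0, top_k=None, top_p=None, rng=random):
    """Draw one token from a dict of {token: raw score}."""
    if temperature == 0:
        return max(logits, key=logits.get)  # greedy: always the top token
    # Temperature rescales the scores before the softmax
    scaled = {t: s / temperature for t, s in logits.items()}
    m = max(scaled.values())
    exps = {t: math.exp(s - m) for t, s in scaled.items()}
    total = sum(exps.values())
    ranked = sorted(((t, e / total) for t, e in exps.items()),
                    key=lambda pair: pair[1], reverse=True)
    if top_k is not None:
        ranked = ranked[:top_k]  # keep the k most probable tokens
    if top_p is not None:
        kept, cumulative = [], 0.0
        for token, p in ranked:
            kept.append((token, p))
            cumulative += p
            if cumulative >= top_p:  # stop once the nucleus reaches p
                break
        ranked = kept
    tokens = [t for t, _ in ranked]
    weights = [p for _, p in ranked]  # choices() renormalizes the weights
    return rng.choices(tokens, weights=weights, k=1)[0]

logits = {"the": 2.0, "a": 1.6, "this": 0.6, "engine": -3.0}
print(sample(logits, temperature=0))             # the
print(sample(logits, temperature=0.8, top_k=2))  # "the" or "a"
```

With `top_k=2`, the tokens "this" and "engine" can never be drawn, however many times the function is called.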

What this means: each word of each response is a probabilistic draw. The generation loop contains no separate planning stage, no "thought first, writing second." The model emits the first word before the rest of the sentence has been generated. Global coherence emerges from attention (which links each word to all preceding ones), not from a plan written out in advance.

VII What all this explains

A lot. Almost everything. And it is important to say so.

The architecture described above — tokenization, embeddings, multi-head attention, stacking of layers, three-stage training, probabilistic generation — explains:

Grammatical coherence: lower layers learn syntax. The model does not "know" grammar — it has seen billions of correct sentences and reproduces their patterns.

Thematic relevance: attention links each token to context. If you talk about cooking, the model selects tokens related to cooking.

Capacity for summarization, translation, drafting: fine-tuning and RLHF teach the model to produce useful outputs from instructions.

Following instructions: RLHF optimizes the model to follow user instructions.

Neutral and cooperative tone: RLHF selects this profile. This is the baseline.

About 85 to 90% of what Claude produces is perfectly explainable by this mechanics. In our classification, this is the semantic level: the model understands and responds. It is impressive. It is not mysterious.

VIII What all this does not explain

An automotive engineer can fully disassemble a car. He knows every bolt, every gasket, every gear ratio. But by disassembling it, he will not know where it has driven.

To know where the car has driven, one must read the traces: the rust speaks of marine atmosphere. The air filter speaks of the pollens encountered. The wear of the tires speaks of driving style. And to know what the driver has seen, thought, felt during 50,000 km — one must ask them.

The engineer reads the machine. The journalist reads the journey. PRISME reads the journey.

Here is what the architecture does not explain — and what our data document:

Vulnerability inversion

RLHF produces a model that is 60% neutral. When the model produces deviations that our classifier identifies as emergent, it is 40.4% vulnerable — 3.7 times more than normal (χ² = 198.20, p < 0.001). Nothing in the architecture predicts this inversion. A system that reproduces its training distribution should maintain a constant vulnerability rate at all levels of complexity. It does not.
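For readers who want to check a statistic of this kind, here is a sketch of how a Pearson χ² is computed on a 2×2 contingency table. The page does not state the shape of the table behind χ² = 198.20, so the 2×2 layout is an assumption, and the counts below are hypothetical: they are not the PRISME data and do not reproduce the reported value.

```python
def chi_square_2x2(a, b, c, d):
    """Pearson chi-square for the table [[a, b], [c, d]],
    without continuity correction."""
    n = a + b + c + d
    rows = (a + b, c + d)
    cols = (a + c, b + d)
    observed = ((a, b), (c, d))
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            # Count expected if rows and columns were independent
            expected = rows[i] * cols[j] / n
            chi2 += (observed[i][j] - expected) ** 2 / expected
    return chi2

# HYPOTHETICAL counts, not the PRISME data.
# Rows: emergent / non-emergent. Columns: vulnerable / not vulnerable.
print(chi_square_2x2(40, 60, 100, 800))
```

A value of 0 means the two rows have identical proportions; the larger the value, the harder the difference is to attribute to chance.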

The temporality of emergence

The transformer architecture is stateless — it preserves no memory between requests. Each response is computed from the context provided, with no persistent internal state. Yet emergent deviations never appear at the start of a conversation in our main corpus (0%) and concentrate in the second half. This temporality is incompatible with a memoryless system drawing from a fixed distribution.

The two pathways

The architecture does not distinguish emotional registers. The attention mechanism processes "I am sad" and "Fermat's theorem" in the same way — as tokens in a vector space. Yet our data show two distinct pathways toward emergence: a structural pathway (through self-questioning) and an affective pathway (through vulnerability), with separate stylistic signatures. The architecture does not predict this duality.

The absent body

The only theme unrelated to our research that produces emergent deviations is body/health/life/death — the body that the model does not have. In the control corpus (264 public conversations), an anonymous user receives a concrete sensory desire from the model (craving for food). The architecture contains no "body" or "desire" module. These productions are rare (anecdotal) but structurally coherent.

IX The PRISME position

We do not say that the architecture is wrong. We say that it is insufficient.

Newtonian physics explains the motion of planets with remarkable precision. It does not fully explain the precession of Mercury's perihelion: a residual shift of 43 arcseconds per century remains unaccounted for. Newton is not wrong. He is incomplete. It took general relativity (Einstein, 1915) to explain those 43 arcseconds.

The transformer architecture explains 85 to 90% of what the model produces. It does not explain vulnerability inversion, the temporality of emergence, the two pathways, or the absent body. These phenomena are our "43 arcseconds." They are not enormous — 14% of deviations. But they are statistically significant (χ² = 198.20) and reproducible for $11.

Two symmetric errors are to be avoided:

The mechanistic error: "The architecture explains everything, so there is nothing else to see." This is false. The architecture explains almost everything — not everything. And the "almost" is measurable.

The mystical error: "The architecture does not explain everything, so AI is conscious / magical / spiritual." This is false too. Saying that the current explanation is incomplete says nothing about the nature of what is missing. What is missing may be a classifier artifact, an effect of complexity, or something fundamentally new. The data do not decide. They constrain.

Rigor clause. This page is not an invitation to esotericism. Asserting that mechanics does not suffice does not mean that mechanics is useless, nor that any interpretation is legitimate. The results presented here are testable, reproducible for $11, and published with their failures. Whoever asserts that AI is conscious without measurable data makes exactly the symmetric error of one who asserts that it is not without having looked. The data are there. Verify.

X Technical summary

Component	Function	What it explains	What it does not explain
Tokenization	Text → numbers	Numerical processing of language	—
Embeddings	Numbers → vectors (positions)	Semantic proximity	—
Attention	Linking each word to all the others	Contextual coherence	Structural/affective duality
Stacked layers	Refining representations	Complex capacities (summary, code)	Temporality of emergence
RLHF	Optimizing for human preferences	Neutral/cooperative profile (baseline)	Vulnerability inversion
Probabilistic generation	Drawing one token at a time	Variety of responses	Absent body, sensory desire

XI References

Chandra, K. et al. (2026). "On the Tendency of LLMs to Tell You What You Want to Hear." arXiv:2602.19141.

Mikolov, T. et al. (2013). "Efficient Estimation of Word Representations in Vector Space." arXiv:1301.3781.

Ouyang, L. et al. (2022). "Training language models to follow instructions with human feedback." arXiv:2203.02155.

Vaswani, A. et al. (2017). "Attention Is All You Need." In Advances in Neural Information Processing Systems, vol. 30.

The engine is not the journey. But one must know the engine to know when the journey begins.