AI-First Engineering · Part 1
What Happens When You Press Enter
Published Apr 5, 2026 · 11 min read
It’s Thursday morning. I have a 1:1 with Ana after lunch and I haven’t prepped. I have no idea what she’s been working on, I forgot what we discussed last time, and the meeting is at 2pm. I open Claude Code and type: prep for my 1:1 with Ana
You: prep for my 1:1 with Ana
◈ google_calendar > list_events("next 2 days")
Today: Sprint Planning (10am), 1:1 with Ana (2pm), Platform Sync (4pm)
Tomorrow: Design Review (10am), Team Retro (3pm)
■ Read notes/
⚡︎ use obsidian-cli for vault searches
■ $ obsidian-cli search "Ana" --vault ~/notes --since 14d
3 entries: project handoff, blocked card, team retro feedback
■ Read people/ana.md
role: tech lead · last interactions: discussed tech radar presentation
style: direct, values prep
◈ slack > search_messages("Ana", last 7 days)
5 messages: deployment blocker, design review feedback, team retro follow-up
▶ prep-1-on-1
- The handoff is still in progress — ask where it stands
- She flagged a blocked card 4 days ago — check if it's resolved
- Retro feedback mentioned team morale — worth following up
■ Write 1-1-prep-ana.md
Prep written to 1-1-prep-ana.md. Want me to adjust anything?
■ tool — ◈ mcp — ⚡︎ hook — ▶ skill
I typed six words. The agent read my calendar, searched my notes, checked Slack, pulled a profile, and wrote a prep doc. How?
You sent a prompt
Your prompt is the six words you typed: prep for my 1:1 with Ana. But what arrived wasn’t six words. It was about 10 tokens — chunks of text, sometimes a whole word, sometimes part of one.1 prep is a token. for is a token. my is a token. 1:1 gets split into three: 1, :, 1.
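To make that word-to-token expansion concrete, here is a toy sketch in Python. It is not Claude’s actual tokenizer (real tokenizers use learned byte-pair merges), but it mimics how punctuation splits a short prompt into more pieces than words:

```python
import re

def toy_tokenize(text: str) -> list[str]:
    # Toy rule: words stay whole, punctuation splits off, which is
    # roughly why "1:1" becomes three tokens: "1", ":", "1".
    return re.findall(r"\w+|[^\w\s]", text)

tokens = toy_tokenize("prep for my 1:1 with Ana")
print(tokens)       # ['prep', 'for', 'my', '1', ':', '1', 'with', 'Ana']
print(len(tokens))  # 8 with this toy rule; the real tokenizer lands near 10
```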
Those 10 tokens are your prompt. But they didn’t arrive alone.
Before you pressed enter
An agent like Claude Code has two parts: a model and a harness.2 The model produces text. The harness is the program around it that executes actions and feeds results back. Codex and Cursor work the same way: different harnesses, same idea.
```mermaid
flowchart LR
    subgraph Agent
        direction LR
        M["Model"] -- "produces tokens" --> H["Harness"]
        H -- "executes tools, feeds results" --> M
    end
```
Before your prompt arrived, the harness had already packed over 8,000 tokens into the context window — everything the model can see at once.3
- Behavior instructions you never see — the system prompt4
- Notes from previous sessions — memory5
- Your project directory and git state6
- Descriptions of every tool the model can ask the harness to run — file operations, shell access, search7
- Third-party integrations like your calendar and Slack, connected via MCP8
- Reusable workflows called skills, loaded as one-line descriptions9
- Your project conventions, loaded from instruction files10
- Hooks — scripts that fire before or after certain actions, running outside the model entirely11
All of it is text. All of it is tokens. Your 10 tokens landed among 8,000 others, and together they became the model’s entire world.
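A minimal sketch of what that assembly amounts to; the component names and placeholder contents are illustrative, not Claude Code’s actual internals:

```python
def assemble_context(components: list[tuple[str, str]], user_prompt: str) -> str:
    # The harness concatenates every component into one flat token
    # sequence. The model never sees "files" or "tools", only text.
    parts = [text for _, text in components] + [user_prompt]
    return "\n\n".join(parts)

# Illustrative components, in roughly the order they load (see fn 3):
components = [
    ("system prompt",     "<behavior instructions...>"),
    ("memory",            "<notes from previous sessions...>"),
    ("project state",     "<directory listing, git branch and status...>"),
    ("tool descriptions", "<Read, Bash, Write, google_calendar, slack...>"),
    ("skills",            "<prep-1-on-1: one-line description...>"),
    ("instructions",      "<project conventions from instruction files...>"),
]
context = assemble_context(components, "prep for my 1:1 with Ana")
```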
Text in, text out
Here is what the model does with all of it.
The model has a vocabulary — tens of thousands of possible tokens. Given the full sequence in the context window, it produces a probability distribution over that vocabulary: every possible next token gets a score. The model samples one. That token gets appended to the sequence. The model runs again, conditioned on the longer sequence. Samples the next. And the next. And the next.12
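A sketch of a single decoding step, with a three-token vocabulary standing in for tens of thousands (the scores are made up):

```python
import math
import random

def sample_next(scores: dict[str, float], temperature: float = 1.0) -> str:
    # Softmax: turn raw scores over the vocabulary into probabilities.
    scaled = [s / temperature for s in scores.values()]
    peak = max(scaled)
    weights = [math.exp(s - peak) for s in scaled]
    # Sample, not argmax: the highest-scoring token usually wins,
    # but not always. This is where the non-determinism enters.
    return random.choices(list(scores), weights=weights, k=1)[0]

sequence = ["prep", "for", "my"]
while len(sequence) < 6:                       # generation is just a loop
    scores = {"1": 3.1, "the": 1.2, ":": 0.4}  # made-up model output
    sequence.append(sample_next(scores))       # append, run again
```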
That is how LLMs work, nothing more. The model does not think. There is no internal “Ana had a deployment blocker this week, I should check if it’s resolved before the 1:1.” It produces the next token. Then the next. Then the next. Until the response is complete and the harness takes over.
Each token is shaped by every token that came before it. When those 8,000 tokens include a tool description for google_calendar and your message asks about a 1:1 with Ana, the next tokens will probably describe a call to that tool. Not certainly — probably. That distinction matters. The tool description is what made the right call probable. Just another set of tokens shifting the distribution.13
This process is non-deterministic. Same input, different result from run to run. The model does not always select the highest-scoring token. If it did, the same prompt would always produce the same response: safe, predictable, and prone to looping on its own repetitions. Sampling is what lets the model produce novel combinations: code it was never trained on, connections between ideas that weren’t adjacent in its training data, different approaches to the same problem on different runs. The non-determinism is not a side effect. It is the source of everything useful the model does.
But non-determinism means the result is never guaranteed. The model might follow your instructions. It might not. You cannot control it the way you control a program. You can only make certain results more probable by shaping what the model receives. That is the fundamental constraint this entire series is built on: the model is non-deterministic, and every component you’ve seen in this trace is a different strategy for channeling that — making the right results probable, and protecting the parts that need to be exact.
The trace, explained
The model received the context — 8,000+ tokens assembled by the harness, plus your 10 — and produced its first response. Not a prep doc. A tool call: text that says “run google_calendar > list_events.” The model didn’t access the calendar. It produced text describing the action. The harness read that text, executed the call, and added the result to the context. Then the model produced its next response, now conditioned on the calendar data.
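Stripped to its essentials, that first exchange looks something like this; the shapes and names are illustrative, not the actual wire format:

```python
context: list[dict] = []  # stands in for everything assembled so far

# What the model's response amounts to: text describing an action.
model_response = {
    "type": "tool_call",
    "name": "google_calendar.list_events",
    "arguments": {"range": "next 2 days"},
}

def execute_tool(name: str, arguments: dict) -> str:
    # Hypothetical dispatcher. Only the harness touches the outside
    # world (APIs, disk, shell). The model only ever produces text.
    return "Today: Sprint Planning (10am), 1:1 with Ana (2pm), ..."

result = execute_tool(model_response["name"], model_response["arguments"])
context.append({"role": "tool_result", "content": result})
# The model runs again, now conditioned on the calendar data.
```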
That is the pattern for every step. Let’s walk through them.
[mcp] google_calendar > list_events
My Google Calendar is connected via MCP — a protocol that lets agents call third-party tools over a standardized interface.8 The harness sent the request, got five events back, added them to the context. The model now knew about the 1:1 at 2pm.
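On the wire, MCP is JSON-RPC 2.0: a tool invocation goes out as a tools/call request. The request shape below follows the MCP spec, but the argument names depend on the calendar server’s own schema and are hypothetical here:

```python
# What the harness sends to the calendar MCP server.
mcp_request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "list_events",
        "arguments": {"range": "next 2 days"},  # server-specific, hypothetical
    },
}
# The server's response (the five events) is appended to the context.
```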
[tool] Read notes/ → [hook] use obsidian-cli
The model tried to read my notes directory using Read, a built-in tool.7 But a hook intercepted it — a script I configured to fire before any Read, Grep, or Glob call.11 The script checks whether the path targets my personal vault. If so, it blocks the read and suggests the Obsidian CLI instead:
```json
{
  "hooks": {
    "PreToolUse": [{
      "matcher": "Read|Grep|Glob",
      "hooks": [{
        "type": "command",
        "command": "~/.claude/hooks/vault-guard.sh"
      }]
    }]
  }
}
```
```bash
#!/bin/bash
# Fires before any Read, Grep, or Glob call. If the target path points
# into the vault, deny the call and tell the model what to use instead.
if echo "$TOOL_INPUT" | grep -q "Documents/Vault\|notes/"; then
  echo '{"hookSpecificOutput":{"permissionDecision":"deny","permissionDecisionReason":"Use obsidian-cli for vault searches"}}'
fi
```
The model didn’t ask for this redirection. The hook blocked the call and injected the suggestion back into the context. This is what makes hooks different from instructions: instructions are suggestions the model might follow. Hooks are code that runs regardless of what the model produces.
[tool] $ obsidian-cli search "Ana"
The model’s next response, now conditioned on the hook’s suggestion, produced a shell command via the Bash tool instead of a file read. The harness executed it and added the search results — three entries about Ana from the last two weeks.
[tool] Read people/ana.md
A built-in tool. The harness read the file from disk and added Ana’s profile to the context: her role, last interactions, working style.
[mcp] slack > search_messages
Another MCP call. The harness sent the request to my Slack server. The server searched for recent messages involving Ana. Five messages entered the context.
[skill] prep-1-on-1
A skill — a reusable workflow written in a markdown file that I can invoke by name.9 At startup, only its one-line description was loaded. Now the harness loaded the full content: what information to gather, how to structure the prep, where to pause for input. The model produced the prep doc following this structure.
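A hypothetical sketch of what the prep-1-on-1 skill file might contain. The frontmatter follows the published skill format (a name and a one-line description); the body content here is invented:

```markdown
---
name: prep-1-on-1
description: Gather recent context on a teammate and draft a 1:1 prep doc
---

1. Find the upcoming 1:1 on the calendar.
2. Search notes and Slack for the person over the last two weeks.
3. Read their profile from people/<name>.md if one exists.
4. Write open threads and follow-ups to 1-1-prep-<name>.md, then pause
   for review before making any changes.
```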
[tool] Write 1-1-prep-ana.md
A built-in tool. The harness wrote the file to disk.
The agent loop
Seven tool calls. Four different types. One hook interception. The model produced text at every step — tool calls, not actions. The harness executed all of it. Six words in, prep doc out. Ready before lunch.
That cycle — model produces a response, harness executes the actions in it, results feed back as context, repeat — is the agent loop.14 15 16
```mermaid
flowchart LR
    G["Gather"] -- "read context" --> A["Act"]
    A -- "call tools" --> O["Observe"]
    O -. "repeat" .-> G
```
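The whole loop fits in a few lines. A skeleton in Python: `model` and `harness` are stand-ins for the real components, but every coding agent implements some version of this:

```python
from dataclasses import dataclass, field

@dataclass
class Response:
    text: str
    tool_calls: list = field(default_factory=list)

def agent_loop(model, harness, context: list) -> str:
    while True:
        response = model.generate(context)   # inferential: sample tokens
        context.append(response)
        if not response.tool_calls:          # no actions requested:
            return response.text             # the loop is done
        for call in response.tool_calls:     # computational: execute
            result = harness.execute(call)   # hooks can intercept here
            context.append(result)           # results become context
```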
Most of it was computational: the calendar fetch, the CLI search, the file reads, the file write, the hook checking a path pattern — deterministic operations, same input, same output, every time.17 The inferential parts were the model’s decisions: which tool to call, what arguments to pass, how to synthesize Ana’s profile and Slack messages into a prep doc. The computational parts are reliable. The inferential parts are where the non-determinism lives, and where context engineering has the most leverage.
The loop does not care what kind of task it is running. Whether it is prepping a 1:1, running tests, or reviewing code, the mechanism is identical. What changes is the context.
The context window grew
```mermaid
flowchart LR
    subgraph before ["Before you type"]
        direction LR
        SP["System Prompt"] ~~~ I["Instructions"]
        I ~~~ T["Tool Descriptions"]
        T ~~~ M["Memory"]
    end
    before --> C["Context Window"]
    subgraph during ["During the session"]
        direction LR
        P["Your Prompt"] ~~~ TR["Tool Results"]
        TR ~~~ SK["Skill Content"]
    end
    during --> C
```
Each tool result entered the same shared space: the context window. The calendar response, the search results, Ana’s profile, the skill content, the prep output. All of it added tokens.
Context windows range from around 200,000 tokens to over 1 million.18 The exact number matters less than the principle: the window is finite, and everything in it competes for space. A verbose response wastes tokens that could hold your code. A long instruction file crowds out tool results.
When the window fills up, the harness compacts: summarizing older content to make room. Your instructions survive compaction. Conversation details from an hour ago may not.19
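A sketch of that policy in Python, with a made-up budget and the token counter and summarizer passed in as functions; the real heuristics are the harness’s own (see footnote 19):

```python
MAX_TOKENS = 200_000  # illustrative budget, not a real limit

def maybe_compact(context: list[str], count_tokens, summarize) -> list[str]:
    # Nothing to do while the window still has room.
    if sum(count_tokens(m) for m in context) < MAX_TOKENS:
        return context
    # Keep recent turns verbatim; collapse everything older into a
    # summary. Instructions survive, hour-old detail may not.
    older, recent = context[:-20], context[-20:]
    return [summarize(older)] + recent
```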
Context engineering
Six words. No mention of which calendar, which Obsidian vault, what format. The model produced the right tool calls because the context made them probable. Tool descriptions told it what was available. Instructions shaped how to use them. The skill defined the workflow. The hook enforced the right path. The harness executed those calls and fed results back.
The six words worked because the context was already engineered to make the right results likely.
Prompt engineering is getting a good response by writing a better message. Context engineering is designing the system that assembles the context — before the session starts and at every turn. Harness engineering is one form of it: building the components — tools, instructions, skills, hooks — that assemble the context for coding agents.17
Every decision behind that trace is something you can customize: which tools to connect, what to put in your instructions, which skills to build, which hooks to wire. Those decisions shaped the probability distribution. The six words just triggered the sampling.
What you’re looking at
You can now name every part: the MCP call to my calendar, the hook intercepting a Read, the CLI search, the built-in file read, the Slack lookup, the skill orchestrating the prep, the Write saving the output. The harness executed all of it. The model produced text. And I showed up to my 2pm prepared.
That is the agent loop. The model is non-deterministic. You cannot control its output directly. But you can shape what goes into the context, and that shapes what comes out. The question this series answers: what should be in the context, and when?
Next: Building Your AI Toolkit — the model is non-deterministic. The toolkit exists to shape what it produces. Each capability is a different strategy for channeling that.
Footnotes
1. Practical Prompt Engineering — “A token is to the model what a word is to a human.” Roughly 0.75 words per token. See the Temperature, Top P, Tokens and Context lesson for a deeper explanation. ↩
2. The Anatomy of an Agent Harness — “The model contains the intelligence and the harness makes that intelligence useful.” The harness is “every piece of code, configuration, and execution logic that isn’t the model itself.” See also What is an agent harness? and Components of a Coding Agent. ↩
3. Explore the context window — interactive simulation showing the exact loading sequence at session start. ↩
4. Claude Code’s system prompt was publicly exposed in March 2026 via a source map included in the npm package. The excerpt shown is representative of the core instructions. See Engineer’s Codex analysis for full details. ↩
5. How Claude remembers your project — “The first 200 lines or 25KB, whichever comes first, are loaded into the conversation context.” ↩
6. Explore the context window — “Git branch, status, and recent commits load as a separate block at the very end of the system prompt.” ↩
7. Claude Code tools reference — full list of built-in tools. See also Codex features and Cursor agent for their respective tool sets. ↩ ↩2
8. MCP tool search — “MCP tool names listed so Claude knows what is available. Full schemas stay deferred and Claude loads specific ones on demand.” ↩ ↩2
9. Skills — “One-line descriptions of available skills so Claude knows what it can invoke. Full skill content loads only when Claude actually uses one.” ↩ ↩2
10. How Claude remembers your project — “CLAUDE.md files are markdown files that give Claude persistent instructions. More specific locations take precedence over broader ones.” ↩
11. Hooks guide — hooks fire at lifecycle points: PreToolUse, PostToolUse, SessionStart, SessionEnd. PreToolUse can block tool calls via permissionDecision: "deny". ↩ ↩2
12. How Claude Code works — “Without tools, Claude can only respond with text. With tools, Claude can act.” ↩
13. Prompt Repetition Improves Non-Reasoning LLMs — Google Research, 2025. Repeating the input prompt improves performance on Gemini, GPT, Claude, and Deepseek without increasing output tokens or latency. ↩
14. How Claude Code works — “Claude Code serves as the agentic harness around Claude: it provides the tools, context management, and execution environment that turn a language model into a capable coding agent.” ↩
15. Unrolling the Codex agent loop — OpenAI’s official breakdown of the Codex architecture. Context assembled from system message, AGENTS.md, tool definitions, and user input. The loop: inference → tool call → execute → requery. See also Codex features and the open-source repository. ↩
16. Best practices for coding with agents — Cursor’s agent has “powerful search tools and pulls context on demand.” Rules in .cursor/rules/ load at session start. The agent searches, plans, executes, and evaluates iteratively. ↩
17. Harness Engineering for Coding Agents — organizes harness components into guides (feedforward controls that steer before action) and sensors (feedback controls that observe and correct after action). Computational sensors (linters, tests) run deterministically; inferential sensors use AI for semantic analysis. ↩ ↩2
18. 1M context is now generally available for Opus 4.6 and Sonnet 4.6 — “1 million tokens, equivalent to approximately 750,000 words.” ↩
19. How Claude Code works — “It clears older tool outputs first, then summarizes the conversation if needed. Your requests and key code snippets are preserved; detailed instructions from early in the conversation may be lost.” ↩