AI-First Engineering · Part 1
What Happens When You Press Enter
Published Apr 5, 2026 · 11 min read
It’s Thursday morning. I have a 1:1 with Ana after lunch and I haven’t prepped. I have no idea what she’s been working on, I forgot what we discussed last time, and the meeting is at 2pm. I open Claude Code and type: prep for my 1:1 with Ana
You: prep for my 1:1 with Ana
◈ google_calendar > list_events("next 2 days")
Today: Sprint Planning (10am), 1:1 with Ana (2pm), Platform Sync (4pm)
Tomorrow: Design Review (10am), Team Retro (3pm)
■ Read notes/
⚡︎ use obsidian-cli for vault searches
■ $ obsidian-cli search "Ana" --vault ~/notes --since 14d
3 entries: project handoff, blocked card, team retro feedback
■ Read people/ana.md
role: tech lead · last interactions: discussed tech radar presentation
style: direct, values prep
◈ slack > search_messages("Ana", last 7 days)
5 messages: deployment blocker, design review feedback, team retro follow-up
▶ prep-1-on-1
- The handoff is still in progress — ask where it stands
- She flagged a blocked card 4 days ago — check if it's resolved
- Retro feedback mentioned team morale — worth following up
■ Write 1-1-prep-ana.md
Prep written to 1-1-prep-ana.md. Want me to adjust anything?
■ tool — ◈ mcp — ⚡︎ hook — ▶ skill
I typed six words. The agent read my calendar, searched my notes, checked Slack, pulled a profile, and wrote a prep doc. How?
You sent a prompt
Your prompt is the six words you typed: prep for my 1:1 with Ana. But what arrived wasn’t six words. It was about 10 tokens — chunks of text, sometimes a whole word, sometimes part of one.1 prep is a token. for is a token. my is a token. 1:1 gets split into three: 1, :, 1.
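To make that word-to-token expansion concrete, here is a toy sketch in Python. It is not Claude’s actual tokenizer (real tokenizers use learned byte-pair merges), but it mimics how punctuation splits a short prompt into more pieces than words:

```python
import re

def toy_tokenize(text: str) -> list[str]:
    # Toy rule: words stay whole, punctuation splits off, which is
    # roughly why "1:1" becomes three tokens: "1", ":", "1".
    return re.findall(r"\w+|[^\w\s]", text)

tokens = toy_tokenize("prep for my 1:1 with Ana")
print(tokens)       # ['prep', 'for', 'my', '1', ':', '1', 'with', 'Ana']
print(len(tokens))  # 8 with this toy rule; the real tokenizer lands near 10
```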
Those 10 tokens are your prompt. But they didn’t arrive alone.
Before you pressed enter
An agent like Claude Code has two parts: a model and a harness.2 The model produces text. The harness is the program around it that executes actions and feeds results back. Codex and Cursor work the same way: different harnesses, same idea.
```mermaid
flowchart LR
    subgraph Agent
        direction LR
        M["Model"] -- "produces tokens" --> H["Harness"]
        H -- "executes tools, feeds results" --> M
    end
```
Before your prompt arrived, the harness had already packed over 8,000 tokens into the context window — everything the model can see at once.3
- Behavior instructions you never see — the system prompt4
- Notes from previous sessions — memory5
- Your project directory and git state6
- Descriptions of every tool the model can ask the harness to run — file operations, shell access, search7
- Third-party integrations like your calendar and Slack, connected via MCP8
- Reusable workflows called skills, loaded as one-line descriptions9
- Your project conventions, loaded from instruction files10
- Hooks — scripts that fire before or after certain actions, running outside the model entirely11
All of it is text. All of it is tokens. Your 10 tokens landed among 8,000 others, and together they became the model’s entire world.
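A minimal sketch of what that assembly amounts to; the component names and placeholder contents are illustrative, not Claude Code’s actual internals:

```python
def assemble_context(components: list[tuple[str, str]], user_prompt: str) -> str:
    # The harness concatenates every component into one flat token
    # sequence. The model never sees "files" or "tools", only text.
    parts = [text for _, text in components] + [user_prompt]
    return "\n\n".join(parts)

# Illustrative components, in roughly the order they load (see fn 3):
components = [
    ("system prompt",     "<behavior instructions...>"),
    ("memory",            "<notes from previous sessions...>"),
    ("project state",     "<directory listing, git branch and status...>"),
    ("tool descriptions", "<Read, Bash, Write, google_calendar, slack...>"),
    ("skills",            "<prep-1-on-1: one-line description...>"),
    ("instructions",      "<project conventions from instruction files...>"),
]
context = assemble_context(components, "prep for my 1:1 with Ana")
```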
Text in, text out
Here is what the model does with all of it.
The model has a vocabulary — tens of thousands of possible tokens. Given the full sequence in the context window, it produces a probability distribution over that vocabulary: every possible next token gets a score. The model samples one. That token gets appended to the sequence. The model runs again, conditioned on the longer sequence. Samples the next. And the next. And the next.12
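A sketch of a single decoding step, with a three-token vocabulary standing in for tens of thousands (the scores are made up):

```python
import math
import random

def sample_next(scores: dict[str, float], temperature: float = 1.0) -> str:
    # Softmax: turn raw scores over the vocabulary into probabilities.
    scaled = [s / temperature for s in scores.values()]
    peak = max(scaled)
    weights = [math.exp(s - peak) for s in scaled]
    # Sample, not argmax: the highest-scoring token usually wins,
    # but not always. This is where the non-determinism enters.
    return random.choices(list(scores), weights=weights, k=1)[0]

sequence = ["prep", "for", "my"]
while len(sequence) < 6:                       # generation is just a loop
    scores = {"1": 3.1, "the": 1.2, ":": 0.4}  # made-up model output
    sequence.append(sample_next(scores))       # append, run again
```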
That is how LLMs work, nothing more. The model does not think. There is no internal “Ana had a deployment blocker this week, I should check if it’s resolved before the 1:1.” It produces the next token. Then the next. Then the next. Until the response is complete and the harness takes over.
Each token is shaped by every token that came before it. When those 8,000 tokens include a tool description for google_calendar and your message asks about a 1:1 with Ana, the next tokens will probably describe a call to that tool. Not certainly — probably. That distinction matters. The tool description is what made the right call probable. Just another set of tokens shifting the distribution.13
This process is non-deterministic. Same input, different result from run to run. The model does not always select the highest-scoring token. If it did, the same prompt would always produce the same response: safe, predictable, and prone to looping on its own repetitions. Sampling is what lets the model produce novel combinations: code it was never trained on, connections between ideas that weren’t adjacent in its training data, different approaches to the same problem on different runs. The non-determinism is not a side effect. It is the source of everything useful the model does.
But non-determinism means the result is never guaranteed. The model might follow your instructions. It might not. You cannot control it the way you control a program. You can only make certain results more probable by shaping what the model receives. That is the fundamental constraint this entire series is built on: the model is non-deterministic, and every component you’ve seen in this trace is a different strategy for channeling that — making the right results probable, and protecting the parts that need to be exact.
The trace, explained
The model received the context — 8,000+ tokens assembled by the harness, plus your 10 — and produced its first response. Not a prep doc. A tool call: text that says “run google_calendar > list_events.” The model didn’t access the calendar. It produced text describing the action. The harness read that text, executed the call, and added the result to the context. Then the model produced its next response, now conditioned on the calendar data.
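Stripped to its essentials, that first exchange looks something like this; the shapes and names are illustrative, not the actual wire format:

```python
context: list[dict] = []  # stands in for everything assembled so far

# What the model's response amounts to: text describing an action.
model_response = {
    "type": "tool_call",
    "name": "google_calendar.list_events",
    "arguments": {"range": "next 2 days"},
}

def execute_tool(name: str, arguments: dict) -> str:
    # Hypothetical dispatcher. Only the harness touches the outside
    # world (APIs, disk, shell). The model only ever produces text.
    return "Today: Sprint Planning (10am), 1:1 with Ana (2pm), ..."

result = execute_tool(model_response["name"], model_response["arguments"])
context.append({"role": "tool_result", "content": result})
# The model runs again, now conditioned on the calendar data.
```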
That is the pattern for every step. Let’s walk through them.
[mcp] google_calendar > list_events
My Google Calendar is connected via MCP — a protocol that lets agents call third-party tools over a standardized interface.8 The harness sent the request, got five events back, added them to the context. The model now knew about the 1:1 at 2pm.
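On the wire, MCP is JSON-RPC 2.0: a tool invocation goes out as a tools/call request. The request shape below follows the MCP spec, but the argument names depend on the calendar server’s own schema and are hypothetical here:

```python
# What the harness sends to the calendar MCP server.
mcp_request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "list_events",
        "arguments": {"range": "next 2 days"},  # server-specific, hypothetical
    },
}
# The server's response (the five events) is appended to the context.
```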
[tool] Read notes/ → [hook] use obsidian-cli
The model tried to read my notes directory using Read, a built-in tool.7 But a hook intercepted it — a script I configured to fire before any Read, Grep, or Glob call.11 The script checks whether the path targets my personal vault. If so, it blocks the read and suggests the Obsidian CLI instead:
```json
{
  "hooks": {
    "PreToolUse": [{
      "matcher": "Read|Grep|Glob",
      "hooks": [{
        "type": "command",
        "command": "~/.claude/hooks/vault-guard.sh"
      }]
    }]
  }
}
```
```bash
#!/bin/bash
# Fires before any Read, Grep, or Glob call. If the target path points
# into the vault, deny the call and tell the model what to use instead.
if echo "$TOOL_INPUT" | grep -q "Documents/Vault\|notes/"; then
  echo '{"hookSpecificOutput":{"permissionDecision":"deny","permissionDecisionReason":"Use obsidian-cli for vault searches"}}'
fi
```
The model didn’t ask for this redirection. The hook blocked the call and injected the suggestion back into the context. This is what makes hooks different from instructions: instructions are suggestions the model might follow. Hooks are code that runs regardless of what the model produces.
[tool] $ obsidian-cli search "Ana"
The model’s next response, now conditioned on the hook’s suggestion, produced a shell command via the Bash tool instead of a file read. The harness executed it and added the search results — three entries about Ana from the last two weeks.
[tool] Read people/ana.md
A built-in tool. The harness read the file from disk and added Ana’s profile to the context: her role, last interactions, working style.
[mcp] slack > search_messages
Another MCP call. The harness sent the request to my Slack server. The server searched for recent messages involving Ana. Five messages entered the context.
[skill] prep-1-on-1
A skill — a reusable workflow written in a markdown file that I can invoke by name.9 At startup, only its one-line description was loaded. Now the harness loaded the full content: what information to gather, how to structure the prep, where to pause for input. The model produced the prep doc following this structure.
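A hypothetical sketch of what the prep-1-on-1 skill file might contain. The frontmatter follows the published skill format (a name and a one-line description); the body content here is invented:

```markdown
---
name: prep-1-on-1
description: Gather recent context on a teammate and draft a 1:1 prep doc
---

1. Find the upcoming 1:1 on the calendar.
2. Search notes and Slack for the person over the last two weeks.
3. Read their profile from people/<name>.md if one exists.
4. Write open threads and follow-ups to 1-1-prep-<name>.md, then pause
   for review before making any changes.
```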
[tool] Write 1-1-prep-ana.md
A built-in tool. The harness wrote the file to disk.
The agent loop
Seven tool calls. Four different types. One hook interception. The model produced text at every step — tool calls, not actions. The harness executed all of it. Six words in, prep doc out. Ready before lunch.
That cycle — model produces a response, harness executes the actions in it, results feed back as context, repeat — is the agent loop.14 15 16
```mermaid
flowchart LR
    G["Gather"] -- "read context" --> A["Act"]
    A -- "call tools" --> O["Observe"]
    O -. "repeat" .-> G
```
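The whole loop fits in a few lines. A skeleton in Python: `model` and `harness` are stand-ins for the real components, but every coding agent implements some version of this:

```python
from dataclasses import dataclass, field

@dataclass
class Response:
    text: str
    tool_calls: list = field(default_factory=list)

def agent_loop(model, harness, context: list) -> str:
    while True:
        response = model.generate(context)   # inferential: sample tokens
        context.append(response)
        if not response.tool_calls:          # no actions requested:
            return response.text             # the loop is done
        for call in response.tool_calls:     # computational: execute
            result = harness.execute(call)   # hooks can intercept here
            context.append(result)           # results become context
```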
Most of it was computational: the calendar fetch, the CLI search, the file reads, the file write, the hook checking a path pattern — deterministic operations, same input, same output, every time.17 The inferential parts were the model’s decisions: which tool to call, what arguments to pass, how to synthesize Ana’s profile and Slack messages into a prep doc. The computational parts are reliable. The inferential parts are where the non-determinism lives, and where context engineering has the most leverage.
The loop does not care what kind of task it is running. Whether it is prepping a 1:1, running tests, or reviewing code, the mechanism is identical. What changes is the context.
The context window grew
```mermaid
flowchart LR
    subgraph before ["Before you type"]
        direction LR
        SP["System Prompt"] ~~~ I["Instructions"]
        I ~~~ T["Tool Descriptions"]
        T ~~~ M["Memory"]
    end
    before --> C["Context Window"]
    subgraph during ["During the session"]
        direction LR
        P["Your Prompt"] ~~~ TR["Tool Results"]
        TR ~~~ SK["Skill Content"]
    end
    during --> C
```
Each tool result entered the same shared space: the context window. The calendar response, the search results, Ana’s profile, the skill content, the prep output. All of it added tokens.
Context windows range from around 200,000 tokens to over 1 million.18 The exact number matters less than the principle: the window is finite, and everything in it competes for space. A verbose response wastes tokens that could hold your code. A long instruction file crowds out tool results.
When the window fills up, the harness compacts: summarizing older content to make room. Your instructions survive compaction. Conversation details from an hour ago may not.19
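A sketch of that policy in Python, with a made-up budget and the token counter and summarizer passed in as functions; the real heuristics are the harness’s own (see footnote 19):

```python
MAX_TOKENS = 200_000  # illustrative budget, not a real limit

def maybe_compact(context: list[str], count_tokens, summarize) -> list[str]:
    # Nothing to do while the window still has room.
    if sum(count_tokens(m) for m in context) < MAX_TOKENS:
        return context
    # Keep recent turns verbatim; collapse everything older into a
    # summary. Instructions survive, hour-old detail may not.
    older, recent = context[:-20], context[-20:]
    return [summarize(older)] + recent
```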
Context engineering
Six words. No mention of which calendar, which Obsidian vault, what format. The model produced the right tool calls because the context made them probable. Tool descriptions told it what was available. Instructions shaped how to use them. The skill defined the workflow. The hook enforced the right path. The harness executed those calls and fed results back.
The six words worked because the context was already engineered to make the right results likely.
Prompt engineering is getting a good response by writing a better message. Context engineering is designing the system that assembles the context — before the session starts and at every turn. Harness engineering is one form of it: building the components — tools, instructions, skills, hooks — that assemble the context for coding agents.17
Every decision behind that trace is something you can customize: which tools to connect, what to put in your instructions, which skills to build, which hooks to wire. Those decisions shaped the probability distribution. The six words just triggered the sampling.
What you’re looking at
You can now name every part: the MCP call to my calendar, the hook intercepting a Read, the CLI search, the built-in file read, the Slack lookup, the skill orchestrating the prep, the Write saving the output. The harness executed all of it. The model produced text. And I showed up to my 2pm prepared.
That is the agent loop. The model is non-deterministic. You cannot control its output directly. But you can shape what goes into the context, and that shapes what comes out. The question this series answers: what should be in the context, and when?
Next: Building Your AI Toolkit — the model is non-deterministic. The toolkit exists to shape what it produces. Each capability is a different strategy for channeling that.
Footnotes
1. Practical Prompt Engineering — “A token is to the model what a word is to a human.” Roughly 0.75 words per token. See the Temperature, Top P, Tokens and Context lesson for a deeper explanation. ↩
2. The Anatomy of an Agent Harness — “The model contains the intelligence and the harness makes that intelligence useful.” The harness is “every piece of code, configuration, and execution logic that isn’t the model itself.” See also What is an agent harness? and Components of a Coding Agent. ↩
3. Explore the context window — interactive simulation showing the exact loading sequence at session start. ↩
4. Claude Code’s system prompt was publicly exposed in March 2026 via a source map included in the npm package. The excerpt shown is representative of the core instructions. See Engineer’s Codex analysis for full details. ↩
5. How Claude remembers your project — “The first 200 lines or 25KB, whichever comes first, are loaded into the conversation context.” ↩
6. Explore the context window — “Git branch, status, and recent commits load as a separate block at the very end of the system prompt.” ↩
7. Claude Code tools reference — full list of built-in tools. See also Codex features and Cursor agent for their respective tool sets. ↩ ↩2
8. MCP tool search — “MCP tool names listed so Claude knows what is available. Full schemas stay deferred and Claude loads specific ones on demand.” ↩ ↩2
9. Skills — “One-line descriptions of available skills so Claude knows what it can invoke. Full skill content loads only when Claude actually uses one.” ↩ ↩2
10. How Claude remembers your project — “CLAUDE.md files are markdown files that give Claude persistent instructions. More specific locations take precedence over broader ones.” ↩
11. Hooks guide — hooks fire at lifecycle points: PreToolUse, PostToolUse, SessionStart, SessionEnd. PreToolUse can block tool calls via permissionDecision: "deny". ↩ ↩2
12. How Claude Code works — “Without tools, Claude can only respond with text. With tools, Claude can act.” ↩
13. Prompt Repetition Improves Non-Reasoning LLMs — Google Research, 2025. Repeating the input prompt improves performance on Gemini, GPT, Claude, and Deepseek without increasing output tokens or latency. ↩
14. How Claude Code works — “Claude Code serves as the agentic harness around Claude: it provides the tools, context management, and execution environment that turn a language model into a capable coding agent.” ↩
15. Unrolling the Codex agent loop — OpenAI’s official breakdown of the Codex architecture. Context assembled from system message, AGENTS.md, tool definitions, and user input. The loop: inference → tool call → execute → requery. See also Codex features and the open-source repository. ↩
16. Best practices for coding with agents — Cursor’s agent has “powerful search tools and pulls context on demand.” Rules in .cursor/rules/ load at session start. The agent searches, plans, executes, and evaluates iteratively. ↩
17. Harness Engineering for Coding Agents — organizes harness components into guides (feedforward controls that steer before action) and sensors (feedback controls that observe and correct after action). Computational sensors (linters, tests) run deterministically; inferential sensors use AI for semantic analysis. ↩ ↩2
18. 1M context is now generally available for Opus 4.6 and Sonnet 4.6 — “1 million tokens, equivalent to approximately 750,000 words.” ↩
19. How Claude Code works — “It clears older tool outputs first, then summarizes the conversation if needed. Your requests and key code snippets are preserved; detailed instructions from early in the conversation may be lost.” ↩