
AI-First Engineering · Part 2

Building Your AI Toolkit

Published Mar 17, 2026 · 9 min read

ai-first-engineering rules skills mcp hooks


An agent is a loop around a language model. The model produces text. The loop turns that text into actions (tool calls, file edits, shell commands) and feeds the results back. What are rules, skills, and hooks, and when do you use each one? Once the loop clicks, the rest follows.


The agent loop

Anthropic describes the agent loop as three phases: gather context, take action, verify results.

[Diagram] The agent loop: Gather context (read files, search code, check state) → Take action (call tools, edit files, run commands) → Verify results (run tests, check output, read logs) → repeat.

Here is what those three phases look like in practice, running a pre-PR pipeline on a branch with 24 changed files:

User: "let's wrap this up"

turn 1 (gather)
  git summary → branch · 24 files · 8 backend + 4 test · 2 commits

turn 2 (act — 4 parallel sub-agents)
  Lint        → lint-fix + lint → 0 errors · 0 warnings
  Unit        → lein test → 21 tests · 151 assertions → PASS
  Integration → lein integration → 5 tests · 6 assertions → PASS
  Code review → reads architecture rules + fetches story criteria
              → 2 warnings: missing validation, undocumented field

turn 3 (observe)
  ✅ Lint — clean
  ✅ Unit — 21 pass
  ✅ Integration — 5 pass
  ⚠️ Code review — 2 warnings
  ✅ Story — all criteria met

turn 4 (act → human gate)
  "UserProfile should use strict validation. Here's the fix: ..."
  [ Accept ] [ Dismiss ]

Four turns. The system ran 12 commands (lein is Clojure’s build tool), surfaced two warnings, and proposed a fix. Four words of input.

Same loop, entirely different task:

User: "prep for my 1:1 with Ana"

turn 1 (gather)
  reads last 14 daily notes
  searches issue tracker for active cards
  reads Ana's person file (role, last topics, working style)

turn 2 (act)
  synthesizes into a relationship-first prep doc:
  - Recent context: what you've been working on
  - Team dynamics: anything relevant to surface
  - Growth angle: what's worth discussing for career development
  - Conversation openers: specific, not generic

turn 3 (act → human gate)
  "Here's your prep. Want me to adjust anything?"

Same loop. Same protocol. No new code.

To understand what’s happening in those traces, keep two terms separate:


What an agent actually is

  • Model — the language model. It receives text (context) and produces text (a response). That is all it does.
  • Agent — the model plus the loop around it. The loop feeds context to the model, interprets its response, executes any tool calls, and feeds the results back. The agent is the whole system: model, runtime, and tools.

An agent is a program that runs in a loop. The model receives context and produces a response. The runtime interprets that response. If it contains tool calls, the runtime executes them and feeds the results back as new context. The model produces the next response conditioned on that updated context. This repeats until the response contains no tool calls.

How the loop works: turns

The agent loop is made of turns. A turn is one round trip. In the pre-PR pipeline above: turn 1 produced a tool call for git summary, turn 2 dispatched four sub-agents in parallel, turn 3 synthesized their results, turn 4 presented a fix and paused for human input. Each turn follows the same steps:

  1. The model receives everything accumulated so far: the system prompt (its instructions), the conversation history, and the results of any tools called in the previous turn.
  2. The model produces a response. That response can contain text, one or more tool calls, or both.
  3. If the response contains tool calls, the runtime executes them and appends the results to the context. This starts a new turn at step 1.
  4. If the response contains no tool calls, the loop ends. The text is the final output.
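The four steps above can be sketched as a small loop. The model here is a scripted stand-in (a real agent calls an LLM API at that point), and `call_model`, `TOOLS`, and the message shapes are all illustrative, not any particular runtime’s actual interface:

```python
def call_model(context):
    """Scripted stand-in for the model: asks for one tool call,
    then produces a final text answer once the result is in context."""
    if not any(m["role"] == "tool" for m in context):
        return {"text": None,
                "tool_calls": [{"name": "git_summary", "args": {}}]}
    return {"text": "branch summary: 24 files, 2 commits", "tool_calls": []}

# The runtime's tool table: deterministic functions the loop can execute.
TOOLS = {"git_summary": lambda args: "24 files · 2 commits"}

def run_agent(user_message):
    context = [{"role": "user", "content": user_message}]
    while True:
        response = call_model(context)          # steps 1-2: model sees context, responds
        if not response["tool_calls"]:          # step 4: no tool calls, loop ends
            return response["text"]
        for call in response["tool_calls"]:     # step 3: runtime executes tool calls
            result = TOOLS[call["name"]](call["args"])
            context.append({"role": "tool", "content": result})

print(run_agent("let's wrap this up"))
```

Everything interesting lives in what goes into `context` and what tools the runtime exposes; the loop itself never changes.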

[Diagram] The turn loop: context (system prompt, history, tool results, rules) flows to the model, which produces a response (text, tool calls, or both); the runtime executes any tool calls, and the results become new context.

The loop does not care what kind of task it is running. Every turn is the same operation: the model receives context, produces a response, and the runtime executes any tool calls in it. Whether the task is running tests or drafting meeting prep, the mechanism is identical.


The five building blocks

The agent is the loop. The building blocks are what you attach to it: instructions and rules, skills, CLI tools, MCP servers, and hooks.

Instructions and rules — the passive knowledge

Instructions are CLAUDE.md files loaded into the model’s context automatically at session start; they are never called like tools. Your project CLAUDE.md describes conventions. An architecture doc describes your layering. A coding standards file describes naming conventions.

Claude Code also has a rules system: .claude/rules/ files that can be scoped to specific file paths using glob patterns. Both instructions and rules end up in the same place (the context window), but rules exist because instructions don’t scale. See Instructions and Rule System for the full breakdown.

A short instruction file silently prevents an entire class of mistakes. Put your architecture decisions in a CLAUDE.md once; every session that follows benefits from them without you repeating yourself.

Skills — the workflow layer

Skills are prompt files (SKILL.md) that define how a task should be executed. They are triggered by name (/pre-pr, /meeting-prep, /standup) or by matching user phrases. They orchestrate multi-step workflows, define what information to gather, and can specify where human checkpoints belong.

Instructions are loaded. Skills are invoked. Instructions shape every session passively. Skills activate for specific workflows — they specify what to execute, when to connect to external systems, and where to pause for human input. “When preparing for a 1:1, gather the last two weeks of notes, check the issue tracker for anything blocked, read the person file for context — then draft, don’t deliver.” That’s a skill.

Skills are the orchestration layer above CLI and MCP. This is what makes the pre-PR pipeline possible. The skill tells the agent to dispatch four sub-agents in parallel, each running CLI commands in the background, while the model outputs progress updates and synthesizes results. That parallelism, that async coordination: none of it is possible through MCP, which blocks on every call. Skills run inside the model’s context, which means the model can produce tool calls, spawn sub-agents, and adjust based on results, all within the same turn.
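To make that parallelism concrete, here is a toy sketch of the dispatch step in Python. In Claude Code the skill describes this in markdown and the runtime spawns real sub-agents; the four functions and their result strings below are only stand-ins modeling the shape of the /pre-pr trace:

```python
from concurrent.futures import ThreadPoolExecutor

# Toy stand-ins for the four sub-agents in the /pre-pr pipeline.
def lint():        return ("lint", "0 errors")
def unit():        return ("unit", "21 tests pass")
def integration(): return ("integration", "5 tests pass")
def review():      return ("review", "2 warnings")

# Dispatch all four concurrently, then synthesize the results — the part
# an MCP call cannot do, because each MCP call blocks until it returns.
with ThreadPoolExecutor(max_workers=4) as pool:
    results = dict(pool.map(lambda f: f(), [lint, unit, integration, review]))

print(results["review"])
```

The synthesis step (turn 3 in the trace) is just reading `results` back into the model’s context once every sub-agent has finished.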

[Diagram] Skill orchestration: /pre-pr dispatches four sub-agents in parallel (Lint: CLI lint-fix + lint; Unit tests: CLI lein test; Integration: CLI lein integration; Code review: MCP search + read) while the skill narrates progress and synthesizes results.

Here’s the thing about skills: I didn’t write most of mine by hand. I described the workflow in conversation: “when I say /meeting-prep, gather the last two weeks of notes, check the issue tracker, read the person file.” The agent drafted the skill file. I refined it, tested it, iterated. The feedback loop is minutes: edit a markdown file, reload, try again. No server to build, no schema to validate, no deployment.

CLI — the deterministic anchor

CLI tools are predetermined operations. git commit commits. lein test runs tests. Same input, same output, every time.

The model’s response determines when to call them — but the command does the same thing regardless. This is the key: CLI is not “simple.” It’s deterministic. You use CLI when you need the operation to behave identically every time, regardless of context.

Not limited to code either. Fetch issues from your tracker, generate a standup, check your calendar. If it can run in a shell and return a predictable result, it belongs in CLI.
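A minimal sketch of what “deterministic anchor” means in practice, using Python’s subprocess module (the echo command below is a placeholder; any shell command with a stable contract behaves the same way):

```python
import subprocess

def run_cli(cmd):
    """Run a command and return (exit_code, stdout). The model's response
    decides *when* this is called; the command's behavior never varies."""
    proc = subprocess.run(cmd, capture_output=True, text=True)
    return proc.returncode, proc.stdout.strip()

# A deterministic command returns the identical result on every call.
first = run_cli(["echo", "lint: 0 errors"])
second = run_cli(["echo", "lint: 0 errors"])
assert first == second == (0, "lint: 0 errors")
```

This is also why CLI results are cheap to verify: if the exit code and output are stable, the agent (or a hook) can check them mechanically.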

MCP — the cross-client protocol

MCP (Model Context Protocol) lets AI agents call external tools over a standardized JSON-RPC (JSON Remote Procedure Call) interface. You define a server once; it works in Claude Code, Cursor, VS Code, and any other MCP-compatible client. Anthropic describes it as USB-C for AI.

Each MCP tool is self-describing — it carries its name, description, and parameter schema. These descriptions are loaded into context, and the model produces tool calls with structured parameters based on them.

[Diagram] MCP architecture: Claude Code (MCP client) connects over JSON-RPC to MCP servers such as enterprise-search (search, read_document) and internal-api (get_customer, create_order).

The first tradeoff is context cost. MCP tool responses are often verbose JSON. A single search result can return thousands of tokens of metadata that wastes context. Every token in a tool response competes with your conversation, your code, and your instructions for space in the context window. When using MCP, design your servers to return focused responses, not raw API dumps.

The second tradeoff is blocking. Every MCP tool call blocks the agent until it returns: no progress updates, no intermediate state, no parallelism in standard implementations. A CLI command can run in the background while the agent continues working. A skill can dispatch sub-agents that run concurrently. MCP can’t. The moment a workflow needs duration or parallelism, MCP breaks the experience.

Use MCP when you need cross-client portability, when the AI should discover available capabilities at runtime, or when you’re bridging to external systems with atomic, fast operations. Prefer CLI when you need a fast, predictable result without the context overhead. Prefer skills when you need orchestration, progress, or parallelism — skills orchestrate both CLI and MCP tools, which is the correct mental model: skills as the orchestration layer on top of both.
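On the wire, an MCP tool call is a JSON-RPC request. A sketch of the shape, where the `search` tool and its arguments are hypothetical (the method name `tools/call` and the text-content result shape follow the MCP specification, but check your server’s actual schema):

```python
import json

# Hypothetical MCP tool call encoded as a JSON-RPC 2.0 request.
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {"name": "search", "arguments": {"query": "customer 42"}},
}
wire = json.dumps(request)          # what the client actually sends

# A focused response: just the result the model needs, not a raw API
# dump of metadata that burns context-window tokens.
response = {
    "jsonrpc": "2.0", "id": 1,
    "result": {"content": [{"type": "text", "text": "3 matches"}]},
}
```

Every field in `response` lands in the context window, which is why trimming server responses matters as much as designing the tool itself.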

Hooks — the structural enforcer

Hooks fire at lifecycle points — before a tool runs, after it succeeds, when the session ends. A PreToolUse hook can block operations before they execute. A PostToolUse hook can redirect behavior after a tool call.

[Diagram] Hooks lifecycle: the agent decides to call a tool → PreToolUse (annotate, block, or validate: a blocked call is denied, an annotated call runs with injected context, a pass goes through unchanged) → tool call (Bash, Write, Read, MCP) → PostToolUse (observe, log, update UI) → side effects.

The critical insight: instructions are context the model may not follow. Hooks are structural enforcement.

A CLAUDE.md instruction like “always run tests before committing” competes with everything else in the context window: the conversation, the code, the tool results. The longer the session, the more that instruction has to compete with. A hook doesn’t compete. It runs as code, outside the model’s context, on every tool call. The constraint doesn’t depend on what’s in the model’s context — the hook enforces it regardless.
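A sketch of what that structural enforcement can look like: a PreToolUse hook is just a program that receives the pending tool call as JSON and signals allow or block via its exit code. The field names below follow Claude Code’s documented hook input (`tool_name`, `tool_input`) and its convention that exit code 2 blocks the call, but treat the exact shapes as version-dependent:

```python
import json
import sys

def decide(event):
    """Return (allowed, reason) for a pending tool call."""
    if event.get("tool_name") == "Bash":
        command = event.get("tool_input", {}).get("command", "")
        if "--force" in command:
            return False, "force pushes are blocked by policy"
    return True, ""

def main():
    # The runtime passes the pending tool call as JSON on stdin; exiting
    # with code 2 blocks the call and feeds stderr back to the model.
    allowed, reason = decide(json.load(sys.stdin))
    if not allowed:
        print(reason, file=sys.stderr)
        sys.exit(2)
```

Note that `decide` never consults the model’s context: the policy holds no matter how long the session runs or what the conversation contains.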

Start with instructions for guidance. Escalate to hooks when you find an instruction that keeps getting ignored. And once you have hooks, you have an event layer: every tool call becomes an observable event you can log, validate, and react to. Hooks are the foundation for enforcement and observability.

The intelligence-determinism split

The five layers split along one axis:

[Diagram] The intelligence-determinism split: on the non-deterministic side (shapes thinking), rules (passive knowledge, absorbed not called) and skills (workflow brain, invoked to orchestrate) shape and direct the agent, which plans, decides, and calls tools; on the deterministic side (executes and enforces), hooks (structural enforcement) intercept the calls the agent makes to tools (Bash, Write, Read, MCP, ...).

Skills sit above the deterministic layer — they orchestrate CLI and MCP tools, not alongside them. Instructions load passively into the agent’s context, shaping the model’s output when skills are active. Hooks intercept tool calls from below, enforcing constraints regardless of what the skill or model decided.

You don’t want the model’s output to determine whether to run your tests. You want tests to always run. CLI is the right layer. You do want the model’s output to determine how to frame your 1:1 prep based on context. Skills are the right layer.

The split also determines where to debug. If a CLI command returns the wrong result, debug the command. If the model produces an undesirable output, look at the skill or instruction that shaped the context.


Why this architecture runs on markdown

Three of the five layers are just markdown files:

| File | What it is | What it also is |
| --- | --- | --- |
| SKILL.md | Human-readable guide | Agent’s workflow instructions |
| CLAUDE.md | Project README | AI’s identity file |
| RULES.md | Coding standards | Passive context loaded every session |

This is not accidental. Markdown is the default format for AI agents for the same reason JSON became the default for web APIs: LLMs generate it natively. Their training data is dominated by GitHub READMEs, documentation, and Stack Overflow answers, all markdown. Humans can read and edit it without tooling.

The token economics reinforce it. Per Cloudflare’s data, a markdown heading costs ~3 tokens; the HTML equivalent costs 12-15, a 4-5x overhead. Across an entire instruction file or skill definition loaded into context, the savings compound. When context window space is your most constrained resource, format choices are architectural choices.

Markdown is simultaneously documentation and executable configuration. A SKILL.md is a human-readable guide that also serves as the agent’s workflow instructions. CLAUDE.md works the same way: it reads like a project README, but the model loads it as its identity file every session. No other format serves both constituencies. JSON configs are unreadable. YAML is not documentation.

If your AI system uses a format the model wasn’t trained on, you pay the cost in every token.


The role shift: from doer to architect

What changes when you think about AI this way: the role shifts from doing the work to designing the workflow.

Three questions become your job:

  1. What information does this workflow need?
  2. What decisions require a human?
  3. Where are the repetitive steps the agent should handle?

I acted as architect: defining what each workflow needed, where the human gates belonged, and what information to gather. The agent implemented it. CLI, MCP, skills, rules, and hooks are the building blocks. Markdown is the format they share.

Next in this series: Instructions and Rule System — how CLAUDE.md instructions and .claude/rules/ actually work, and why I got the terminology wrong in the original version of this post.

Glossary
  • Model

    The language model component inside an AI agent that receives text context and produces text responses, including tool calls.

  • Tool Call

    When the model asks the harness to do something — like read a file, run a command, or check a calendar. The model describes the action in text; the harness executes it.

  • Rules

    Path-scoped markdown files in the .claude/rules/ directory that provide modular, conditional instructions to Claude Code based on which files the model is working with.

  • Skills

    SKILL.md files that define how a specific task should be executed, invoked by name rather than loaded passively.

  • Hooks

    User-defined shell commands that execute at specific lifecycle points in Claude Code, providing deterministic enforcement outside the model's context.

  • Context

    The input the model receives on a given turn, including the system prompt, conversation history, and tool results from previous turns.

  • Runtime

    The component of an AI agent that executes tool calls produced by the model and manages the interaction loop.

  • Turn

    One round trip inside an agent loop where the model produces tool calls, the runtime executes them, and results feed back as new messages.

  • Subagent

    A fresh agent instance dispatched by a parent agent to execute an isolated task with clean context.

  • System Prompt

    The initial set of instructions injected before any user message in an LLM conversation that shapes the model's behavior.

  • Instructions

    What Anthropic calls CLAUDE.md files: plain markdown files loaded into the context window at session start to give the model persistent context.

  • CLAUDE.md

    The file format for instructions in Claude Code, loaded into the context window at session start to give the model persistent context.

  • Claude Code

    Anthropic's official agentic coding tool that runs in the terminal and operates through tools, hooks, and CLAUDE.md configuration files.

  • Context Window

    The maximum amount of text a language model can process in a single inference, measured in tokens.

  • Prompt

    Text input provided to a language model that conditions its response, including user messages, system prompts, and any other context.

  • Workflow

    A defined sequence of steps executed to accomplish a task, where the output of one step feeds the input of the next.

  • Session

    A single continuous interaction between a user and an AI coding agent, from launch to exit.

  • CLI

    A command-line interface is a text-based interface for interacting with software through typed commands.

  • MCP

    Model Context Protocol, an open standard JSON-RPC interface that lets AI agents discover and call external tools through a unified protocol.

  • Markdown

    A lightweight markup language that uses plain text formatting conventions to produce structured documents, readable without rendering.

  • JSON-RPC

    A lightweight remote procedure call protocol encoded in JSON, where a client sends a method name and parameters and the server returns a result.

  • JSON

    JavaScript Object Notation, a lightweight text-based data interchange format using key-value pairs and arrays that is both human-readable and machine-parseable.

  • Tokens

    The atomic units a language model processes, produced by splitting text into words, subwords, or punctuation marks before inference.

  • Lifecycle

    The ordered sequence of events from the start to the end of a process, defining fixed points where behavior can be attached or observed.

  • Tool

    An action the harness can execute when the model asks: read a file, run a command, search for something.

  • PreToolUse

    A Claude Code hook event that fires before every tool call is executed, capable of blocking, annotating, or passing the call through.

  • PostToolUse

    A Claude Code hook event that fires after a tool call completes successfully, receiving the tool name, parameters, and result.

  • LLM

    Large Language Model — a program that learned patterns from massive amounts of text and uses them to produce new text, one word (token) at a time.

  • Human Gate

    A point in an agent loop where the model pauses and waits for human input before continuing.

  1. What Happens When You Press Enter
  2. Building Your AI Toolkit
  3. Instructions and Rule System