AI-First Engineering · Part 2
Building Your AI Toolkit
Published March 17, 2026 · 9 min read
An agent is a loop around a language model. The model produces text. The loop turns that text into actions (tool calls, file edits, shell commands) and feeds the results back. The practical question is what the pieces around that loop are for: what are rules, skills, and hooks, and when do you use each one? Once the loop clicks, the rest follows.
The agent loop
Anthropic describes the agent loop as three phases: gather context, take action, verify results.
Here is what those three phases look like in practice, running a pre-PR pipeline on a branch with 24 changed files:
```text
User: "let's wrap this up"

turn 1 (gather)
  git summary → branch · 24 files · 8 backend + 4 test · 2 commits

turn 2 (act — 4 parallel sub-agents)
  Lint        → lint-fix + lint → 0 errors · 0 warnings
  Unit        → lein test → 21 tests · 151 assertions → PASS
  Integration → lein integration → 5 tests · 6 assertions → PASS
  Code review → reads architecture rules + fetches story criteria
              → 2 warnings: missing validation, undocumented field

turn 3 (observe)
  ✅ Lint        — clean
  ✅ Unit        — 21 pass
  ✅ Integration — 5 pass
  ⚠️ Code review — 2 warnings
  ✅ Story       — all criteria met

turn 4 (act → human gate)
  "UserProfile should use strict validation. Here's the fix: ..."
  [ Accept ] [ Dismiss ]
```
Four turns. The system ran 12 commands (lein is Clojure’s build tool),
surfaced two warnings, and proposed a fix. Seven words of input.
Same loop, entirely different task:
```text
User: "prep for my 1:1 with Ana"

turn 1 (gather)
  reads last 14 daily notes
  searches issue tracker for active cards
  reads Ana's person file (role, last topics, working style)

turn 2 (act)
  synthesizes into a relationship-first prep doc:
  - Recent context: what you've been working on
  - Team dynamics: anything relevant to surface
  - Growth angle: what's worth discussing for career development
  - Conversation openers: specific, not generic

turn 3 (act → human gate)
  "Here's your prep. Want me to adjust anything?"
```
Same loop. Same protocol. No new code.
To understand what’s happening in those traces, keep two terms separate:
What an agent actually is
- Model — the language model. It receives text (context) and produces text (a response). That is all it does.
- Agent — the model plus the loop around it. The loop feeds context to the model, interprets its response, executes any tool calls, and feeds the results back. The agent is the whole system: model, runtime, and tools.
An agent is a program that runs in a loop. The model receives context and produces a response. The runtime interprets that response. If it contains tool calls, the runtime executes them and feeds the results back as new context. The model produces the next response conditioned on that updated context. This repeats until the response contains no tool calls.
How the loop works: turns
The agent loop is made of turns. A turn is one round trip. In
the pre-PR pipeline above: turn 1 produced a tool call for git summary, turn 2
dispatched four sub-agents in parallel, turn 3 synthesized their results, turn 4
presented a fix and paused for human input. Each turn follows the same steps:
- The model receives everything accumulated so far: the system prompt (its instructions), the conversation history, and the results of any tools called in the previous turn.
- The model produces a response. That response can contain text, one or more tool calls, or both.
- If the response contains tool calls, the runtime executes them and appends the results to the context. This starts a new turn at step 1.
- If the response contains no tool calls, the loop ends. The text is the final output.
The loop does not care what kind of task it is running. Every turn is the same operation: the model receives context, produces a response, and the runtime executes any tool calls in it. Whether the task is running tests or drafting meeting prep, the mechanism is identical.
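The four steps above can be sketched as a tiny runtime. This is a minimal sketch with a stubbed model and a toy tool; the names and response shapes are illustrative, not any real SDK:

```python
# Minimal agent loop: the model is any callable that takes context and
# returns {"text": ..., "tool_calls": [...]}. Tool calls are plain dicts.

def run_agent(model, tools, context):
    """Run turns until the model's response contains no tool calls."""
    while True:
        response = model(context)               # step 1: model sees everything so far
        context.append(("assistant", response["text"]))
        calls = response.get("tool_calls", [])  # step 2: response may contain calls
        if not calls:                           # step 4: no calls, loop ends
            return response["text"]
        for call in calls:                      # step 3: execute, feed results back
            result = tools[call["name"]](**call["args"])
            context.append(("tool", f'{call["name"]} -> {result}'))

# A stub model: asks for one tool call, then finishes once it sees a result.
def stub_model(context):
    if not any(role == "tool" for role, _ in context):
        return {"text": "checking branch",
                "tool_calls": [{"name": "git_summary", "args": {}}]}
    return {"text": "branch is clean", "tool_calls": []}

tools = {"git_summary": lambda: "24 files changed"}
print(run_agent(stub_model, tools, [("user", "let's wrap this up")]))
# prints "branch is clean"
```

The runtime has no notion of what the task is; only the model's responses and the tool registry change between workflows.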
The five building blocks
The agent is the loop. Five building blocks shape what that loop does: instructions and rules, skills, CLI tools, MCP servers, and hooks.
Instructions and rules — the passive knowledge
Instructions are CLAUDE.md files loaded into the model’s context automatically at session start; they are never called like tools. Your project CLAUDE.md describes conventions. An architecture doc describes your layering. A coding standards file describes naming conventions.
Claude Code also has a rules system: .claude/rules/ files that can be scoped to specific file paths using glob patterns. Both instructions and rules end up in the same place (the context window), but rules exist because instructions don’t scale. See Instructions and Rule System for the full breakdown.
A short instruction file silently prevents an entire class of mistakes. Put your architecture decisions in a CLAUDE.md once; every session that follows benefits from them without you repeating yourself.
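A project CLAUDE.md might contain nothing more than a few lines like these (the contents are illustrative, borrowing the Clojure tooling from the trace above):

```markdown
# Project conventions

- Handlers call services, services call repositories. Never skip a layer.
- Namespaces use kebab-case; test files mirror source paths.
- Run `lein test` before proposing any commit.
```

Each line is one past mistake you never have to correct again.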
Skills — the workflow layer
Skills are prompt files (SKILL.md) that define how a task
should be executed. They are triggered by name (/pre-pr, /meeting-prep,
/standup) or by matching user phrases. They orchestrate multi-step
workflows, define what information to gather, and can
specify where human checkpoints belong.
Instructions are loaded. Skills are invoked. Instructions shape every session passively. Skills activate for specific workflows — they specify what to execute, when to connect to external systems, and where to pause for human input. “When preparing for a 1:1, gather the last two weeks of notes, check the issue tracker for anything blocked, read the person file for context — then draft, don’t deliver.” That’s a skill.
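That meeting-prep workflow could be sketched as a SKILL.md along these lines (the frontmatter fields and layout are illustrative, not an exact schema):

```markdown
---
name: meeting-prep
description: Prepare a relationship-first doc for an upcoming 1:1
---

When the user asks to prep for a 1:1:

1. Read the last two weeks of daily notes.
2. Search the issue tracker for active or blocked cards.
3. Read the person file (role, last topics, working style).
4. Draft a prep doc: recent context, team dynamics, growth angle,
   conversation openers.
5. Stop and ask before delivering. This is the human gate.
```

The file reads as documentation of the workflow, and it is also the workflow.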
Skills are the orchestration layer above CLI and MCP. This is what makes the pre-PR pipeline possible. The skill tells the agent to dispatch four sub-agents in parallel, each running CLI commands in the background, while the model outputs progress updates and synthesizes results. That parallelism, that async coordination: none of it is possible through MCP, which blocks on every call. Skills run inside the model’s context, which means the model can produce tool calls, spawn sub-agents, and adjust based on results, all within the same turn.
Here’s the thing about skills: I didn’t write most of mine by hand. I described
the workflow in conversation: “when I say /meeting-prep, gather the last two
weeks of notes, check the issue tracker, read the person file.” The agent
drafted the skill file. I refined it, tested it, iterated. The feedback loop is
minutes: edit a markdown file, reload, try again. No
server to build, no schema to validate, no deployment.
CLI — the deterministic anchor
CLI tools are predetermined operations. git commit commits. lein test runs
tests. Same input, same output, every time.
The model’s response determines when to call them — but the command does the same thing regardless. This is the key: CLI is not “simple.” It’s deterministic. You use CLI when you need the operation to behave identically every time, regardless of context.
Not limited to code either. Fetch issues from your tracker, generate a standup, check your calendar. If it can run in a shell and return a predictable result, it belongs in CLI.
MCP — the cross-client protocol
MCP (Model Context Protocol) lets AI agents call external tools over a standardized JSON-RPC (Remote Procedure Call) interface. You define a server once; it works in Claude Code, Cursor, VS Code, and any other MCP-compatible client. Anthropic describes it as USB-C for AI.
Each MCP tool is self-describing — it carries its name, description, and parameter schema. These descriptions are loaded into context, and the model produces tool calls with structured parameters based on them.
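A hypothetical issue-tracker tool might describe itself like this (the field names follow the MCP tool schema; the tool itself is made up):

```json
{
  "name": "search_issues",
  "description": "Search the issue tracker for open cards matching a query",
  "inputSchema": {
    "type": "object",
    "properties": {
      "query": { "type": "string" },
      "limit": { "type": "integer", "default": 10 }
    },
    "required": ["query"]
  }
}
```

The description and schema are exactly what gets loaded into context, which is also where the costs start.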
The first tradeoff is context cost. MCP tool responses are often verbose JSON. A single search result can return thousands of tokens of metadata that wastes context. Every token in a tool response competes with your conversation, your code, and your instructions for space in the context window. When using MCP, design your servers to return focused responses, not raw API dumps.
The second tradeoff is blocking. Every MCP tool call blocks the agent until it returns: no progress updates, no intermediate state, no parallelism in standard implementations. A CLI command can run in the background while the agent continues working. A skill can dispatch sub-agents that run concurrently. MCP can’t. The moment a workflow needs duration or parallelism, MCP breaks the experience.
Use MCP when you need cross-client portability, when the AI should discover available capabilities at runtime, or when you’re bridging to external systems with atomic, fast operations. Prefer CLI when you need a fast, predictable result without the context overhead. Prefer skills when you need orchestration, progress, or parallelism — skills orchestrate both CLI and MCP tools, which is the correct mental model: skills as the orchestration layer on top of both.
Hooks — the structural enforcer
Hooks fire at lifecycle points — before a tool runs, after it succeeds, when the session ends. A PreToolUse hook can block operations before they execute. A PostToolUse hook can redirect behavior after a tool call.
The critical insight: instructions are context the model may not follow. Hooks are structural enforcement.
A CLAUDE.md instruction like “always run tests before committing” competes
with everything else in the context window: the conversation, the code, the tool
results. The longer the session, the more that instruction has to compete with.
A hook doesn’t compete. It runs as code, outside the model’s context, on every
tool call. The constraint doesn’t depend on what’s in the model’s context — the
hook enforces it regardless.
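As a sketch, the “always run tests before committing” constraint could become a PreToolUse hook registered in settings (the shape follows Claude Code’s hooks configuration; the check script is hypothetical, reading the pending tool call from stdin and exiting non-zero to block it):

```json
{
  "hooks": {
    "PreToolUse": [
      {
        "matcher": "Bash",
        "hooks": [
          { "type": "command", "command": "./scripts/require-green-tests.sh" }
        ]
      }
    ]
  }
}
```

The script runs on every Bash tool call, whatever the model’s context contains.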
Start with instructions for guidance. Escalate to hooks when you find an instruction that keeps getting ignored. And once you have hooks, you have an event layer: every tool call becomes an observable event you can log, validate, and react to. Hooks are the foundation for enforcement and observability.
The intelligence-determinism split
The five layers split along one axis: how much of the behavior comes from the model’s output and how much from deterministic code.
Skills sit above the deterministic layer — they orchestrate CLI and MCP tools, not alongside them. Instructions load passively into the agent’s context, shaping the model’s output when skills are active. Hooks intercept tool calls from below, enforcing constraints regardless of what the skill or model decided.
You don’t want the model’s output to determine whether to run your tests. You want tests to always run. CLI is the right layer. You do want the model’s output to determine how to frame your 1:1 prep based on context. Skills are the right layer.
The split also determines where to debug. If a CLI command returns the wrong result, debug the command. If the model produces an undesirable output, look at the skill or instruction that shaped the context.
Why this architecture runs on markdown
Three of the five layers are just markdown files:
| File | What it is | What it also is |
|---|---|---|
| SKILL.md | Human-readable guide | Agent’s workflow instructions |
| CLAUDE.md | Project README | AI’s identity file |
| RULES.md | Coding standards | Passive context loaded every session |
This is not accidental. Markdown is the default format for AI agents for the same reason JSON became the default for web APIs: LLMs generate it natively. Their training data is dominated by GitHub READMEs, documentation, and Stack Overflow answers, all markdown. Humans can read and edit it without tooling.
The token economics reinforce it. Per Cloudflare’s data, a markdown heading costs ~3 tokens; the HTML equivalent costs 12-15, a 4-5x overhead. Across an entire instruction file or skill definition loaded into context, the savings compound. When context window space is your most constrained resource, format choices are architectural choices.
Markdown is simultaneously documentation and executable configuration. A
SKILL.md is a human-readable guide that also serves as the agent’s workflow
instructions. CLAUDE.md works the same way: it reads like a project README,
but the model loads it as its identity file every session. No other format
serves both constituencies. JSON configs are unreadable. YAML is not
documentation.
If your AI system uses a format the model wasn’t trained on, you pay the cost in every token.
The role shift: from doer to architect
What changes when you think about AI this way: the role shifts from doing the work to designing the workflow.
Three questions become your job:
- What information does this workflow need?
- What decisions require a human?
- Where are the repetitive steps the agent should handle?
I acted as architect, defining what each workflow needed, where the human gates belong, what information to gather. The agent implemented it. CLI, MCP, Skills, Rules, and Hooks are the building blocks. Markdown is the format they share.
Next in this series: Instructions and Rule System — how CLAUDE.md instructions and .claude/rules/ actually work, and why I got the terminology wrong in the original version of this post.
AI-First Engineering
- 1. What Happens When You Press Enter
- 2. Building Your AI Toolkit
- 3. Instructions and Rule System