
AI-First Engineering · Part 2

Building Your AI Toolkit

Published March 17, 2026 · 9 min read

ai-first-engineering rules skills mcp hooks


An agent is a loop around a language model. The model produces text. The loop turns that text into actions — tool calls, file edits, shell commands — and feeds the results back. Once that clicks, the rest follows: what rules, skills, and hooks are, and when to use each one.


The agent loop

Anthropic describes the agent loop as three phases: gather context, take action, verify results.

Diagram: the agent loop. Gather context (read files, search code, check state) → Take action (call tools, edit files, run commands) → Verify results (run tests, check output, read logs) → repeat.

Here is what those three phases look like in practice, running a pre-PR pipeline on a branch with 24 changed files:

User: "let's wrap this up"

turn 1 (gather)
  git summary → branch · 24 files · 8 backend + 4 test · 2 commits

turn 2 (act — 4 parallel sub-agents)
  Lint        → lint-fix + lint → 0 errors · 0 warnings
  Unit        → lein test → 21 tests · 151 assertions → PASS
  Integration → lein integration → 5 tests · 6 assertions → PASS
  Code review → reads architecture rules + fetches story criteria
              → 2 warnings: missing validation, undocumented field

turn 3 (observe)
  ✅ Lint — clean
  ✅ Unit — 21 pass
  ✅ Integration — 5 pass
  ⚠️ Code review — 2 warnings
  ✅ Story — all criteria met

turn 4 (act → human gate)
  "UserProfile should use strict validation. Here's the fix: ..."
  [ Accept ] [ Dismiss ]

Four turns. The system ran 12 commands (lein is Clojure’s build tool), surfaced two warnings, and proposed a fix. Four words of input.

Same loop, entirely different task:

User: "prep for my 1:1 with Ana"

turn 1 (gather)
  reads last 14 daily notes
  searches issue tracker for active cards
  reads Ana's person file (role, last topics, working style)

turn 2 (act)
  synthesizes into a relationship-first prep doc:
  - Recent context: what you've been working on
  - Team dynamics: anything relevant to surface
  - Growth angle: what's worth discussing for career development
  - Conversation openers: specific, not generic

turn 3 (act → human gate)
  "Here's your prep. Want me to adjust anything?"

Same loop. Same protocol. No new code.

To understand what’s happening in those traces, keep two terms separate:


What an agent actually is

  • Model — the language model. It receives text (context) and produces text (a response). That is all it does.
  • Agent — the model plus the loop around it. The loop feeds context to the model, interprets its response, executes any tool calls, and feeds the results back. The agent is the whole system: model, runtime, and tools.

An agent is a program that runs in a loop. The model receives context and produces a response. The runtime interprets that response. If it contains tool calls, the runtime executes them and feeds the results back as new context. The model produces the next response conditioned on that updated context. This repeats until the response contains no tool calls.

How the loop works: turns

The agent loop is made of turns. A turn is one round trip. In the pre-PR pipeline above: turn 1 produced a tool call for git summary, turn 2 dispatched four sub-agents in parallel, turn 3 synthesized their results, turn 4 presented a fix and paused for human input. Each turn follows the same steps:

  1. The model receives everything accumulated so far: the system prompt (its instructions), the conversation history, and the results of any tools called in the previous turn.
  2. The model produces a response. That response can contain text, one or more tool calls, or both.
  3. If the response contains tool calls, the runtime executes them and appends the results to the context. This starts a new turn at step 1.
  4. If the response contains no tool calls, the loop ends. The text is the final output.
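The four steps above can be sketched in a few lines of Python. The `fake_model` and the `git_summary` stub are stand-ins invented for illustration; a real agent would call an LLM API and real tools:

```python
# Minimal sketch of the turn loop: context in, response out, tool
# results appended back as new context until no tool calls remain.
def run_agent(model, tools, user_message):
    context = [{"role": "user", "content": user_message}]
    while True:
        response = model(context)                  # 1. model sees everything so far
        context.append({"role": "assistant", "content": response})
        calls = response.get("tool_calls", [])     # 2. text, tool calls, or both
        if not calls:
            return response["text"]                # 4. no tool calls: loop ends
        for call in calls:                         # 3. runtime executes the calls...
            result = tools[call["name"]](**call["args"])
            context.append({"role": "tool", "content": result})  # ...results become context

# A fake two-turn run: the first turn calls a tool, the second finishes.
def fake_model(context):
    if not any(m["role"] == "tool" for m in context):
        return {"text": "", "tool_calls": [{"name": "git_summary", "args": {}}]}
    return {"text": "branch is clean", "tool_calls": []}

tools = {"git_summary": lambda: "24 files changed"}
final = run_agent(fake_model, tools, "let's wrap this up")
```

The loop has no notion of what the task is; swapping the tools and the prompt changes everything except this code.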

Diagram: the turn loop. Context (system prompt, history, tool results, rules) flows to the model; the model produces a response (text, tool calls, or both); the runtime executes any tool calls and the results become new context.

The loop does not care what kind of task it is running. Every turn is the same operation: the model receives context, produces a response, and the runtime executes any tool calls in it. Whether the task is running tests or drafting meeting prep, the mechanism is identical.


The five building blocks

The agent is the loop. Five building blocks shape what it does: instructions and rules, skills, CLI, MCP, and hooks.

Instructions and rules — the passive knowledge

Instructions are CLAUDE.md files. They are not called like tools; they are loaded into the model’s context automatically at session start. Your project CLAUDE.md describes conventions. An architecture doc describes your layering. A coding standards file describes naming conventions.

Claude Code also has a rules system: .claude/rules/ files that can be scoped to specific file paths using glob patterns. Both instructions and rules end up in the same place (the context window), but rules exist because instructions don’t scale. See Instructions and Rule System for the full breakdown.

A short instruction file silently prevents an entire class of mistakes. Put your architecture decisions in a CLAUDE.md once; every session that follows benefits from them without you repeating yourself.
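For example, a project CLAUDE.md might look like this sketch (the project details are invented for illustration):

```markdown
# CLAUDE.md

## Architecture
- Three layers: handlers → services → repositories.
- Handlers never touch the database directly.

## Conventions
- Namespaced keywords for domain entities (`:user/id`, not `:id`).
- Every new endpoint gets an integration test before the PR.
```

Nothing here is executed. It is read into context once, and every response in the session is conditioned on it.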

Skills — the workflow layer

Skills are prompt files (SKILL.md) that define how a task should be executed. They are triggered by name (/pre-pr, /meeting-prep, /standup) or by matching user phrases. They orchestrate multi-step workflows, define what information to gather, and can specify where human checkpoints belong.

Instructions are loaded. Skills are invoked. Instructions shape every session passively. Skills activate for specific workflows — they specify what to execute, when to connect to external systems, and where to pause for human input. “When preparing for a 1:1, gather the last two weeks of notes, check the issue tracker for anything blocked, read the person file for context — then draft, don’t deliver.” That’s a skill.
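Written down as a skill file, that workflow might look like this sketch (the frontmatter fields assume the SKILL.md convention of a name and a description; the steps are the ones just described):

```markdown
---
name: meeting-prep
description: Prepare a relationship-first doc before a 1:1.
---

When the user asks to prep for a 1:1:

1. Read the last two weeks of daily notes.
2. Search the issue tracker for anything blocked or recently shipped.
3. Read the person file for role, last topics, and working style.
4. Synthesize into a prep doc, then pause: draft, don't deliver.
```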

Skills are the orchestration layer above CLI and MCP. This is what makes the pre-PR pipeline possible. The skill tells the agent to dispatch four sub-agents in parallel, each running CLI commands in the background, while the model outputs progress updates and synthesizes results. That parallelism, that async coordination: none of it is possible through MCP, which blocks on every call. Skills run inside the model’s context, which means the model can produce tool calls, spawn sub-agents, and adjust based on results, all within the same turn.

Diagram: skill orchestration. /pre-pr dispatches four sub-agents in parallel — Lint (CLI: lint-fix + lint), Unit tests (CLI: lein test), Integration (CLI: lein integration), Code review (MCP: search + read) — while the skill narrates progress and synthesizes results.

Here’s the thing about skills: I didn’t write most of mine by hand. I described the workflow in conversation: “when I say /meeting-prep, gather the last two weeks of notes, check the issue tracker, read the person file.” The agent drafted the skill file. I refined it, tested it, iterated. The feedback loop is minutes: edit a markdown file, reload, try again. No server to build, no schema to validate, no deployment.

CLI — the deterministic anchor

CLI tools are predetermined operations. git commit commits. lein test runs tests. Same input, same output, every time.

The model’s response determines when to call them — but the command does the same thing regardless. This is the key: CLI is not “simple.” It’s deterministic. You use CLI when you need the operation to behave identically every time, regardless of context.

Not limited to code either. Fetch issues from your tracker, generate a standup, check your calendar. If it can run in a shell and return a predictable result, it belongs in CLI.
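Wrapping a CLI step for an agent is nothing more than a plain function around a subprocess. A sketch, with `echo` standing in for a real runner such as `lein test`:

```python
import subprocess

# The model's response decides *when* this runs; the command itself
# behaves identically every time, regardless of context.
def run_check(cmd: list[str]) -> tuple[int, str]:
    result = subprocess.run(cmd, capture_output=True, text=True)
    return result.returncode, result.stdout.strip()

# `echo` stands in for a real, deterministic test runner.
code, out = run_check(["echo", "21 tests, 151 assertions, 0 failures"])
```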

MCP — the cross-client protocol

MCP (Model Context Protocol) lets AI agents call external tools over a standardized JSON-RPC (JSON Remote Procedure Call) interface. You define a server once; it works in Claude Code, Cursor, VS Code, and any other MCP-compatible client. Anthropic describes it as USB-C for AI.

Each MCP tool is self-describing — it carries its name, description, and parameter schema. These descriptions are loaded into context, and the model produces tool calls with structured parameters based on them.
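As a sketch, a self-describing tool and the request that invokes it look roughly like this (the field names follow MCP's tools/call shape; the `search` tool itself is invented):

```python
import json

# A self-describing MCP tool: name, description, and a JSON Schema for
# its parameters. This description is what gets loaded into context.
tool = {
    "name": "search",
    "description": "Full-text search over internal documents.",
    "inputSchema": {
        "type": "object",
        "properties": {"query": {"type": "string"}},
        "required": ["query"],
    },
}

# The model's tool call travels to the server as a JSON-RPC request.
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {"name": "search", "arguments": {"query": "refund policy"}},
}
wire = json.dumps(request)  # what actually crosses the client-server boundary
```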

Diagram: MCP architecture. Claude Code (MCP client) connects over JSON-RPC to servers such as enterprise-search (search, read_document) and internal-api (get_customer, create_order).

The first tradeoff is context cost. MCP tool responses are often verbose JSON. A single search result can return thousands of tokens of metadata that wastes context. Every token in a tool response competes with your conversation, your code, and your instructions for space in the context window. When using MCP, design your servers to return focused responses, not raw API dumps.

The second tradeoff is blocking. Every MCP tool call blocks the agent until it returns: no progress updates, no intermediate state, no parallelism in standard implementations. A CLI command can run in the background while the agent continues working. A skill can dispatch sub-agents that run concurrently. MCP can’t. The moment a workflow needs duration or parallelism, MCP breaks the experience.
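The difference is visible from the runtime's side. A sketch, using `sleep` as a stand-in for a long-running tool:

```python
import subprocess
import time

# Blocking call: run() stalls the caller until the command finishes,
# which is how a standard MCP tool call behaves.
start = time.monotonic()
subprocess.run(["sleep", "0.2"])
blocking = time.monotonic() - start   # the caller waited the full duration

# Background process: Popen returns immediately, so the agent can keep
# working (or spawn more) and collect the result later.
start = time.monotonic()
proc = subprocess.Popen(["sleep", "0.2"])
spawn = time.monotonic() - start      # near zero: nothing waited here
proc.wait()                           # join later, once the work is done
```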

Use MCP when you need cross-client portability, when the AI should discover available capabilities at runtime, or when you’re bridging to external systems with atomic, fast operations. Prefer CLI when you need a fast, predictable result without the context overhead. Prefer skills when you need orchestration, progress, or parallelism — skills orchestrate both CLI and MCP tools, which is the correct mental model: skills as the orchestration layer on top of both.

Hooks — the structural enforcer

Hooks fire at lifecycle points — before a tool runs, after it succeeds, when the session ends. A PreToolUse hook can block operations before they execute. A PostToolUse hook can redirect behavior after a tool call.

Diagram: hooks lifecycle. The agent decides to call a tool → PreToolUse (annotate, block, or validate: a block denies the call; an annotation injects context and the tool still runs) → tool call (Bash, Write, Read, MCP) → PostToolUse (observe, log, update UI) → side effects.

The critical insight: instructions are context the model may not follow. Hooks are structural enforcement.

A CLAUDE.md instruction like “always run tests before committing” competes with everything else in the context window: the conversation, the code, the tool results. The longer the session, the more that instruction has to compete with. A hook doesn’t compete. It runs as code, outside the model’s context, on every tool call. The constraint doesn’t depend on what’s in the model’s context — the hook enforces it regardless.
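The decision logic of such a hook is ordinary code. A sketch (the wiring, JSON on stdin and a blocking exit code, follows Claude Code's hook interface; the specific rule here is invented):

```python
import json
import sys

# Decision logic for a PreToolUse hook: given the pending tool call,
# return an exit code (non-zero blocks) and a message for the model.
def decide(event: dict) -> tuple[int, str]:
    command = event.get("tool_input", {}).get("command", "")
    if command.startswith("git commit") and "--no-verify" in command:
        return 2, "Blocked: --no-verify skips the checks this repo relies on."
    return 0, ""

# As an installed hook, the script would end with:
#   code, message = decide(json.load(sys.stdin))
#   print(message, file=sys.stderr)
#   sys.exit(code)
```

Because this runs as code on every tool call, it holds no matter how long the session gets or what else is in the context window.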

Start with instructions for guidance. Escalate to hooks when you find an instruction that keeps getting ignored. And once you have hooks, you have an event layer: every tool call becomes an observable event you can log, validate, and react to. Hooks are the foundation for enforcement and observability.

The intelligence-determinism split

The five layers split along one axis:

Diagram: the intelligence-determinism split. Non-deterministic (shapes thinking): rules (passive knowledge, absorbed not called) and skills (workflow brain, invoked, orchestrates) shape the agent, which plans, decides, and calls tools. Deterministic (executes and enforces): tools (Bash, Write, Read, MCP, ...) that the agent calls, and hooks (structural enforcement) that intercept those tool calls.

Skills sit above the deterministic layer — they orchestrate CLI and MCP tools, not alongside them. Instructions load passively into the agent’s context, shaping the model’s output when skills are active. Hooks intercept tool calls from below, enforcing constraints regardless of what the skill or model decided.

You don’t want the model’s output to determine whether to run your tests. You want tests to always run. CLI is the right layer. You do want the model’s output to determine how to frame your 1:1 prep based on context. Skills are the right layer.

The split also determines where to debug. If a CLI command returns the wrong result, debug the command. If the model produces an undesirable output, look at the skill or instruction that shaped the context.


Why this architecture runs on markdown

Three of the five layers are just markdown files:

| File | What it is | What it also is |
| --- | --- | --- |
| SKILL.md | Human-readable guide | Agent’s workflow instructions |
| CLAUDE.md | Project README | AI’s identity file |
| RULES.md | Coding standards | Passive context loaded every session |

This is not accidental. Markdown is the default format for AI agents for the same reason JSON became the default for web APIs: LLMs generate it natively. Their training data is dominated by GitHub READMEs, documentation, and Stack Overflow answers, all markdown. Humans can read and edit it without tooling.

The token economics reinforce it. Per Cloudflare’s data, a markdown heading costs ~3 tokens; the HTML equivalent costs 12-15, a 4-5x overhead. Across an entire instruction file or skill definition loaded into context, the savings compound. When context window space is your most constrained resource, format choices are architectural choices.

Markdown is simultaneously documentation and executable configuration. A SKILL.md is a human-readable guide that also serves as the agent’s workflow instructions. CLAUDE.md works the same way: it reads like a project README, but the model loads it as its identity file every session. No other format serves both constituencies. JSON configs are unreadable. YAML is not documentation.

If your AI system uses a format the model wasn’t trained on, you pay the cost in every token.


The role shift: from doer to architect

What changes when you think about AI this way: the role shifts from doing the work to designing the workflow.

Three questions become your job:

  1. What information does this workflow need?
  2. What decisions require a human?
  3. Where are the repetitive steps the agent should handle?

I acted as architect: defining what each workflow needed, where the human gates belonged, and what information to gather. The agent implemented it. CLI, MCP, skills, rules, and hooks are the building blocks. Markdown is the format they share.

Next in this series: Instructions and Rule System — how CLAUDE.md instructions and .claude/rules/ actually work, and why I got the terminology wrong in the original version of this post.

Referenced by

  • Model

    The language-model component inside an AI agent that receives context as text and produces responses as text, including tool calls.

  • Tool Call

    When the model asks the harness to do something, such as read a file, run a command, or query a calendar. The model describes the action in text; the harness executes it.

  • Rules

    Path-scoped markdown files in the .claude/rules/ directory that give Claude Code modular, conditional instructions based on which files the model is working on.

  • Skills

    SKILL.md files that define how a specific task should be executed, invoked by name rather than loaded passively.

  • Hooks

    User-defined shell commands that run at specific points in the Claude Code lifecycle, providing deterministic enforcement outside the model’s context.

  • Context

    The input the model receives on a given turn, including the system prompt, the conversation history, and tool results from previous turns.

  • Runtime

    The component of an AI agent that executes the tool calls the model produces and manages the interaction loop.

  • Turn

    One round trip within an agent loop in which the model produces tool calls, the runtime executes them, and the results come back as new messages.

  • Subagent

    An agent instance spawned by the main agent to execute an isolated task with a clean context.

  • System Prompt

    The initial set of instructions injected before any user message in an LLM conversation, shaping the model’s behavior.

  • Instructions

    What Anthropic calls CLAUDE.md files: plain markdown files loaded into the context window at session start to give the model persistent context.

  • CLAUDE.md

    The file format for instructions in Claude Code, loaded into the context window at session start to give the model persistent context.

  • Claude Code

    Anthropic’s official agentic coding tool, which runs in the terminal and operates through tools, hooks, and CLAUDE.md configuration files.

  • Context Window

    The maximum amount of text a language model processes in a single inference, measured in tokens.

  • Prompt

    Text provided to a language model that conditions its response, including user messages, system prompts, and any other context.

  • Workflow

    A defined sequence of steps executed to accomplish a task, where the output of one step feeds the input of the next.

  • Session

    A continuous interaction between a user and an AI coding agent, from start to finish.

  • CLI

    Command-line interface: a text-based interface for interacting with software through typed commands.

  • MCP

    Model Context Protocol: an open JSON-RPC interface standard that lets AI agents discover and call external tools through a unified protocol.

  • Markdown

    A lightweight markup language that uses plain-text conventions to produce structured documents, readable without rendering.

  • JSON-RPC

    A lightweight remote procedure call protocol encoded in JSON, in which a client sends a method name and parameters and the server returns a result.

  • JSON

    JavaScript Object Notation: a lightweight text format for data interchange using key-value pairs and arrays, both human-readable and machine-parseable.

  • Tokens

    The atomic units a language model processes, produced by splitting text into words, subwords, or punctuation before inference.

  • Lifecycle

    The ordered sequence of events from the start to the end of a process, defining fixed points where behavior can be attached or observed.

  • Tool

    An action the harness knows how to execute when the model asks: read a file, run a command, perform a search.

  • PreToolUse

    A Claude Code hook event that fires before every tool call is executed, able to block, annotate, or let the call through.

  • PostToolUse

    A Claude Code hook event that fires after a tool call executes successfully, receiving the tool name, parameters, and result.

  • LLM

    Large Language Model: a program that learned patterns from massive amounts of text and uses those patterns to produce new text, one word (token) at a time.

  • Human Gate

    A point in the agent loop where the model pauses and waits for human input before continuing.

  1. What Happens When You Press Enter
  2. Building Your AI Toolkit
  3. Instructions and Rule System