
Agent Capabilities

Diminuendo’s coding agents are not stateless prompt-response loops. They are persistent, stateful processes with structured planning, interactive question-answer flows, cross-session memory extraction, and deep project configuration awareness. These capabilities are shared across all TypeScript agents — the Sonnet Agent (agents/sonnet-agent/) and the Coding Agent TS (agents/coding-agent-ts/) — and are orchestrated through the gateway’s protocol and event system. This page documents the four capability domains that elevate these agents from simple tool-calling loops to context-aware engineering partners.

Project Configuration: AGENTS.md and CLAUDE.md

Every coding session begins with context injection. Before the first LLM call, the agent discovers and loads project instruction files following a strict priority order:
  1. User-Level Instructions
     • ~/.claude/CLAUDE.md — global instructions that apply to all projects for this user
     • ~/.claude/AGENTS.md — global agent-specific instructions
  2. Project-Level Instructions — checked into the codebase and shared across the team:
     • CLAUDE.md or .claude/CLAUDE.md
     • AGENTS.md, .claude/AGENTS.md, or .github/AGENTS.md
  3. Private Local Instructions
     • CLAUDE.local.md — user-private overrides, not committed to version control
  4. Rules Directory
     • .claude/rules/*.md — all markdown files, sorted alphabetically
The discovered content is assembled into a <system-reminder> block that wraps the project instructions, git status, and current date. This block is injected into the first user message of each session — not the system prompt — which means it participates in the conversation context and can be referenced by the model in subsequent turns. The _getAnnotation() method tags each file with its provenance: "project instructions, checked into the codebase", "user's private project instructions, not checked in", or "user's private global instructions for all projects". This gives the model explicit signal about which instructions are shared versus private, enabling it to reason about whether a preference is personal or team-wide.
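The assembly step can be sketched as follows. The DiscoveredFile shape and the standalone buildContextInjection signature here are illustrative, not the actual ProjectConfig API, but the block layout mirrors the description above:

```typescript
// Illustrative sketch: assemble discovered instruction files, git status,
// and the date into the <system-reminder> block injected into the first
// user message. Names in this sketch are hypothetical.
interface DiscoveredFile {
  path: string;
  annotation: string; // provenance tag, e.g. "project instructions, checked into the codebase"
  content: string;
}

function buildContextInjection(
  files: DiscoveredFile[],
  gitStatus: string,
  today: string,
): string {
  const sections = files.map(
    (f) => `Contents of ${f.path} (${f.annotation}):\n\n${f.content}`,
  );
  return [
    "<system-reminder>",
    ...sections,
    `gitStatus: ${gitStatus}`,
    `Today's date: ${today}`,
    "</system-reminder>",
  ].join("\n\n");
}
```

Because the annotation travels with each file's content, the model can distinguish shared team instructions from private ones directly in context.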

Why AGENTS.md?

The AGENTS.md convention extends the established CLAUDE.md pattern with agent-specific guidance. Where CLAUDE.md contains general project context (architecture notes, coding standards, dependency decisions), AGENTS.md can contain instructions specifically targeted at automated coding agents — tool usage preferences, file modification boundaries, testing requirements, or review criteria. The three discovery paths (AGENTS.md, .claude/AGENTS.md, .github/AGENTS.md) align with existing conventions across different project structures.

AskUserQuestion: Structured Interactive Dialogue

The AskUserQuestion tool gives agents the ability to pause execution and present structured multi-choice questions to the user. This is not free-form text output — it is a typed protocol interaction that the client renders as a dedicated UI component with selectable options.

Protocol Mechanics

When the LLM invokes AskUserQuestion, the agent does not execute it through the standard tool registry. Instead, the agent intercepts the tool call at the agentic loop level:
  1. Validation — 1 to 4 questions, each with 2 to 4 options, unique labels, headers capped at 12 characters
  2. Event emission — tool.question_requested with a UUID request_id and the full question payload
  3. Suspension — a deferred Promise is stored in _pendingQuestions keyed by request_id; the tool loop blocks on this promise
  4. Gateway relay — the question_requested event is broadcast to the client via the existing PodiumEventMapper path
  5. Client rendering — the client displays the question dialog with selectable options (an “Other” free-text option is auto-provided)
  6. Answer flow — the user’s response arrives as an answer_question client message, which Podium delivers to the agent’s onAnswer() callback
  7. Resume — onAnswer() matches the request_id, resolves the pending promise, and the tool loop continues with the formatted answer as the tool result
This mechanism reuses the existing question_requested / answer_question protocol path that was already wired for approval flows. The key architectural insight is that the agent’s _approvalResolve pattern generalizes cleanly to a Map<string, resolver> — multiple concurrent questions could theoretically be outstanding, though in practice the sequential tool loop means only one is active at a time.
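A minimal sketch of that suspend/resume pattern, assuming a standalone broker class (the real logic lives inside the agent's loop; QuestionBroker and its ask() method are hypothetical, while onAnswer mirrors the callback named above):

```typescript
// Deferred-promise pattern: the tool loop awaits ask(); the answer_question
// handler calls onAnswer() to resolve it. Multiple requests can be pending
// because resolvers are keyed by request_id in a Map.
type Resolver = (answer: string) => void;

class QuestionBroker {
  private pending = new Map<string, Resolver>();

  // Called by the tool loop: registers a resolver, then blocks on the promise.
  ask(requestId: string): Promise<string> {
    return new Promise<string>((resolve) => {
      this.pending.set(requestId, resolve);
      // ...emit tool.question_requested { request_id: requestId, questions } here
    });
  }

  // Called when the user's answer arrives; returns false for unknown IDs.
  onAnswer(requestId: string, answer: string): boolean {
    const resolve = this.pending.get(requestId);
    if (!resolve) return false;
    this.pending.delete(requestId); // each request resolves at most once
    resolve(answer);
    return true;
  }
}
```

The Map-of-resolvers shape is what makes the single-approval _approvalResolve pattern generalize: adding concurrency costs nothing beyond the key.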

Schema

{
  questions: Array<{
    question: string      // "Which database should we use?"
    header: string        // "Database" (max 12 chars, rendered as chip)
    options: Array<{
      label: string       // "PostgreSQL (Recommended)"
      description: string // "Battle-tested, rich ecosystem"
    }>
    multiSelect: boolean  // false = single choice, true = checkboxes
  }>
}

Mode Availability

AskUserQuestion is available in explore, plan, and review modes. In implement mode (which has allowedTools: null, meaning all tools), it is available by default. This ensures agents can clarify intent regardless of their current operating mode.

Plan Lifecycle: Structured Task Decomposition

The plan system gives agents the ability to decompose work into discrete, trackable steps — and to communicate progress on those steps in real time. Unlike free-form markdown plans written to files, this is a typed, event-driven lifecycle with gateway-level persistence and client-side rendering.

Architecture

Three components collaborate:

PlanTracker

Agent-side state machine. Holds the active plan with ordered steps, each progressing through pending → in_progress → completed | skipped. Persisted to SonnetAgentState.activePlan for session resume.

Plan Tools

PlanCreate and PlanUpdate — intercepted at the tool loop level (same pattern as AskUserQuestion). The LLM calls them as tools; the agent delegates to PlanTracker and emits events.

Gateway Events

Four event types flow through PodiumEventMapper to the client: plan.created, plan.step_started, plan.step_completed, plan.revised. Created/revised events are persistent (stored to session.db); step progress events are ephemeral.

Event Flow

LLM → PlanCreate tool call
  → Agent intercepts, validates, creates Plan via PlanTracker
  → Emits plan.created { plan_id, title, steps[] }
  → Gateway maps to plan_created, persists, broadcasts
  → Client updates activePlan in chat store
  → plan-progress.tsx renders title + step list + progress bar

LLM → PlanUpdate { step_id: "step-0", status: "in_progress" }
  → Agent intercepts, updates PlanTracker
  → Emits plan.step_started { plan_id, step_id, title }
  → Gateway maps to plan_step_started (ephemeral), broadcasts
  → Client updates step status, progress bar animates
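The agent-side state machine driving this flow can be sketched as follows. Method and field names are modeled on the events above; this is an illustrative reconstruction, not the actual PlanTracker source:

```typescript
// Sketch of the plan state machine: ordered steps with sequential IDs,
// per-step status transitions, and a revise() that preserves the plan ID.
type StepStatus = "pending" | "in_progress" | "completed" | "skipped";

interface PlanStep { id: string; title: string; status: StepStatus; }
interface Plan { planId: string; title: string; steps: PlanStep[]; }

class PlanTracker {
  private plan: Plan | null = null;

  create(planId: string, title: string, stepTitles: string[]): Plan {
    this.plan = {
      planId,
      title,
      // Sequential IDs (step-0, step-1, …) that later PlanUpdate calls reference.
      steps: stepTitles.map((t, i) => ({ id: `step-${i}`, title: t, status: "pending" })),
    };
    return this.plan;
  }

  update(stepId: string, status: StepStatus): PlanStep {
    const step = this.plan?.steps.find((s) => s.id === stepId);
    if (!step) throw new Error(`unknown step: ${stepId}`);
    step.status = status; // caller emits plan.step_started / plan.step_completed
    return step;
  }

  // Replace the step list while keeping the same plan ID (emits plan.revised).
  revise(stepTitles: string[]): Plan {
    if (!this.plan) throw new Error("no active plan");
    this.plan.steps = stepTitles.map((t, i) => ({ id: `step-${i}`, title: t, status: "pending" }));
    return this.plan;
  }
}
```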

PlanCreate Schema

{
  title: string,
  steps: Array<{
    title: string,
    description?: string  // optional detail, shown during in_progress
  }>
}
Each step receives a sequential ID (step-0, step-1, …) that the LLM references in subsequent PlanUpdate calls. The plan is mutable: PlanTracker.revise() replaces the step list while preserving the plan ID, emitting a plan.revised event.

Mode Availability

PlanCreate and PlanUpdate are available in plan mode and implement mode (all tools). This allows agents to create plans during planning phases and track progress during execution.

Cross-Session Memory: LLM-Extracted Persistent Knowledge

The memory system extracts reusable facts from completed coding sessions and makes them available to future sessions. This is not file-based MEMORY.md (which the agent already supports via AutoMemory) — it is a structured, queryable, cross-session knowledge store backed by per-session SQLite databases.

Storage Architecture

data/sessions/{sessionId}/
├── session.db          # Messages, events, turns, usage (hot path)
├── session.db-wal
├── memory.db           # Extracted memories (cold path, created lazily)
└── memory.db-wal
The deliberate separation into a distinct memory.db file is an architectural decision with concrete benefits:
  • Zero write-lock contention — memory extraction writes (which happen at session cleanup) never block the event/message write path
  • Independent WAL — memory.db can be checkpointed and replicated independently
  • Lazy creation — the file is only created when the first insert_memory command arrives; short sessions that don’t extract memories incur zero filesystem overhead

Extraction Pipeline

Memory extraction is LLM-powered, following the same Ensemble client pattern used by the agent’s CompactionService:
  1. Trigger — the agent’s cleanup() method fires when a session ends
  2. Guard — sessions with fewer than 4 conversation turns are skipped (insufficient signal)
  3. LLM call — the full conversation history is sent to the Ensemble client with a structured extraction prompt
  4. Parse — the response is parsed as a JSON array of memories, each with content, category, confidence, and tags
  5. Emit — a memory.extracted event carries the memories through the gateway
  6. Persist — the gateway’s event stream handler intercepts memory_extracted, delegates to MemoryService.saveExtracted(), which writes to the session’s memory.db via the writer worker
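Steps 2 through 4 of the pipeline can be sketched as a single guard-and-parse function. The function name and exact validation rules are assumptions; the four-turn guard and the memory fields come from the text:

```typescript
// Illustrative guard-and-parse for LLM memory extraction: skip short
// sessions, tolerate malformed model output, keep only well-formed entries.
interface ExtractedMemory {
  content: string;
  category: string;
  confidence: number; // 0.0–1.0
  tags: string[];
}

function parseExtraction(raw: string, turnCount: number): ExtractedMemory[] {
  if (turnCount < 4) return []; // guard: fewer than 4 turns = insufficient signal
  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch {
    return []; // model returned malformed JSON; extract nothing
  }
  if (!Array.isArray(parsed)) return [];
  return parsed.filter(
    (m): m is ExtractedMemory =>
      typeof m?.content === "string" &&
      typeof m?.category === "string" &&
      typeof m?.confidence === "number" &&
      m.confidence >= 0 &&
      m.confidence <= 1 &&
      Array.isArray(m?.tags),
  );
}
```

Returning an empty array on any parse failure keeps extraction strictly best-effort: a bad model response never blocks session cleanup.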

Memory Schema

CREATE TABLE memories (
  id TEXT PRIMARY KEY,
  session_id TEXT NOT NULL,
  tenant_id TEXT NOT NULL,
  category TEXT NOT NULL DEFAULT 'general',
  content TEXT NOT NULL,
  confidence REAL NOT NULL DEFAULT 1.0,
  tags TEXT DEFAULT '[]',
  created_at INTEGER NOT NULL,
  updated_at INTEGER NOT NULL,
  expires_at INTEGER,
  superseded_by TEXT REFERENCES memories(id)
);
Categories: architecture, preferences, decisions, debugging, conventions, general.

Confidence scoring: the extraction LLM assigns a 0.0–1.0 confidence score. Memories below 0.5 confidence receive a 30-day TTL (expires_at); higher-confidence memories are permanent. Expired memories are filtered at read time — no background cleanup process required.

Supersession: when a newer memory contradicts an older one, superseded_by creates a soft-delete chain. The active memory index (idx_memories_active) filters on superseded_by IS NULL.
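The read-time filtering described above amounts to two predicates per row. A sketch, with field names taken from the memories schema (the function itself is hypothetical):

```typescript
// Read-time filter: drop superseded and expired rows when loading memories,
// so no background cleanup job is needed.
interface MemoryRow {
  id: string;
  confidence: number;
  created_at: number;        // unix epoch
  expires_at: number | null; // set when confidence < 0.5 (30-day TTL)
  superseded_by: string | null;
}

function activeMemories(rows: MemoryRow[], now: number): MemoryRow[] {
  return rows.filter(
    (row) =>
      row.superseded_by === null &&                       // not soft-deleted
      (row.expires_at === null || row.expires_at > now),  // not expired
  );
}
```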

Cross-Session Recall

The MemoryService.recallForSession() method aggregates memories across sessions:
  1. Index lookup — the tenant’s registry database contains a memory_sessions table that tracks which sessions have extracted memories and when
  2. Targeted reads — only sessions with memories are queried (no directory scanning)
  3. Per-session reads — each session’s memory.db is opened read-only via the reader worker
  4. Deduplication — exact content matches are collapsed
  5. Ranking — memories are sorted by confidence × recency and the top N are returned
Clients can trigger recall via the memory_recall client message; the gateway responds with memory_recalled containing the aggregated memories. The memory_forget client message allows users to explicitly remove memories that are no longer relevant.
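The ranking step can be sketched as follows. The text specifies confidence × recency ordering but not the recency function, so the linear 30-day decay here is an assumption:

```typescript
// Hypothetical ranking for cross-session recall: score each memory as
// confidence times a recency factor, return the top N.
interface RankedMemory { content: string; confidence: number; created_at: number; }

function rankMemories(memories: RankedMemory[], now: number, topN: number): RankedMemory[] {
  const THIRTY_DAYS = 30 * 24 * 60 * 60 * 1000;
  const score = (m: RankedMemory) => {
    const age = Math.max(0, now - m.created_at);
    const recency = Math.max(0, 1 - age / THIRTY_DAYS); // 1.0 = just created
    return m.confidence * recency;
  };
  return [...memories].sort((a, b) => score(b) - score(a)).slice(0, topN);
}
```

Under this weighting, a fresh medium-confidence memory can outrank an older high-confidence one, which matches the intent of blending confidence with recency rather than sorting on either alone.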

Gateway Integration

The memory system adds two client message types and three server event types to the protocol:

Client Message   Purpose
memory_recall    Request cross-session memory aggregation
memory_forget    Delete a specific memory

Server Event       Persistence   Purpose
memory.extracted   Persistent    N memories extracted at session end
memory.recalled    Persistent    Aggregated memories returned to client
memory.forgotten   Persistent    Confirmation of memory deletion
The writer and reader workers each maintain a separate LRU cache for memory.db handles (writer: 64 capacity, reader: 32 capacity), independent of the session.db cache. This prevents memory operations from evicting hot session database handles.
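A cache with that behavior can be sketched using Map insertion order for LRU bookkeeping; the HandleCache class is illustrative, with the capacities from the text passed at construction:

```typescript
// Generic LRU cache for database handles. JavaScript Maps iterate in
// insertion order, so re-inserting on access makes the first key the
// least recently used; eviction closes the handle.
class HandleCache<T> {
  private entries = new Map<string, T>();

  constructor(
    private capacity: number,
    private close: (handle: T) => void,
  ) {}

  get(key: string): T | undefined {
    const handle = this.entries.get(key);
    if (handle !== undefined) {
      // Re-insert to mark as most recently used.
      this.entries.delete(key);
      this.entries.set(key, handle);
    }
    return handle;
  }

  set(key: string, handle: T): void {
    this.entries.delete(key);
    this.entries.set(key, handle);
    if (this.entries.size > this.capacity) {
      const oldest = this.entries.keys().next().value as string;
      this.close(this.entries.get(oldest)!);
      this.entries.delete(oldest);
    }
  }
}
```

Each worker would hold separate instances per database kind (e.g. a 64-capacity cache for writer-side memory.db handles), so memory traffic never evicts hot session.db handles.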

Agent Modes and Tool Availability

The four capabilities integrate with the existing mode system. Here is the complete tool availability matrix after these additions:
Tool                     implement   explore      plan     review
Read, Glob, Grep, Bash       ✓          ✓           ✓         ✓
Edit, Write                  ✓          –       Write only    –
WebFetch, WebSearch          ✓          ✓           ✓         ✓
GitHub (read)                ✓          ✓           ✓         ✓
GitHub (write)               ✓          –           –         –
CodeOutline                  ✓          ✓           ✓         ✓
AskUserQuestion              ✓          ✓           ✓         ✓
PlanCreate                   ✓          –           ✓         –
PlanUpdate                   ✓          –           ✓         –
Skill                        ✓          –           –         –
Memory extraction requires no tool — it is an agent lifecycle hook that fires during cleanup(), orthogonal to the mode system entirely.

System Prompt Integration

Each capability adds a dedicated guidance section to the agent’s system prompt:
  • AskUserQuestion — askUserQuestionGuidance() in Sections.ts: when to use the tool, option formatting conventions, the auto-provided “Other” option, and the explicit instruction not to use it for plan approval
  • Plan tools — tool schemas in schemas/index.ts with descriptive parameter documentation; the LLM learns usage from schema descriptions rather than a dedicated prompt section
  • Memory — extraction is implicit (fires at cleanup); recall can be injected into the system-reminder context alongside CLAUDE.md and git status on session start
  • AGENTS.md — discovered and injected automatically via ProjectConfig.buildContextInjection(); no prompt section needed because the content is the prompt
The system prompt now assembles 12 sections in order: identity, system, professional objectivity, doing tasks, executing with care, tool usage policy, skill guidance, AskUserQuestion guidance, tone and style, code references, git guidance, and PR guidance.