refactor: Move to agent architecture

2026-06-30 01:59:11 -04:00 · 2026-05-30 14:50:49 +02:00
parent 34ce787320
commit ccc2ec2e80
364 changed files with 10801 additions and 19020 deletions
--- a/docs/agent-roles.md
+++ b/docs/agent-roles.md
@@ -1,174 +0,0 @@
-# Agent Roles
-
-Agent Roles allow you to define different AI personas with specialized system prompts for various tasks. Switch between roles instantly in the chat interface to adapt the AI's behavior to your current needs.
-
-## Overview
-
-Agent Roles are reusable system prompt configurations that modify how the AI assistant responds. Instead of manually changing system prompts, you can create roles like "Developer", "Code Reviewer", or "Documentation Writer" and switch between them with a single click.
-
-**Key Features:**
- **Quick Switching**: Change roles from the chat toolbar dropdown
- **Custom Prompts**: Each role has its own specialized system prompt
- **Built-in Roles**: Pre-configured Developer and Code Reviewer roles
- **Persistent**: Roles are saved locally and loaded on startup
- **Extensible**: Create unlimited custom roles for different tasks
-
-## Default Roles
-
-QodeAssist comes with three built-in roles:
-
-### Developer
-Experienced Qt/C++ developer with a structured workflow: analyze the problem, propose a solution, wait for approval, then implement. Best for implementation tasks where you want thoughtful, minimal code changes.
-
-### Code Reviewer
-Expert C++/QML code reviewer specializing in C++20 and Qt6. Checks for bugs, memory leaks, thread safety, Qt patterns, and production readiness. Provides direct, specific feedback with code examples.
-
-### Researcher
-Research-oriented developer who investigates problems and explores solutions. Analyzes problems, presents multiple approaches with trade-offs, and recommends the best option. Does not write implementation code — focuses on helping you make informed decisions.
-
-## Using Agent Roles
-
-### Switching Roles in Chat
-
-1. Open the Chat Assistant (side panel, bottom panel, or popup window)
-2. Locate the **Role selector** dropdown in the top toolbar (next to the configuration selector)
-3. Select a role from the dropdown
-4. The AI will now use the selected role's system prompt
-
-**Note**: Selecting "No Role" uses only the base system prompt without role specialization.
-
-### Viewing Active Role
-
-Click the **Context** button (📋) in the chat toolbar to view:
- Base system prompt
- Current agent role and its system prompt
- Active project rules
-
-## Managing Agent Roles
-
-### Opening the Role Manager
-
-Navigate to: `Qt Creator → Preferences → QodeAssist → Chat Assistant`
-
-Scroll down to the **Agent Roles** section where you can manage all your roles.
-
-### Creating a New Role
-
-1. Click **Add...** button
-2. Fill in the role details:
-   - **Name**: Display name shown in the dropdown (e.g., "Documentation Writer")
-   - **ID**: Unique identifier for the role file (e.g., "doc_writer")
-   - **Description**: Brief explanation of the role's purpose
-   - **System Prompt**: The specialized instructions for this role
-3. Click **OK** to save
-
-### Editing a Role
-
-1. Select a role from the list
-2. Click **Edit...** or double-click the role
-3. Modify the fields as needed
-4. Click **OK** to save changes
-
-**Note**: Built-in roles cannot be edited directly. Duplicate them to create a modifiable copy.
-
-### Duplicating a Role
-
-1. Select a role to duplicate
-2. Click **Duplicate...**
-3. Modify the copy as needed
-4. Click **OK** to save as a new role
-
-### Deleting a Role
-
-1. Select a custom role (built-in roles cannot be deleted)
-2. Click **Delete**
-3. Confirm deletion
-
-## Creating Effective Roles
-
-### System Prompt Tips
-
- **Be specific**: Clearly define the role's expertise and focus areas
- **Set expectations**: Describe the desired response format and style
- **Include guidelines**: Add specific rules or constraints for responses
- **Use structured prompts**: Break down complex roles into bullet points
-
-## Storage Location
-
-Agent roles are stored as JSON files in:
-
-```
-~/.config/QtProject/qtcreator/qodeassist/agent_roles/
-```
-
-**On different platforms:**
- **Linux**: `~/.config/QtProject/qtcreator/qodeassist/agent_roles/`
- **macOS**: `~/Library/Application Support/QtProject/Qt Creator/qodeassist/agent_roles/`
- **Windows**: `%APPDATA%\QtProject\qtcreator\qodeassist\agent_roles\`
-
-### File Format
-
-Each role is stored as a JSON file named `{id}.json`:
-
-```json
-{
-    "id": "doc_writer",
-    "name": "Documentation Writer",
-    "description": "Technical documentation and code comments",
-    "systemPrompt": "You are a technical documentation specialist...",
-    "isBuiltin": false
-}
-```
-
-### Manual Editing
-
-You can:
- Edit JSON files directly in any text editor
- Copy role files between machines
- Share roles with team members
- Version control your roles
- Click **Open Roles Folder...** to quickly access the directory
-
-## How Roles Work
-
-When a role is selected, the final system prompt is composed as:
-
-```
-┌─────────────────────────────────────────────────┐
-│ Final System Prompt = Base Prompt + Role Prompt │
-├─────────────────────────────────────────────────┤
-│ 1. Base System Prompt (from Chat Settings)      │
-│ 2. Agent Role System Prompt                     │
-│ 3. Project Rules (common/ + chat/)              │
-│ 4. Linked Files Context                         │
-└─────────────────────────────────────────────────┘
-```
-
-This allows roles to augment rather than replace your base configuration.
-
-## Best Practices
-
-1. **Keep roles focused**: Each role should have a clear, specific purpose
-2. **Use descriptive names**: Make it easy to identify roles at a glance
-3. **Test your prompts**: Verify roles produce the expected behavior
-4. **Iterate and improve**: Refine prompts based on AI responses
-5. **Share with team**: Export and share useful roles with colleagues
-
-## Troubleshooting
-
-### Role Not Appearing in Dropdown
- Restart Qt Creator after adding roles manually
- Check JSON file format validity
- Verify file is in the correct directory
-
-### Role Behavior Not as Expected
- Review the system prompt for clarity
- Check if base system prompt conflicts with role prompt
- Try a more specific or detailed prompt
-
-## Related Documentation
-
- [Project Rules](project-rules.md) - Project-specific AI behavior customization
- [Chat Assistant Features](../README.md#chat-assistant) - Overview of chat functionality
- [File Context](file-context.md) - Attaching files to chat context
-
--- a/docs/agent-templates-design.md
+++ b/docs/agent-templates-design.md
@@ -0,0 +1,401 @@
+# Agent Templates — Design Note (body model, include, extends)
+
+Status: IMPLEMENTED, then partially superseded. The `[body]` table + `extends`
+model shipped; the **bundled partials described below were removed** — each wire
+base now inlines its message serialization, and bases were split into a
+wire-only abstract base (provider + endpoint + serialization) plus a thin
+concrete agent that carries all policy (model, persona, tags, caching, thinking,
+sampling). `{% include %}` survives only for user-supplied partials. Treat the
+partials sections here as historical record; the current user-facing guide is
+`creating-agents.md`. Dev-facing (not end-user docs).
+
+Scope: how agent TOML profiles describe the request and share structure.
+
+## Problem this replaces
+
+The model this design replaces has each agent embed a `[template].message_format`
+jinja string that hand-builds the **whole** request body as text, plus
+`[template.sampling]` and `[template.thinking.*]` blocks merged in by `applySampling`.
+Pains:
+
+- Massive copy-paste: 9 OpenAI-compatible agents share a byte-identical ~50-line
+  `message_format`; 4 Claude agents share another; `role` + README `context` are
+  identical across 18 files.
+- `[template.sampling]` / `[template.thinking.overrides]` /
+  `[template.thinking.request_block.*]` describe **merge machinery**, not the request
+  body — they don't look like the actual API call. The `overrides` vs `request_block`
+  split is meaningless (both are deep-merged into the request identically).
+- Manual JSON-by-string-concatenation: trailing-comma bookkeeping
+  (`{% if not loop.is_last %},{% endif %}`) everywhere; a missing comma fails
+  silently at runtime (`renderBody` returns nullopt, only a `qWarning`).
+- `include` is hard-disabled, so there is no way to share a sub-fragment.
+
+## Agreed model
+
+### 1. `[body]` is a deep-mergeable table = the request body, 1:1 with the API
+
+Replace the `message_format` string and the `sampling`/`thinking` blocks with a
+single `[body]` TOML table whose keys are the **literal request-body fields**.
+Because it is a table (not a string), `extends` / `deepMerge` can override it
+field-by-field — variants become a 2-line delta instead of a copied body.
+
+Field-value rules at build time (per key in `[body]`, applied recursively):
+- **string containing jinja** (`{{` or `{%`) → render through inja, splice the
+  output as **raw JSON** (array / object / string). Empty render → key omitted.
+- **string without jinja** (e.g. `"high"`) → literal JSON string, as-is.
+- **number / bool / inline-table** → as-is.
+
+So `messages` / `contents` and `system` / `system_instruction` are just **string
+fields holding jinja**; everything else (`max_tokens`, `temperature`, `stream`,
+`thinking`, `output_config`, `generationConfig`, …) is a literal value that reads
+exactly like the curl body.
+
+No runtime toggles: thinking / tools / streaming are **fixed per agent**. A thinking
+agent literally carries the `thinking` fields; a non-thinking variant is a separate
+file. There is no `{% if thinking %}` in the body. `system` uses
+`{% if existsIn(ctx, "system_prompt") %}` only because that is about *presence of
+data*, not a mode toggle. `enable_thinking` / `enable_tools` are **capability hints**
+(used for UI badges and to decide tool-definition injection) — the body is the source
+of truth for what is actually sent, so a thinking agent's body must carry the thinking
+fields regardless of the flag.
+
+Outside the body:
+- `model` — the TOML `model` is the **default**; a per-agent override chosen in
+  QodeAssist settings wins. Overrides are stored in `agent_models.json`
+  (agentName → model) and applied by `AgentFactory` when it builds the agent
+  (`AgentFactory::effectiveModel`/`setModelOverride`); `Session` still seeds the
+  payload `model` from the resolved `cfg.model`. URL-model providers (Google) put a
+  `${MODEL}` placeholder in `endpoint`; `Session` substitutes the resolved model into
+  the endpoint before sending (same substitution style as `${PROJECT_DIR}`/`${CONFIG_DIR}`),
+  so the override drives the URL too.
+- `tools` — injected by the **provider** when `enable_tools` is set (tool
+  definitions are dynamic, from `ToolsManager`; they can't be authored in TOML).
+- `stream` — always on. Literal `"stream": true` in the body for OpenAI / Claude /
+  Mistral / Responses / Ollama; encoded in the `endpoint` URL for Google.
+
+### 2. `include` re-enabled as whitelisted partials
+
+The message-array rendering (the complex, comma-heavy part) lives in
+`sources/agents/partials/*.jinja`, shared via `{% include %}`. The throwing include
+callback is replaced by a sandboxed resolver that:
+- rejects names containing `..`, a leading `/`, or a scheme/drive;
+- resolves only against known roots: bundled `:/agents/partials/` then the user
+  `partials/` dir;
+- parses/caches the partial in the same `inja::Environment`.
+
+A missing/typo'd partial is a **load-time** error.
+
+### 3. `extends` shares config down a hierarchy
+
+`extends` already exists (`resolveExtends` + `deepMerge` + `abstract`/`hidden`); it
+keeps doing what it does, now over the structured `[body]` too. Each API-shape base
+carries the default developer persona inline in `system_prompt` (the Roles
+subsystem was removed 2026-06-12; see below). No shared root base. Between the
+API-shape base and the concrete agents sits one thin abstract base **per provider**
+(provider_instance + endpoint only) — the designated extension point for user
+agents, so a custom agent is `extends` + `name` + `model`:
+
+```
+openai_base (abstract)        → system_prompt + [body]   (API shape)
+  ├─ mistral_base (abstract)  → provider, endpoint       (per-provider)
+  │   ├─ mistral_chat         → name, model
+  │   └─ mistral_reasoning    → name, model + enable_thinking
+  ├─ openrouter_base (abstract) ...
+  └─ openai_chat              → name, model              (own provider = no mid layer)
+anthropic_base (abstract)     → system_prompt + provider/endpoint + [body]
+  └─ claude_sonnet46          → name, model + [body] thinking / output_config
+google_base (abstract)        → system_prompt + provider + [body]
+  └─ gemini_chat              → endpoint (${MODEL}) + [body.generationConfig] thinkingConfig
+```
+
+Bundled agents are read-only: the loader rejects a user file that reuses a bundled
+`name`. Customisation = a user agent under a new name extending a bundled base (or a
+concrete bundled agent); the per-agent model override in settings covers the
+model-only case without any file.
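+
+A minimal sketch of such a user agent, assuming a bundled per-provider base is
+loaded under the name `Mistral Base` (the base name, the agent name, and the model
+id are illustrative assumptions, not values taken from the bundled profiles):
+
+```toml
+# Hypothetical user agent file.
+# "Mistral Base" is an assumed base name; as in the worked examples, `extends`
+# is written with the parent profile's `name`.
+extends = "Mistral Base"
+name    = "My Mistral Chat"        # must not reuse a bundled agent's name
+model   = "mistral-small-latest"   # TOML default; a settings override still wins
+```
+
+Everything else (provider, endpoint, `system_prompt`, `[body]`) would then come
+from the `extends` chain.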
+
+Notes:
+- `[body]` is shared whole when identical (the 8 OpenAI-compatible providers); a
+  variant overrides only the differing field — no duplicated body.
+- Arrays (`tags`) are **replaced** on override, not appended (`deepMerge` recurses
+  objects only). A child that wants base tags + extras restates the full list.
+- Division of labour: **include** shares the message-rendering fragment across
+  unrelated families; **extends** shares config (system_prompt / endpoint / body)
+  down one inheritance chain.
+- With `model` overridable per agent in settings, per-model files collapse: agents
+  that previously differed only by `model` become one agent (the user picks the
+  model). A separate file is only needed when the body genuinely differs (effort,
+  no-thinking, …).
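+
+The array rule can be sketched with two profiles. The agent names and tag values
+below are hypothetical and only illustrate the replace-not-append behaviour.
+
+Base profile:
+```toml
+abstract = true
+name     = "Example Base"          # assumed name, for illustration only
+tags     = ["chat", "tools"]
+```
+
+Child profile (separate file):
+```toml
+extends = "Example Base"
+name    = "Example Child"
+# Arrays are replaced on override, so the base tags are restated in full.
+tags    = ["chat", "tools", "reasoning"]
+```
+
+Writing only `tags = ["reasoning"]` in the child would leave it with that single tag.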
+
+### System prompt — a composable template with building blocks
+
+The old `role` (static text) and `context` (jinja) layers collapse into one
+`agent.system` layer in `Session`, rendered through `ContextRenderer`. The agent's
+`system_prompt` field IS that template — the persona is whatever it renders to.
+Building blocks:
+
+- `{{ read_file("...") }}` / `file_exists` / `${PROJECT_DIR}` / `${CONFIG_DIR}` — existing
+  `ContextRenderer` helpers, composable in the same template. Shared persona text
+  lives in plain markdown under the sandboxed roots (e.g.
+  `${CONFIG_DIR}/personas/reviewer.md`) and is pulled in with `read_file`.
+
+So a profile can do `system_prompt = """{{ read_file("${CONFIG_DIR}/personas/reviewer.md") }}"""`,
+or just inline the text. A persona-switch is an agent-switch (thin `extends` variant).
+The former Roles subsystem (`agent_roles/*.json`, `{{ agent_role(id) }}`, the Roles
+settings page, the chat role picker) was removed on 2026-06-12; the chat bases now
+inline the developer persona text directly. Unlike `model`, the system prompt has NO
+per-agent settings override: the edit point is the profile's `system_prompt`.
+Code-completion/FIM agents set no `system_prompt`.
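+
+A persona variant under these rules might look like the sketch below. It assumes
+the concrete agent `Claude Sonnet` from the worked examples is available to extend
+and that `personas/reviewer.md` exists under the config dir; the new agent name is
+made up:
+
+```toml
+extends = "Claude Sonnet"
+name    = "Claude Sonnet Reviewer"   # hypothetical name
+# Replaces the inherited developer persona; the [body] is inherited unchanged.
+system_prompt = """{{ read_file("${CONFIG_DIR}/personas/reviewer.md") }}"""
+```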
+
+## Worked examples
+
+OpenAI base:
+```toml
+abstract = true
+system_prompt = """<inline developer persona text>"""
+provider_instance = "OpenAI (Chat Completions)"
+endpoint = "/chat/completions"
+enable_tools = true
+
+[body]
+max_tokens  = 8192
+temperature = 0.7
+stream      = true
+messages    = """
+[ {% include "partials/openai_messages.jinja" %} ]
+"""
+```
+
+Mistral reasoning child (delta only):
+```toml
+extends = "OpenAI Base Chat"
+name    = "Mistral Reasoning Chat"
+provider_instance = "Mistral AI"
+endpoint = "/v1/chat/completions"
+enable_thinking = true
+
+[body]
+reasoning_effort = "medium"
+```
+
+Claude base (literally the curl body):
+```toml
+abstract = true
+system_prompt = """<inline developer persona text>"""
+provider_instance = "Claude"
+endpoint = "/v1/messages"
+enable_thinking = true
+enable_tools = true
+
+[body]
+max_tokens  = 16000
+temperature = 1
+stream      = true
+thinking      = { type = "adaptive", display = "summarized" }
+output_config = { effort = "high" }
+system   = """{% if existsIn(ctx, "system_prompt") %}{{ tojson(ctx.system_prompt) }}{% endif %}"""
+messages = """
+[ {% include "partials/anthropic_messages.jinja" %} ]
+"""
+```
+
+Sonnet child (delta only):
+```toml
+extends = "Anthropic Base Chat"
+name    = "Claude Sonnet"
+
+[body.output_config]
+effort = "medium"
+```
+
+Google base (`${MODEL}` in endpoint; streaming in the URL):
+```toml
+abstract = true
+system_prompt = """<inline developer persona text>"""
+provider_instance = "Google AI"
+endpoint = "/models/${MODEL}:streamGenerateContent?alt=sse"
+enable_thinking = true
+enable_tools = true
+
+[body]
+system_instruction = """{% if existsIn(ctx, "system_prompt") %}{ "parts": [ { "text": {{ tojson(ctx.system_prompt) }} } ] }{% endif %}"""
+contents = """
+[ {% include "partials/google_contents.jinja" %} ]
+"""
+
+[body.generationConfig]
+maxOutputTokens = 16000
+temperature     = 1
+thinkingConfig  = { includeThoughts = true, thinkingBudget = 8192 }
+```
+
+### Partials
+
+`partials/openai_messages.jinja` dispatches per message:
+```jinja
+{% if existsIn(ctx, "system_prompt") %}
+{ "role": "system", "content": {{ tojson(ctx.system_prompt) }} },
+{% endif %}
+{% for msg in ctx.history %}
+  {% if msg.role == "assistant" %}{% include "partials/openai_assistant.jinja" %}
+  {% else if length(filter_by_type(msg.content_blocks, "tool_result")) > 0 %}{% include "partials/openai_tool_results.jinja" %}
+  {% else %}{% include "partials/openai_user.jinja" %}
+  {% endif %}
+{% endfor %}
+```
+
+`partials/openai_assistant.jinja`:
+```jinja
+{% set tcalls = filter_by_type(msg.content_blocks, "tool_use") %}
+{
+  "role": "assistant",
+  "content": {{ tojson(msg.content) }}
+  {% if length(tcalls) > 0 %}
+  , "tool_calls": [
+    {% for b in tcalls %}
+    { "id": {{ tojson(b.id) }}, "type": "function",
+      "function": { "name": {{ tojson(b.name) }}, "arguments": {{ tojson(tojson(b.input)) }} } },
+    {% endfor %}
+  ]
+  {% endif %}
+},
+```
+
+`partials/openai_tool_results.jinja`:
+```jinja
+{% for b in filter_by_type(msg.content_blocks, "tool_result") %}
+{ "role": "tool", "tool_call_id": {{ tojson(b.tool_use_id) }}, "content": {{ tojson(b.content) }} },
+{% endfor %}
+```
+
+`partials/openai_user.jinja`:
+```jinja
+{% if existsIn(msg, "images") %}
+{ "role": "user", "content": {% include "partials/openai_image_content.jinja" %} },
+{% else %}
+{ "role": "user", "content": {{ tojson(msg.content) }} },
+{% endif %}
+```
+
+`partials/openai_image_content.jinja`:
+```jinja
+[
+  { "type": "text", "text": {{ tojson(msg.content) }} }
+  {% for img in msg.images %}
+  ,
+  {% if img.is_url %}
+  { "type": "image_url", "image_url": { "url": {{ tojson(img.data) }} } }
+  {% else %}
+  { "type": "image_url", "image_url": { "url": "data:{{ img.media_type }};base64,{{ img.data }}" } }
+  {% endif %}
+  {% endfor %}
+]
+```
+
+`partials/anthropic_messages.jinja`:
+```jinja
+{% for msg in ctx.history %}
+{
+  "role": {{ tojson(msg.role) }},
+  "content": [
+    {% for b in msg.content_blocks %}
+      {% if b.type == "image" %}{% include "partials/anthropic_image.jinja" %}
+      {% else %}{{ tojson(b) }},
+      {% endif %}
+    {% endfor %}
+  ]
+},
+{% endfor %}
+```
+
+`partials/anthropic_image.jinja`:
+```jinja
+{
+  "type": "image",
+  "source":
+  {% if b.is_url %}
+  { "type": "url", "url": {{ tojson(b.data) }} }
+  {% else %}
+  { "type": "base64", "media_type": {{ tojson(b.media_type) }}, "data": {{ tojson(b.data) }} }
+  {% endif %}
+},
+```
+
+`partials/google_contents.jinja`:
+```jinja
+{% for msg in ctx.history %}
+{
+  "role": {% if msg.role == "assistant" %}"model"{% else %}"user"{% endif %},
+  "parts": [ {% for b in msg.content_blocks %}{% include "partials/google_part.jinja" %}{% endfor %} ]
+},
+{% endfor %}
+```
+
+`partials/google_part.jinja`:
+```jinja
+{% if b.type == "text" %}
+{ "text": {{ tojson(b.text) }} },
+{% else if b.type == "thinking" %}
+{ "text": {{ tojson(b.thinking) }}, "thought": true, "thoughtSignature": {{ tojson(b.signature) }} },
+{% else if b.type == "tool_use" %}
+{ "functionCall": { "name": {{ tojson(b.name) }}, "args": {{ tojson(b.input) }} } },
+{% else if b.type == "tool_result" %}
+{ "functionResponse": { "name": {{ tojson(b.name) }}, "response": { "result": {{ tojson(b.content) }} } } },
+{% else if b.type == "image" %}
+  {% if b.is_url %}
+  { "file_data": { "mime_type": {{ tojson(b.media_type) }}, "file_uri": {{ tojson(b.data) }} } },
+  {% else %}
+  { "inline_data": { "mime_type": {{ tojson(b.media_type) }}, "data": {{ tojson(b.data) }} } },
+  {% endif %}
+{% else %}
+{ "text": "" },
+{% endif %}
+```
+
+## C++ work
+
+In `JsonPromptTemplate`:
+- Parse `[body]` as a `QJsonObject` (not a string). Walk it recursively and build the
+  request: render jinja-bearing string values via inja and splice the parsed JSON;
+  pass literal strings / scalars / inline-tables through; drop keys whose render is
+  empty.
+- **Delete** `m_sampling`, `m_thinking`, and `applySampling` entirely — the body is
+  the request; there is no separate sampling/thinking merge.
+- Drop the `thinkingEnabled` parameter from `buildFullRequest` /
+  `Provider::prepareRequest` / `Session` — it no longer affects rendering.
+- Add a **JSON-aware** trailing-comma stripper before `QJsonDocument::fromJson`
+  (tracks string/escape state so `,}` / `,]` inside string values are not touched).
+  This is what lets partials emit an unconditional `,` after every element and drop
+  all `loop.is_last` bookkeeping.
+
+In `AgentConfig` / `AgentLoader`:
+- Replace `messageFormat` (string) with `body` (`QJsonObject`); merge `role` +
+  `context` into `system_prompt`. `[template].sampling` / `[template].thinking` are
+  removed.
+- `extends` / `deepMerge` are unchanged; they now also merge `[body]`.
+- Validate at load: a referenced partial must resolve; the assembled body must parse
+  as JSON (render once against a synthetic context with tool_use / tool_result /
+  image). Catches breakage at startup, not mid-conversation.
+
+Model selection (per-agent override):
+- `AgentFactory` owns an agentName → model map loaded from `agent_models.json`
+  (`loadModelOverrides`/`saveModelOverrides`). `create()`/`createFromFile()` apply the
+  override into the built `AgentConfig`; `effectiveModel()` exposes the resolved value;
+  `setModelOverride()` persists. The settings UI (`AgentDetailPane`) edits it via an
+  editable Model field; list/roster widgets display `effectiveModel`.
+- `Session` substitutes `${MODEL}` in `cfg.endpoint` with the resolved model before
+  `sendRequest` (covers Google, whose model lives in the URL), and still seeds the
+  payload `model` from `cfg.model`. The provider keeps injecting `tools` when
+  `enable_tools` is set.
+
+In `Session`:
+- Collapse the `agent.role` + `agent.context` system-prompt layers into one rendered
+  `system_prompt` layer.
+
+## Implementation order
+
+1. JSON-aware trailing-comma stripper + whitelisted `include` resolver (enables
+   readable partials).
+2. `[body]`-table model in `JsonPromptTemplate` + loader; delete
+   sampling/thinking/`applySampling`; drop `thinkingEnabled`.
+3. `system_prompt` merge in loader + `Session`.
+4. Per-agent model override in `AgentFactory` (`agent_models.json`) + `${MODEL}`
+   endpoint substitution in `Session`; editable Model field in settings; convert
+   bundled agents to the base/partials/`extends` layout.
+5. Load-time validation (partial resolves, body parses).
--- a/docs/architecture.md
+++ b/docs/architecture.md
@@ -0,0 +1,321 @@
+# QodeAssist Architecture
+
+This document describes the **current** runtime architecture, after the §10
+rework in `target-architecture.md` was completed. Every runtime LLM path —
+code completion, chat (send/stream + compression + token counting), and quick
+refactor — flows through one stack: agents, `Session`, and the
+`Providers::GenericProvider` layer. There is no legacy parallel path; the old
+"Stack A" (root `providers/*`, `pluginllmcore/*`, `ConfigurationManager`, the
+provider/model/template settings pages) has been removed.
+
+For the design rationale, layering contract, and cross-cutting policies, see
+[`target-architecture.md`](target-architecture.md). This file documents how the
+code is wired today.
+
+---
+
+## 1. Top level: ownership and dependency injection
+
+The plugin (`qodeassist.cpp`) owns everything via `new` + parent — no
+plugin-wide singletons; each feature receives its dependencies explicitly.
+
+```
+QodeAssistPlugin
+    • Providers::registerBuiltinProviders()   — client_api → provider table
+    • ProviderInstanceFactory                 — provider instances from TOML
+    • ProviderSecretsStore                    — secrets behind a port
+    • AgentFactory                            — agents from TOML + agent_models.json
+    • SessionManager(agentFactory)            — owns the ToolContributorRegistry
+        toolContributors().add(registerQodeAssistTools)
+        toolContributors().add(registerSkillTool)
+        toolContributors().add(McpClientsManager::registerToolsOn)
+    • m_engine (QQmlEngine)
+        rootContext: "agentFactory", "sessionManager"   — DI for chat (QML)
+
+  Wired into consumers:
+    • QodeAssistClient ← LLMClientInterface(generalSettings, completeSettings,
+                            agentFactory, sessionManager, documentReader,
+                            performanceLogger)
+                       ← setSessionManager / setAgentFactory   (quick refactor)
+```
+
+Chat lives in QML (`ChatRootView` is a `QML_ELEMENT`), so `AgentFactory` and
+`SessionManager` are exposed as **context properties on the engine's root
+context** and resolved in `ChatRootView` via
+`qmlEngine(this)->rootContext()->contextProperty(...)`.
+
+---
+
+## 2. Core (agent / Session)
+
+```
+AgentFactory.create(name)
+  configByName(name) → AgentConfig (TOML, [body] table; model override from
+                       agent_models.json applied here)
+  buildProviderForAgent:
+     instance = ProviderInstanceFactory.instanceByName(cfg.providerInstance)
+     provider = ProviderFactory::create(instance.clientApi)
+     provider.setUrl(instance.url)
+     provider.setApiKey(secrets.read(instance.apiKeyRef))
+  ▼
+Agent(config, provider)
+  promptTemplate = JsonPromptTemplate::fromConfig(cfg)   — compiles [body] (inja),
+                   validated at load against a synthetic context
+  provider.setPromptCaching(cfg.cachePrompt, cfg.cacheTtl == "1h")
+  ▼
+SessionManager — two ways to obtain a Session:
+  • createSession(agentName, externalHistory?)  — chat: attaches a persistent,
+                                                  externally-owned history
+  • acquire(agentName) / release(session)       — one-shot pipelines: a small
+                                                  per-agent pool of internal-history
+                                                  sessions; acquire hands out a
+                                                  session with cleared history,
+                                                  cleared system-prompt layers and
+                                                  cleared client tools
+  ▼
+Session(agent[, externalHistory])
+  ├─ ConversationHistory     — messages as polymorphic ContentBlocks
+  ├─ SystemPromptBuilder     — ordered named layers (priority-sorted)
+  └─ ResponseRouter(client)  — adapts client signals → typed ResponseEvent
+
+Session API:
+  • send(blocks)                  — the ONLY dispatch entry point: append a user
+                                    message and dispatch. Completion/chat/refactor
+                                    differ only in block content + template; tools
+                                    on/off comes from the agent's enable_tools.
+  • cancel()                      — tears down in-flight; emits cancelled(id)
+  • history() / systemPrompt() / client()
+  • setContentLoader(loader)      — resolves Stored* attachment/image blocks
+  • lastError() → ErrorInfo       — typed synchronous start-failure detail
+
+Session signals (three-state, mutually exclusive per request):
+  • finished(id, stopReason)
+  • failed(id, ErrorInfo{category, message, providerDetail})
+  • cancelled(id)
+  + event(ResponseEvent)          — live delta stream for the chat UI
+```
+
+`Session::dispatch` renders the agent's `system_prompt` into the `agent.system`
+layer, composes all `SystemPromptBuilder` layers into the request system prompt,
+and substitutes `${MODEL}` in the endpoint before sending.
+
+---
+
+## 3. Provider layer
+
+One configuration-driven `GenericProvider` covers every API; it varies only by
+the LLMQore client factory and metadata. Request *shape* belongs to the agent's
+`JsonPromptTemplate` (the `[body]` table), never to the provider.
+
+```
+ProviderFactory  (sources/providers, namespace functions)
+   registerType(name, fn) / create(name, parent) / knownNames()
+        ▲  registerBuiltinProviders()   — client_api → provider table
+GenericProvider : Providers::Provider
+   • owns an LLMQore::BaseClient (created by a ClientFactory)
+   • prepareRequest → PromptTemplate::buildFullRequest; injects tools when
+     enable_tools; applies ClaudeCacheControl when prompt caching is on
+   • client() / providerID() / getInstalledModels()
+```
+
+### client_api → provider table
+
+| client_api                   | LLMQore client        | ProviderID       |
+|------------------------------|-----------------------|------------------|
+| Claude                       | ClaudeClient          | Claude           |
+| Google AI                    | GoogleAIClient        | GoogleAI         |
+| llama.cpp                    | LlamaCppClient        | LlamaCpp         |
+| Mistral AI                   | MistralClient         | MistralAI        |
+| Codestral                    | MistralClient         | MistralAI        |
+| Ollama (Native)              | OllamaClient          | Ollama           |
+| Ollama (OpenAI-compatible)   | OpenAIClient          | OpenAICompatible |
+| OpenAI (Chat Completions)    | OpenAIClient          | OpenAI           |
+| OpenAI (Responses API)       | OpenAIResponsesClient | OpenAIResponses  |
+| OpenAI Compatible            | OpenAIClient          | OpenAICompatible |
+| OpenRouter                   | OpenAIClient          | OpenRouter       |
+| LM Studio (Chat Completions) | OpenAIClient          | LMStudio         |
+| LM Studio (Responses API)    | OpenAIResponsesClient | OpenAIResponses  |
+
+---
+
+## 4. Configuration model
+
+```
+~/.config/.../qodeassist/config/
+  providers/*.toml   → ProviderInstance { name, client_api, url, api_key_ref }
+  agents/*.toml      → AgentConfig { schema_version, providerInstance, model,
+                                     endpoint, system_prompt, [body], match,
+                                     enable_tools, enable_thinking, cache_prompt,
+                                     extends, abstract, hidden, tags }
+  agent_models.json  → per-agent model override (applied by AgentFactory)
+  pipelines          → codeCompletion (ordered roster, routed by AgentRouter.pickAgent
+                       on {filePath, projectName}); chatAssistant (allow-list for the
+                       chat picker); chatCompression / quickRefactor (single agent each)
+
+Editor policy (NOT agent config):
+  CodeCompletionSettings — triggers, modelOutputHandler, context extraction,
+                           useOpenFilesContext
+```
+
+`[body]` **is** the request body (deep-mergeable through `extends`; Jinja-bearing
+string values render and splice as raw JSON, literals pass through, empty renders
+drop the key). `include` resolves only sandboxed partial roots. Profiles validate
+at load: a referenced partial must resolve and the assembled body must parse as
+JSON against a synthetic context — config errors surface in the agents settings
+page, never as a silent runtime drop. The loader also lints: unknown top-level /
+`[match]` keys and same-layer duplicate names are warnings; a user file that
+reuses a bundled agent's name is rejected (bundled agents cannot be replaced —
+users extend them, or the per-provider abstract bases, under a new name);
+`abstract` and `hidden` are never inherited through `extends`. Full spec:
+[`agent-templates-design.md`](agent-templates-design.md); user-facing guide:
+[`creating-agents.md`](creating-agents.md).
+
+---
+
+## 5. Runtime paths
+
+Agent selection depends on the pipeline. Code completion is the only
+context-routed one: `AgentRouter.pickAgent(roster.codeCompletion, {file,
+project})` walks the ordered roster and returns the first agent whose `[match]`
+fits. Chat filters to the `chatAssistant` allow-list and the user picks; quick
+refactor and compression each use a single configured agent.
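+
+The first-match walk over the ordered roster can be sketched in plain C++17.
+This is an illustration only: the `Agent` struct, the `matches` predicate and
+the suffix rule are hypothetical stand-ins, and the real `[match]` semantics
+are the ones specified in `agent-templates-design.md`.
+
+```cpp
+#include <functional>
+#include <optional>
+#include <string>
+#include <vector>
+
+// Hypothetical routing context; the real one carries {file, project}.
+struct RouteContext {
+    std::string file;
+    std::string project;
+};
+
+// Hypothetical roster entry: a name plus a predicate standing in for [match].
+struct Agent {
+    std::string name;
+    std::function<bool(const RouteContext &)> matches;
+};
+
+// Walks the roster in order and returns the first agent whose predicate fits.
+std::optional<std::string> pickAgent(const std::vector<Agent> &roster,
+                                     const RouteContext &ctx)
+{
+    for (const Agent &agent : roster) {
+        if (agent.matches(ctx))
+            return agent.name;
+    }
+    return std::nullopt; // no agent in the roster accepts this context
+}
+
+// Illustrative predicate helper: true when the path ends with the suffix.
+inline bool hasSuffix(const std::string &path, const std::string &suffix)
+{
+    return path.size() >= suffix.size()
+           && path.compare(path.size() - suffix.size(), suffix.size(), suffix) == 0;
+}
+```
+
+Because the roster is ordered, a catch-all agent placed last acts as the
+default, and more specific agents only need to appear before it.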
+
+### 5a. Code completion
+
+```
+Qt Creator LSP (getCompletionsCycling)
+  ▼
+LLMClientInterface
+  agent   = AgentRouter.pickAgent(roster.codeCompletion, {file, project})
+  session = sessionManager.acquire(agent)                 — pooled
+  systemPrompt layer "completion.context" = fileContext + open-files context
+  session.send( blocks{ CompletionContent(prefix, suffix) } )
+     ▼ on Session::finished:
+  history().lastAssistantText() → CodeHandler (output-mode) → LSP items
+     → sessionManager.release(session)
+```
+
+The completion context travels as a `CompletionContent` block; the template
+exposes it as `ctx.prefix` / `ctx.suffix`. FIM vs instruct is purely agent
+config (the body), not feature code. Completion never touches the delta stream —
+it waits for `finished` and reads the last message.
+
+### 5b. Chat
+
+`ChatRootView` owns one persistent `ConversationHistory` for the whole chat view
+and injects it into every collaborator. **History is the single source of truth.**
+
+```
+ChatRootView (QML)  — owns ConversationHistory m_history
+  ChatModel.setHistory(m_history)          — ChatModel is a PROJECTION:
+        subscribes to messageAdded/Updated/cleared/reset, flattens blocks→rows,
+        overlays file-edit status from ChangesManager, holds a per-message usage map
+  ChatAgentController                       — picker filtered to the
+        chatAssistant allow-list; active agent persisted
+  ▼ dispatchSend
+ClientInterface
+  session = sessionManager.createSession(activeAgent, m_history)
+  sessionManager.toolContributors().contribute(client.tools())   — builtin+skills+MCP
+  session.setContentLoader(ChatSerializer::loadContentFromStorage)
+  systemPrompt layer "chat.context" = project info + skills + linked files
+  session.send( blocks{ TextContent + StoredAttachmentContent + StoredImageContent } )
+     ▼ consumes Session signals (NOT raw client signals):
+  event(Usage)        → ChatModel.setMessageUsage + token-counter calibration
+  finished(id)        → ChangesManager.applyPendingEditsForRequest + persist;
+                        removeSession (the persistent history survives)
+  failed(id, ErrorInfo) → surface error; removeSession
+
+ChatCompressor    → acquire(chatCompression agent — single configured) → seed history
+                    from the chat's messages → "compression" layer → send → read summary
+                    from the compression session's own history → release
+InputTokenCounter → estimates over ConversationHistory (calibrated by Usage events)
+ChatSerializer    → persists ConversationHistory via MessageSerializer (v0.3);
+                    imports legacy v0.1/v0.2 files
+```
+
+`ChatModel`'s QML role surface (roleType / content / attachments / images /
+isRedacted / token roles) is unchanged, so the QML delegates were untouched. The
+projection's incremental updates avoid model resets on the streaming hot path.
+
+### 5c. Quick refactor
+
+```
+QodeAssistClient.requestQuickRefactor → QuickRefactorHandler
+  agent   = pipelines.quickRefactor (single configured agent)
+  session = sessionManager.acquire(agent)
+  if useTools: sessionManager.toolContributors().contribute(client.tools())
+  systemPrompt layer "refactor" = tagged selection + output + indentation rules
+  session.send(blocks{instructions})
+     ▼ on Session::finished:
+  history().lastAssistantText() → ResponseCleaner → RefactorResult → editor insert
+     → sessionManager.release(session)
+  on Session::failed(ErrorInfo) → RefactorResult{error}
+```
+
+---
+
+## 6. Context layer
+
+The context services sit behind IDE-agnostic ports; Qt Creator API use lives in
+the adapters.
+
+```
+EditorContext   — IDocumentReader (port)  ← DocumentReaderQtCreator (TextEditor API)
+ProjectContext  — IProjectScanner (port)  ← ProjectScannerQtCreator (ProjectExplorer
+                  + Core::DocumentModel + the IgnoreManager for .qodeassistignore)
+TokenEstimator  — TokenUtils (pure)       ← InputTokenCounter (thin UI consumer)
+```
+
+`ContextManager` is now Qt-Creator-free: it delegates open-file enumeration and
+ignore filtering to an injected `IProjectScanner` (defaulting to the QtC adapter),
+and keeps only filesystem reads + formatting. `ContextManager::shouldIgnore(path)`
+replaced the previously exposed `ignoreManager()`.
+
+---
+
+## 7. Cross-cutting
+
+- **Request lifecycle** — a session has at most one in-flight request; `send()`
+  while in flight cancels the previous request. Every request ends in exactly one of
+  `finished` / `failed` / `cancelled`. Cancellation is not an error; no consumer
+  string-matches a message to tell them apart.
+- **Typed errors** — `ErrorInfo { category ∈ {Config, Auth, Network, Provider,
+  Validation, Tool}, message, providerDetail }`. `ResponseRouter` categorizes wire
+  errors (best-effort) at the boundary; `Session::failed` carries the typed value.
+- **Tools** — `SessionManager` owns a `ToolContributorRegistry`; built-in ToolKit,
+  the skill tool, and MCP client tools register once and are contributed to chat
+  and quick-refactor session clients uniformly.
+- **Threading** — the core runs on the GUI thread; concurrency is the Qt event
+  loop plus async network I/O. Blocking work hides behind L3 ports.
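+
+The request-lifecycle rule above can be sketched as a small tracker. This is a
+hypothetical model, not the plugin's `Session` class: `RequestTracker`,
+`Outcome` and the integer request ids are assumptions made for illustration.
+It shows the two invariants: a new `send()` cancels the in-flight request, and
+each request accepts exactly one terminal outcome.
+
+```cpp
+#include <map>
+#include <optional>
+
+// Cancellation is its own outcome, so nobody has to parse an error message.
+enum class Outcome { Finished, Failed, Cancelled };
+
+// Hypothetical tracker for a single session.
+class RequestTracker {
+public:
+    // Starts a request; an in-flight predecessor is cancelled first.
+    int send()
+    {
+        if (m_inFlight)
+            settle(*m_inFlight, Outcome::Cancelled);
+        m_inFlight = ++m_lastId;
+        return m_lastId;
+    }
+
+    // Records a terminal outcome; returns false for unknown or settled ids.
+    bool settle(int id, Outcome outcome)
+    {
+        if (id <= 0 || id > m_lastId || m_outcomes.count(id) != 0)
+            return false;
+        m_outcomes[id] = outcome;
+        if (m_inFlight && *m_inFlight == id)
+            m_inFlight.reset();
+        return true;
+    }
+
+    std::optional<Outcome> outcome(int id) const
+    {
+        auto it = m_outcomes.find(id);
+        if (it == m_outcomes.end())
+            return std::nullopt;
+        return it->second;
+    }
+
+    bool busy() const { return m_inFlight.has_value(); }
+
+private:
+    int m_lastId = 0;
+    std::optional<int> m_inFlight;
+    std::map<int, Outcome> m_outcomes;
+};
+```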
+
+---
+
+## 8. Tests
+
+`test/` (GTest + Qt::Test) covers the areas most affected by the rework:
+
+- `JsonPromptTemplateTest` — the `[body]` engine: jinja render + JSON splice,
+  literal passthrough, empty-render key drop, nested literals, and load-time
+  rejection of bodies that render invalid JSON.
+- `ResponseRouterTest` — a fake `BaseClient` replays a recorded provider stream;
+  asserts the assistant message is stamped with the request id, history is built
+  correctly (thinking + text + tool use/result), the typed event stream is emitted,
+  and wire errors are categorized.
+- `BundledAgentsTest` — loads every bundled agent through the real loader (extends
+  + partials resolved from the qrc) and renders each `[body]` against the synthetic
+  validation context. This is the load-time validation guarantee run in CI: a broken
+  bundled body, partial, or `extends` chain fails the test instead of surfacing as a
+  silent runtime drop.
+
+---
+
+## 9. Remaining follow-ups (optional)
+
+1. **Qt-Creator-free core build + CI** — `AgentFactory` / `ContextRenderer` still
+   call `Core::ICore::userResourcePath`, so the core targets link `QtCreator::Core`.
+   A `ResourcePaths` port + adapter would let the core build without Qt Creator and
+   enable a CI job that fails on a layering-violating include. (The bundled-agent
+   render check already runs in the QtC-linked test binary — see §8.)
+2. **§9 target module layout** — the `core/ ide/ features/ hosts/` physical target
+   split in `target-architecture.md` is not yet reflected in the directory layout.
+```
--- a/docs/chat-summarization.md
+++ b/docs/chat-summarization.md
@@ -110,6 +110,4 @@ No additional configuration is required.

 ## Related Documentation

- [Agent Roles](agent-roles.md) - Switch between AI personas
 - [File Context](file-context.md) - Attach files to chat
- [Project Rules](project-rules.md) - Customize AI behavior
--- a/docs/context-architecture.md
+++ b/docs/context-architecture.md
@@ -0,0 +1,347 @@
+# QodeAssist — Context Architecture (v1.0)
+
+Status: design proposal, extends `target-architecture.md` (§7 ContextEngine,
+delta #9) and `agent-templates-design.md` (the `ctx.*` template contract).
+Scope: everything between "facts exist in the IDE / on disk / in the
+conversation" and "bytes leave in the request body" — what context each
+pipeline needs, who acquires it, where it lands in the prompt. One assembly
+runs per `send()`; tool continuations stay inside LLMQore (§4.3).
+
+---
+
+## 1. Taxonomy — the five kinds of context
+
+Every piece of context the model ever sees falls into one of five categories.
+The categories differ in *acquisition mode*, *volatility*, and therefore
+*placement* — conflating them is the root cause of today's problems (§3).
+
+| # | Category | What it answers | Examples | Volatility |
+|---|----------|-----------------|----------|------------|
+| C1 | **Identity** | who is the assistant | agent `system_prompt` (persona inline or via `read_file()`), always-on skills, skills catalog | per agent change |
+| C2 | **Environment** | where is it working | project name + source root, build dir, language/file info, recent changes | per project / slow |
+| C3 | **Task** | what is asked *now* | chat message, attachments, images, invoked-skill body, completion prefix/suffix, refactor selection + instruction | every turn |
+| C4 | **Conversation** | what happened so far | history (text, thinking, tool use/results), compression summary | grows every turn |
+| C5 | **Pulled** | what the model asked for | tool results (read file, search, build, diagnostics), MCP tool results | inside the turn |
+
+Two acquisition modes cut across the categories:
+
+- **Push** — we inject proactively (C1–C4). Push is a *per-pipeline
+  policy*: completion must push everything (no latency budget for tools);
+  chat should push little and let the model pull.
+- **Pull** — the model requests through tools (C5). Pull needs no assembly
+  policy at all, but its *results* become C4 and therefore must flow through
+  the same budget and serialization rules as everything else.
+
+One more orthogonal property drives placement: **stability**. Provider prompt
+caches (Claude `cache_control`) reward byte-stable prefixes. Stable content
+belongs early (system), volatile content belongs late (near the last user
+message). This single rule decides almost every placement question below.
+
+---
+
+## 2. Context inventory per pipeline
+
+What each use case (numbering from `target-architecture.md` §1) actually
+needs, against the taxonomy:
+
+| Context item | Cat | U1 completion | U2 chat | U3 refactor | compression | Source port |
+|---|---|---|---|---|---|---|
+| agent `system_prompt` (persona) | C1 | ✓ | ✓ (persona switch = agent switch) | ✓ | ✓ | AgentProfile + ContextRenderer |
+| skills catalog + always-on | C1 | — | ✓ | — | — | SkillsEngine |
+| project root / build dir | C2 | — | ✓ | — | — | `IProjectScanner` |
+| language + file info | C2 | ✓ | — | ✓ | — | `IDocumentReader` |
+| recent project changes | C2 | optional (setting) | — | optional | — | ChangesManager |
+| prefix / suffix (FIM) | C3 | ✓ | — | — | — | `IDocumentReader` |
+| selection + position markers | C3 | — | — | ✓ | — | `IDocumentReader` |
+| user message text | C3 | — | ✓ | ✓ (instruction) | ✓ (directive) | UI |
+| attachments / images | C3 | — | ✓ | — | — | chat storage (loader) |
+| invoked skill body (`/cmd`) | C3 | — | ✓ | — | — | SkillsEngine |
+| linked files (pinned) | C3/C2 | — | ✓ | — | — | `IProjectScanner` + fs |
+| open-files sync | C3/C2 | — | ✓ | — | — | `IProjectScanner` |
+| history | C4 | — (fresh session) | ✓ | — (fresh) | ✓ (read-only input) | ConversationHistory |
+| tool results | C5 | — | ✓ | ✓ (optional) | — | ToolsManager / McpHub |
+
+---
+
+## 3. Problems in the current code this design removes
+
+1. ~~**Two assembly paths.**~~ — RECLASSIFIED 2026-06-12 as by-design, not a
+   problem: the first request renders from `ConversationHistory`; tool
+   continuations are LLMQore's replay of that payload plus appended tool
+   results. The replay carries the full filtered history of its base payload,
+   so the feared filter divergence does not materialize in practice (§4.3).
+2. **No budget.** History is never trimmed, estimated, or compacted; every
+   send ships everything, forever.
+3. **Volatile content in system.** Linked-file contents live in the
+   `chat.context` system layer; any file edit between turns invalidates the
+   provider prompt cache for the whole request.
+4. **Invoked skills evaporate.** A `/skill` body is injected into the system
+   layer for one send only; by the next turn the model has lost the skill's
+   instructions, although the conversation continues to rely on them.
+5. **Silent loss.** A failed attachment load drops the block with no trace —
+   neither the model nor the user learns the image is gone.
+6. **Repeated materialization.** Every send re-reads and re-base64s every
+   stored image/attachment of the whole history from disk.
+7. **Placement decided ad hoc.** Each feature hand-formats markdown and picks
+   a system layer by habit (`completion.context`, `refactor`, `chat.context`);
+   there is no shared rule for what goes where, and the project-info block is
+   formatted three different ways.
+
+---
+
+## 4. Architecture — Acquire → Assemble → Shape
+
+Three stages with hard ownership boundaries:
+
+```mermaid
+flowchart LR
+    subgraph L3["Acquire — ContextEngine (L3, ports + QtC adapters)"]
+        EC["EditorContext<br/>prefix/suffix, selection,<br/>language, copyright strip"]
+        PC["ProjectContext<br/>root, ignore filter,<br/>open files, changes"]
+        TE["TokenEstimator<br/>calibrated by Usage"]
+    end
+    subgraph L4["Features (L4) — decide WHAT"]
+        F["chat / completion / refactor<br/>set layers, pin providers,<br/>build user blocks"]
+    end
+    subgraph L2["Assemble — Session (L2) — decide WHERE & HOW MUCH"]
+        SPB["SystemPromptBuilder<br/>stable layers only"]
+        PIN["Pinned providers<br/>re-materialized every dispatch"]
+        CA["ContextAssembler<br/>history + layers + pinned<br/>+ loader + budget → ctx"]
+    end
+    subgraph L1["Shape — JsonPromptTemplate (L1/L2)"]
+        TPL["[body] jinja over ctx.*"]
+    end
+    EC --> F
+    PC --> F
+    F --> SPB
+    F --> PIN
+    F --> CA
+    SPB --> CA
+    PIN --> CA
+    TE --> CA
+    CA --> TPL
+```
+
+- **Acquire (L3)** — `ContextEngine` services behind IDE-agnostic ports read
+  facts from the IDE/fs. No prompt text, no placement decisions. One shared
+  `EnvBlockFormatter` renders the project/file info block so it is identical
+  in every pipeline.
+- **Features (L4)** decide *what* context a turn needs: they set their system
+  layer, pin refreshable providers, and compose user blocks. They never
+  decide request shape and never concatenate history.
+- **Assemble (L2)** — `ContextAssembler` (successor of
+  `Session::toLegacyContext`) is the **only** producer of the template
+  context, once per `send()` dispatch; tool continuations replay that payload
+  inside LLMQore (§4.3). It owns placement policy, budget enforcement,
+  materialization, and the manifest.
+- **Shape (L1)** — the agent's `[body]` table renders `ctx.*` into the wire
+  request. Templates own *shape per provider*, never content.
+
+### 4.1 The three injection mechanisms
+
+| Mechanism | For | Lifetime | Refresh | Persisted |
+|---|---|---|---|---|
+| **System layers** (`SystemPromptBuilder`) | stable C1/C2: `agent.system`, `env.project`, `skills.catalog`, `refactor`, `compression` | conversation | on send | no |
+| **Pinned providers** (new) | refreshable C3/C2: linked files, open-files sync | until unpinned | **every `send()`** | as reference only |
+| **User blocks** (`send(blocks)`) | one-shot C3: message, attachments, images, invoked-skill body, completion content | that turn | never (history is immutable) | yes |
+
+Pinned providers are the new piece:
+
+```
+session->pinContext(id, [](){ return materialized blocks; });
+session->unpinContext(id);
+```
+
+The assembler calls every pinned provider at **every `send()`** and splices
+the result as text blocks **prepended to the turn's typed user message**,
+that is, the last user-role wire message that does not carry tool results.
+If no such message exists, it falls back to the tool-result carrier
+(inserting after its leading `tool_result` blocks), and to a synthetic user
+message when the history has no user message at all. Prepending into an
+existing message rather than inserting a separate one keeps strict
+user/assistant alternation, which some provider APIs enforce.
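+
+One reading of this anchor rule, sketched in C++17. The `Message`,
+`BlockKind` and `PinAnchor` types are hypothetical and much simpler than the
+real history model; the function only decides *where* the pinned text blocks
+would go, under the assumption that the fallback order is exactly the one
+described above.
+
+```cpp
+#include <cstddef>
+#include <vector>
+
+enum class Role { User, Assistant };
+enum class BlockKind { Text, ToolUse, ToolResult };
+
+struct Message {
+    Role role;
+    std::vector<BlockKind> blocks;
+};
+
+// Insert position for the pinned blocks. `synthetic` means a new user
+// message has to be created because the history has no user message.
+struct PinAnchor {
+    bool synthetic = false;
+    std::size_t message = 0;
+    std::size_t block = 0;
+};
+
+static bool carriesToolResults(const Message &m)
+{
+    for (BlockKind kind : m.blocks) {
+        if (kind == BlockKind::ToolResult)
+            return true;
+    }
+    return false;
+}
+
+PinAnchor findPinAnchor(const std::vector<Message> &wire)
+{
+    // Preferred: the last typed user message (no tool results), at its front.
+    for (std::size_t i = wire.size(); i-- > 0;) {
+        if (wire[i].role == Role::User && !carriesToolResults(wire[i]))
+            return {false, i, 0};
+    }
+    // Fallback: the last tool-result carrier, after its leading tool results.
+    for (std::size_t i = wire.size(); i-- > 0;) {
+        if (wire[i].role != Role::User)
+            continue;
+        std::size_t block = 0;
+        while (block < wire[i].blocks.size()
+               && wire[i].blocks[block] == BlockKind::ToolResult)
+            ++block;
+        return {false, i, block};
+    }
+    return {true, wire.size(), 0}; // no user message: append a synthetic one
+}
+```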
+
+The fixed anchor and the per-turn refresh split the cache cost fairly:
+within a turn's tool loop the pinned blocks are byte-identical (continuations
+replay the payload — pure appends, cache hits); the next `send()` re-reads
+the files, and a change invalidates the cache only from the turn's anchor,
+not from the system prefix. The materialized block's label states its capture
+time ("content as of this turn") because a tool may mutate the file mid-loop;
+the model sees such changes through the tool results themselves. Pinned
+content is never stored in history and never persisted (only the pin
+reference is), so it is never duplicated turn over turn.
+Invoked-skill bodies move the opposite way: out of the system layer into the
+**user blocks of that turn** (a dedicated block type), so they persist in
+history and survive the rest of the conversation (fixes problem 4).
+
+### 4.2 Placement policy (single table, owned by the assembler)
+
+| Content | Position in request | Why |
+|---|---|---|
+| `agent.system` (rendered TOML `system_prompt`) | system, first | static per agent → max cache reuse |
+| `env.project`, `skills.catalog` | system, after agent | changes rarely |
+| pipeline layers (`refactor`, `compression`, `completion.context`) | system, last | fresh session each time, ordering irrelevant |
+| history | messages | as is |
+| pinned materializations | text blocks prepended to the turn's typed user message, live content | fixed anchor keeps the prefix cache-stable; content refreshes because tools mutate files at any moment |
+| task blocks | last user message | the turn itself |
+
+`ClaudeCacheControl` breakpoints stay as they are (system / history tail);
+this ordering is what makes them effective.
+
+### 4.3 Tool continuations stay in LLMQore (replay)
+
+The tool loop deliberately stays in LLMQore — the library is a complete,
+standalone agentic client, and the loop (execute tools, count rounds,
+schedule the next request, stream) is *mechanism*, which per
+`target-architecture.md` design principle 3 belongs in C++ identically for
+all providers. Continuation *content* is the library's default replay: the
+base payload plus the assistant message and appended tool results.
+
+An inversion hook (`setContinuationPayloadBuilder`, an optional per-request
+callback letting `Session` re-assemble each continuation through
+`ContextAssembler`) was implemented and **reverted 2026-06-12**: the problem
+it solved was judged contrived. The replay already carries the full filtered
+history of its base payload, mid-loop file changes reach the model through
+the tool results themselves, and continuation growth within one turn is
+bounded by `maxToolContinuations` — budget enforcement at `send()` time
+covers the realistic cases. Consequences accepted with the revert: the
+manifest logs one entry per `send()` (not per wire request), and pinned
+content is byte-stable for the duration of a turn's tool loop (§4.1).
+
+On 2026-06-13 the loop's *shape* inside LLMQore was refactored without changing
+this decision (see `tool-loop-runner-plan.md`): the loop policy now lives in
+`ToolLoopRunner` (per-request round state, limit, continuation decision), and
+`BaseClient` was slimmed down to transport + tool dispatch, exposing the public
+primitives `continueRequest` / `buildReplayContinuation` / `abortRequest`. Continuation
+content is still the replay. QodeAssist sets the round limit via
+`client->toolLoop()->setMaxRounds(...)`; the old `setMaxToolContinuations`
+stays as a forwarder for compatibility.
+
+### 4.4 Budget
+
+`ContextAssembler` consults a `BudgetPolicy` before producing the context:
+
+```
+input_estimate = TokenEstimator(system + history + pinned + task)
+limit          = agent context_window − body.max_tokens (output reserve)
+```
+
+`context_window` comes from provider/model metadata with an optional agent
+TOML override. When the estimate exceeds the limit the policy returns a trim
+plan executed in deterministic order:
+
+1. elide bodies of tool results older than the last N rounds
+   (`[tool result elided — N tokens]` placeholder, pairing preserved);
+2. elide materializations of old stored images/attachments (placeholder
+   block, reference kept in history);
+3. below a hard floor — refuse with `ErrorCategory::Validation` and surface
+   "compress the conversation" (ChatCompressor) in the UI.
+
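+The limit formula and the deterministic trim order can be sketched as
+follows. Everything here is a simplified assumption: the `Estimate` fields,
+`planBudget` and `BudgetAction` are hypothetical names, placeholder tokens are
+ignored, and the "hard floor" is modelled as "still over the limit after both
+elision stages".
+
+```cpp
+#include <vector>
+
+enum class BudgetAction { Send, ElideOldToolResults, ElideOldStoredContent, Refuse };
+
+// Hypothetical token estimates for one assembled request.
+struct Estimate {
+    int system = 0;
+    int history = 0;
+    int pinned = 0;
+    int task = 0;
+    int oldToolResults = 0;   // share of `history` that stage 1 can elide
+    int oldStoredContent = 0; // share of `history` that stage 2 can elide
+};
+
+// limit = context_window - max_tokens (the output reserve).
+int inputLimit(int contextWindow, int maxOutputTokens)
+{
+    return contextWindow - maxOutputTokens;
+}
+
+// Returns the stages to run, in order, always ending in Send or Refuse.
+std::vector<BudgetAction> planBudget(const Estimate &e, int limit)
+{
+    std::vector<BudgetAction> plan;
+    int total = e.system + e.history + e.pinned + e.task;
+    if (total > limit && e.oldToolResults > 0) {
+        plan.push_back(BudgetAction::ElideOldToolResults);
+        total -= e.oldToolResults;
+    }
+    if (total > limit && e.oldStoredContent > 0) {
+        plan.push_back(BudgetAction::ElideOldStoredContent);
+        total -= e.oldStoredContent;
+    }
+    plan.push_back(total > limit ? BudgetAction::Refuse : BudgetAction::Send);
+    return plan;
+}
+```
+
+Returning a plan instead of trimming in place keeps the policy pure, so the
+same result can feed both the manifest and a warn-only first stage.
+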
+v1.0 ships in stages: **estimate + manifest + UI warning** first (no silent
+trimming), then stage 1–2 elision, then auto-compression hooks. The
+architecture fixes the *seam*; the policy can stay minimal.
+
+`TokenEstimator` is calibrated per provider/model from `Usage` events
+(§8.5 of the target architecture) — chars-per-token ratio updated after every
+response; the chat token counter and the budget share this one estimator.
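+
+A chars-per-token calibration could look like the sketch below. The text only
+says the ratio is updated after every response; the cumulative-average update
+rule, the starting ratio of 4 and the `CharRatioEstimator` class are
+assumptions chosen for illustration.
+
+```cpp
+#include <cstddef>
+#include <string>
+
+// Hypothetical estimator; the default of 4 chars per token is a guess.
+class CharRatioEstimator {
+public:
+    // Estimates tokens for a text with the current chars-per-token ratio.
+    int estimate(const std::string &text) const
+    {
+        return static_cast<int>(text.size() / m_charsPerToken + 0.5);
+    }
+
+    // Folds in one observation: characters sent and input tokens reported.
+    void calibrate(std::size_t charsSent, int reportedTokens)
+    {
+        if (charsSent == 0 || reportedTokens <= 0)
+            return; // nothing usable to learn from
+        m_chars += charsSent;
+        m_tokens += reportedTokens;
+        m_charsPerToken = static_cast<double>(m_chars) / m_tokens;
+    }
+
+    double charsPerToken() const { return m_charsPerToken; }
+
+private:
+    double m_charsPerToken = 4.0;
+    std::size_t m_chars = 0;
+    long long m_tokens = 0;
+};
+```
+
+One instance per provider/model pair would match the "calibrated per
+provider/model" wording; a decaying average is an equally plausible choice.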
+
+### 4.5 Materialization and caching
+
+Stored content (attachments, images) stays reference-only in history;
+materialization happens in the assembler through the `ContentLoader`. Two
+fixes over today:
+
+- the loader result is cached per `(storedPath, mtime, size)` — no re-reading
+  the whole conversation's binaries on every send, and byte-identical turns
+  keep the provider prompt cache warm;
+- a failed load produces an **explicit placeholder block**
+  (`[attachment unavailable: name.png]`) instead of silently vanishing —
+  the model can say so, the manifest records it (fixes problem 5).
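+
+Both fixes fit in one small wrapper around the loader. The sketch below is
+not the plugin's `StoredContentCache`: the class name, the `Loader` signature
+and the choice not to cache failures are assumptions. Only the key shape
+`(storedPath, mtime, size)` and the placeholder text come from the list above.
+
+```cpp
+#include <cstdint>
+#include <functional>
+#include <map>
+#include <optional>
+#include <string>
+#include <tuple>
+#include <utility>
+
+// Cache key from the text: (storedPath, mtime, size).
+using CacheKey = std::tuple<std::string, std::int64_t, std::uint64_t>;
+
+// Hypothetical loader: returns the encoded payload, or nullopt on failure.
+using Loader = std::function<std::optional<std::string>(const std::string &path)>;
+
+class StoredContentCacheSketch {
+public:
+    explicit StoredContentCacheSketch(Loader loader) : m_loader(std::move(loader)) {}
+
+    // Returns cached content while path, mtime and size are unchanged;
+    // a failed load yields an explicit placeholder instead of nothing.
+    std::string materialize(const std::string &path, std::int64_t mtime,
+                            std::uint64_t size, const std::string &displayName)
+    {
+        const CacheKey key{path, mtime, size};
+        auto it = m_cache.find(key);
+        if (it != m_cache.end())
+            return it->second;
+        std::optional<std::string> loaded = m_loader(path);
+        if (!loaded)
+            return "[attachment unavailable: " + displayName + "]";
+        m_cache.emplace(key, *loaded);
+        return *loaded;
+    }
+
+private:
+    Loader m_loader;
+    std::map<CacheKey, std::string> m_cache;
+};
+```
+
+Failures are deliberately left out of the cache in this sketch, so a file
+that reappears is picked up on the next send.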
+
+### 4.6 Observability: the context manifest
+
+Every `assemble()` emits one debug-category log entry and a struct on the
+event stream:
+
+```
+manifest {
+  layers:   { agent.system: ~1.9k tok, env.project: ~70, skills.catalog: ~640 }
+  history:  26 messages, ~14.2k tok (3 tool rounds)
+  pinned:   { linked:src/main.cpp: ~2.1k }
+  task:     ~310 tok, 1 image (cached)
+  elided:   [ tool_result a4f1 (~8k) ]
+  estimate: ~19.3k / limit 32k
+}
+```
+
+Nothing is dropped silently — every filter (unsigned thinking, orphaned tool
+pairs, failed loads, budget elisions) leaves a manifest record. The token
+counter UI reads the same struct.
+
+---
+
+## 5. Wire contract — `ctx.*` stays, gains one producer
+
+`Templates::ContextData` (→ `ctx.system_prompt`, `ctx.history`,
+`ctx.prefix/suffix`, `ctx.files_metadata`) remains the contract between the
+core and `[body]` templates — it is not legacy, it is the template-facing
+view of the assembled context. The change is that exactly one function
+produces it (`ContextAssembler::assemble`), for every request, and
+`toLegacyContext`/`buildLegacyContext` are renamed into it. Existing
+serialization rules carry over unchanged: system messages never enter
+history, unsigned thinking is dropped, orphaned tool_use/tool_result pairs
+are filtered, `CompletionContent` becomes `prefix`/`suffix`.
+
+---
+
+## 6. Migration plan
+
+Ordered so every step lands independently and shrinks risk:
+
+1. **Extract `ContextAssembler`** from `buildLegacyContext` (pure, unit-tested
+   against fixture histories) + manifest logging + failed-load placeholder
+   blocks. No behavior change otherwise. — DONE 2026-06-12
+   (`sources/Session/ContextAssembler.{hpp,cpp}`, `test/ContextAssemblerTest.cpp`;
+   manifest logged under the `qodeassist.context` category).
+2. **ContentLoader cache** keyed by `(path, mtime, size)`. — DONE 2026-06-12
+   (`StoredContentCache` in `ChatSerializer`, owned per-chat by
+   `ClientInterface`, cleared on chat switch).
+3. **Pinned providers**: linked files and open-files sync move out of the
+   `chat.context` system layer; invoked-skill bodies move into the turn's
+   user blocks. `chat.context` shrinks to project info + skills catalog.
+   — DONE 2026-06-12 (`Session::pinContext/unpinContext`, pinned splice in
+   `ContextAssembler::assemble`; `SkillInvocationContent` block persisted via
+   `MessageSerializer`, invisible in the chat UI by design; open-files sync is
+   covered because `ChatRootView` merges open editors into the linked list).
+4. **Shared `EnvBlockFormatter`** in ContextEngine; chat/refactor/completion
+   stop hand-formatting project/file info. — DONE 2026-06-12
+   (`context/EnvBlockFormatter.{hpp,cpp}`: pure `formatProject`/`formatFile`
+   + the `currentProject()` QtC gatherer; chat project block, refactor file
+   header, and completion's `getLanguageAndFileInfo` all route through it).
+5. ~~**Continuation payload callback**~~ — REVERTED 2026-06-12 (implemented,
+   then judged a solution to a contrived problem; see §4.3). Continuations
+   are LLMQore's default replay; `ContextAssembler` runs once per `send()`.
+6. **TokenEstimator + BudgetPolicy seam** — estimate + warning first, then
+   elision stages.
+7. **ContextEngine port split** (delta #9 of the target architecture) —
+   `EditorContext` / `ProjectContext` / `TokenEstimator` behind ports, QtC
+   API only in `ide/context` adapters.
+
+---
+
+## 7. Open questions
+
+1. ~~**Pinned placement**~~ — RESOLVED 2026-06-12: text blocks prepended to
+   the last user-role wire message (synthetic user message only when there is
+   none). A separate synthetic message would break strict role alternation on
+   some provider APIs; cache behaviour of the two shapes is identical.
+2. ~~**Tool-loop relocation cost**~~ — RESOLVED 2026-06-12: relocation
+   rejected (LLMQore is deliberately a standalone agentic client). The
+   follow-up `setContinuationPayloadBuilder` inversion hook was also
+   implemented and reverted the same day — replay is the accepted behaviour
+   (§4.3).
+3. **Budget v1 scope** — warn-only vs. enabling tool-result elision
+   immediately. Elision changes what the model sees; needs live validation.
+4. **Completion and open files** — should completion gain pinned open-files
+   context (cheap with this design), or stay prefix/suffix-only for latency?
--- a/docs/core-class-diagram.svg
+++ b/docs/core-class-diagram.svg
--- a/docs/creating-agents.md
+++ b/docs/creating-agents.md
@@ -0,0 +1,349 @@
+# Creating and Extending Agents
+
+An *agent* is a TOML profile that tells QodeAssist which provider to call,
+which model to use, and exactly what request body to send. All bundled agents
+(Settings → QodeAssist → Agents) are built from the same files described here —
+anything a bundled agent does, a user agent can do too.
+
+## Where user agents live
+
+Drop `*.toml` files into the user agents directory:
+
+| OS | Path |
+|---|---|
+| Linux / macOS | `~/.config/QtProject/qtcreator/qodeassist/config/agents/` |
+| Windows | `%APPDATA%\QtProject\qtcreator\qodeassist\config\agents\` |
+
+QodeAssist creates the directory on startup. Files are loaded at plugin
+startup; after adding or editing a file, restart Qt Creator.
+
+Two layers are loaded:
+
+1. **Bundled** agents shipped inside the plugin — read-only.
+2. **User** agents from the directory above (marked with a `user` pill).
+
+Agent `name`s are global across both layers. A user file that reuses a
+bundled agent's `name` is rejected with an error — bundled agents cannot be
+replaced; create your own agent under a new name and `extends` what you want
+to build on. Two *user* files with the same `name` produce a warning, and
+the alphabetically later file wins.
+
+Load errors and warnings (TOML syntax, unknown keys, missing `extends`
+parents, bodies that don't render to valid JSON) are reported in Qt Creator's
+**General Messages** pane, prefixed with `[Agents]`.
+
+## Minimal example
+
+A custom agent is a thin delta over a bundled **wire base**: extend it, set the
+model, override only what differs. The base already carries the provider, the
+endpoint and the request-body serialization — you add the policy.
+
+```toml
+schema_version = 1
+
+extends = "Claude Base Chat"
+name    = "My Claude"
+model   = "claude-sonnet-4-6"
+```
+
+Override a body field or the persona:
+
+```toml
+schema_version = 1
+
+extends = "Claude Base Chat"
+name    = "My Claude (low temp)"
+model   = "claude-sonnet-4-6"
+
+system_prompt = """You are a terse code reviewer."""
+
+[body]
+temperature = 0.3
+```
+
+Point a base at a different OpenAI-compatible provider by overriding the
+provider instance and model:
+
+```toml
+schema_version = 1
+
+extends           = "OpenAI Base Chat"
+name              = "My DeepSeek"
+provider_instance = "OpenAI Compatible"
+model             = "deepseek-chat"
+```
+
+Bundled agents are read-only — vary a preset by creating your own under a new
+name. If all you want is a different model, you don't even need a file: set the
+per-agent model override in the settings UI.
+
+## Key reference
+
+| Key | Required | Meaning |
+|---|---|---|
+| `schema_version` | no (default 1) | Format version; the plugin refuses files newer than it supports. |
+| `name` | yes | Unique identifier; shown in the UI, referenced by rosters and `extends`. |
+| `description` | no | Tooltip text in the Agents list. |
+| `provider_instance` | yes* | Name of a provider instance (see below). |
+| `model` | yes* | Default model; can be overridden per agent in settings. |
+| `endpoint` | yes* | Path appended to the provider instance URL. May contain `${MODEL}` (e.g. Google: `/models/${MODEL}:streamGenerateContent?alt=sse`). |
+| `system_prompt` | no | Jinja template for the system prompt (see building blocks below). FIM agents usually omit it. |
+| `tags` | no | Free-form strings shown as pills in the UI for discoverability. |
+| `enable_thinking` | no | Capability hint (UI badge). The `[body]` is the source of truth for what is sent. |
+| `enable_tools` | no | Lets the provider inject tool definitions into the request. |
+| `cache_prompt` / `cache_ttl` | no | Prompt caching (Anthropic); `cache_ttl = "1h"` selects the long TTL. |
+| `cache_breakpoints` | no | Which cache points to set when `cache_prompt` is on: any of `"system"`, `"tools"`, `"history"`. Empty/omitted = all three. |
+| `extends` | no | Name of a parent agent to inherit from. |
+| `abstract` | no | Mark as template-only: it can be extended but is never loaded as a usable agent. Not inherited. |
+| `hidden` | no | Loaded and usable, but not listed in selection UIs. Not inherited. |
+| `[match]` | no | Routing constraints (see Routing). |
+| `[body]` | yes* | The literal request body (see below). |
+
+\* required after `extends` resolution — a child inherits these from its
+parent, so it only states what differs.
+
+### Required keys checked at load
+
+A concrete (non-abstract) agent must end up with `name`,
+`provider_instance`, `model`, `endpoint`, and a non-empty `[body]`. Unknown
+keys anywhere at the top level or in `[match]` produce a warning — this
+catches typos like `enable_thinkin`.
+
+## Provider instances
+
+`provider_instance` refers to a provider configuration (URL + API key
+reference + client API). Bundled instances:
+
+`Claude`, `Codestral`, `Google AI`, `llama.cpp`,
+`LM Studio (Chat Completions)`, `LM Studio (Responses API)`, `Mistral AI`,
+`Ollama (Native)`, `Ollama (OpenAI-compatible)`, `OpenAI (Chat Completions)`,
+`OpenAI (Responses API)`, `OpenAI Compatible`, `OpenRouter`.
+
+User-defined instances live next to agents in
+`…/qodeassist/config/providers/*.toml` and follow the same
+override-by-name layering.
+
+## `extends` — inheriting from another agent
+
+A child deep-merges over its parent: scalar keys are replaced, tables (such
+as `[body]` and `[body.options]`) merge key-by-key, and **arrays are replaced
+whole** (a child that wants the parent's `tags` plus one more must restate
+the full list). Chains can be deeper than one level; cycles and missing
+parents are load errors.
+
+`abstract` and `hidden` are never inherited — extending a hidden agent
+yields a visible child unless the child says otherwise.
+
+Each bundled wire protocol ships an **abstract wire base** that carries only
+the provider instance, endpoint, and request-body serialization, with no
+model, persona, tags, or sampling. Extending one and setting `model` is all a
+custom agent needs:
+
+| Base | Provider / API |
+|---|---|
+| `Claude Base Chat` | Claude, Anthropic Messages (`/v1/messages`) |
+| `OpenAI Base Chat` | OpenAI, Chat Completions (`/chat/completions`) |
+| `OpenAI Responses Base` | OpenAI, Responses API (`/responses`) |
+| `Google Base Chat` | Google AI, Gemini `generateContent` |
+| `Ollama Base Chat` | Ollama, native `/api/chat` |
+| `Ollama FIM Base` | Ollama, native `/api/generate` fill-in-the-middle |
+
+For any OpenAI-compatible provider (Mistral, OpenRouter, LM Studio, llama.cpp,
+DeepSeek, …) extend `OpenAI Base Chat` and override `provider_instance`.
+
+Each bundled concrete agent (`Claude Sonnet Chat`, `Claude Code Completion`,
+`OpenAI Chat Completions`, `OpenAI Responses Chat`, `Google Chat`,
+`Ollama Chat`, `Ollama FIM`) is itself a thin delta over one of these bases and
+works as a parent too — `extends = "Claude Sonnet Chat"` inherits everything including
+the model.
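+
+As an illustration, a custom agent for an OpenAI-compatible provider could
+look like the sketch below. The agent name, model id, and `temperature` value
+are placeholders chosen for this example, not bundled defaults; only the keys
+and the merge behavior come from the rules above.
+
+```toml
+# Hypothetical user agent; name and model id are placeholders.
+name              = "My OpenRouter Chat"
+extends           = "OpenAI Base Chat"   # abstract wire base from the table
+provider_instance = "OpenRouter"         # scalar: replaces the parent's value
+model             = "vendor/model-id"    # placeholder, use a real model id
+tags              = ["chat", "custom"]   # array: never merged with a parent's list
+
+[body]
+temperature = 0.2   # table: merged key-by-key into the inherited [body]
+```
+
+Because `abstract` is not inherited, this child loads as a regular, visible
+agent even though its parent is abstract.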
+
+## `[body]` — the request, literally
+
+`[body]` is the request body, written exactly like the provider's curl
+example. Per key, recursively:
+
+- **string containing Jinja** (`{{` or `{%`): rendered, and the output is
+  spliced in as raw JSON. A render that produces nothing drops the key.
+- **plain string / number / bool / table**: passed through unchanged.
+
+```toml
+[body]
+max_tokens  = 16000
+stream      = true
+thinking    = { type = "adaptive", display = "summarized" }
+```
+
+The message-array serialization (`messages` / `contents` / `input`, plus the
+`system` renderer) lives in the **wire base**; a concrete agent that extends a
+base inherits it and usually sets only scalar policy fields like the ones
+above. A from-scratch agent (no `extends`) must carry the full serialization
+itself — copy a bundled base's `[body]` as the starting point.
+
+There are no runtime toggles: a thinking variant is a separate agent file
+that carries the thinking fields in its body.
+
+Every agent body is dry-run rendered at load against a synthetic
+conversation (text, thinking, tool calls, tool results, images), so Jinja
+syntax errors, unknown callbacks, missing partials, and invalid JSON are
+reported at startup rather than mid-conversation. Trailing commas emitted by
+loops are stripped automatically, so `loop.is_last` bookkeeping is
+unnecessary.
+
+### Template data (`ctx`)
+
+| Field | Content |
+|---|---|
+| `ctx.system_prompt` | Rendered system prompt (present only if the agent has one). |
+| `ctx.prefix` / `ctx.suffix` | Code around the cursor (FIM/completion sessions). |
+| `ctx.files_metadata` | Array of `{ file_path, content }` for attached files. |
+| `ctx.history` | Array of messages: `{ role, content, content_blocks, images? }`. |
+
+`content` is the message's flattened text; `content_blocks` is the
+structured form:
+
+| `type` | Fields |
+|---|---|
+| `text` | `text` |
+| `thinking` | `thinking`, `signature` |
+| `redacted_thinking` | `data` |
+| `tool_use` | `id`, `name`, `input` (JSON object) |
+| `tool_result` | `tool_use_id`, `content`, `name` |
+| `image` | `data`, `media_type`, `is_url` |
+
+### Callbacks available in `[body]`
+
+| Callback | Purpose |
+|---|---|
+| `tojson(x)` | Serialize any value as JSON (correct quoting/escaping). Use it for every interpolated value. |
+| `filter_by_type(blocks, "tool_use")` | Subset of `content_blocks` with the given type. |
+| `filter_skip_role(history, "system")` | History without messages of a role. |
+| `strip_signature_suffix(s)` | Remove a trailing `[Signature: …]` marker. |
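+
+A minimal sketch of how the template data and callbacks combine in a
+Jinja-bearing `[body]` string. It is deliberately simplified (text-only
+messages, no tool or image blocks) and is not the serialization the bundled
+wire bases use; the `messages` layout assumes an OpenAI-style chat request.
+
+```toml
+[body]
+stream   = true   # plain value: passed through unchanged
+# Jinja-bearing string: rendered, then spliced in as raw JSON.
+messages = """[
+{% for m in filter_skip_role(ctx.history, "system") %}
+  { "role": {{ tojson(m.role) }}, "content": {{ tojson(m.content) }} },
+{% endfor %}
+]"""
+```
+
+Each interpolated value goes through `tojson`, and the comma after the last
+element relies on the automatic trailing-comma stripping mentioned earlier.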
+
+### Partials and `{% include %}`
+
+The message-array serialization is **inlined directly in each bundled wire
+base** — there are no bundled partials to include. The `{% include %}`
+mechanism still works for *your own* partials: drop a `partials/*.jinja` next
+to your agent TOML and include it with
+`{% include "partials/my_messages.jinja" %}`. Includes resolve against the
+bundled root first, then the user agent's own directory; paths with `..` or a
+leading `/` are rejected.
+
+## `system_prompt` — composable building blocks
+
+`system_prompt` is itself a Jinja template. These helpers and placeholders
+are available while it renders:
+
+| Helper | Purpose |
+|---|---|
+| `{{ read_file("${PROJECT_DIR}/STYLE.md") }}` | Inline a file. Reads are restricted to the project directory, your QodeAssist user directory (`${CONFIG_DIR}`), and bundled `:/…` resources. |
+| `{{ file_exists(p) }}` / `{{ read_dir(p) }}` | Existence check / directory listing (same root restrictions). |
+| `{{ head_lines(s, n) }}` | First `n` lines of a string. |
+| `basename`, `dirname`, `ext`, `lower`, `upper` | Path/string helpers. |
+| `${PROJECT_DIR}`, `${CONFIG_DIR}` | Substituted before rendering. `${CONFIG_DIR}` is your QodeAssist user directory (where agent configs live). |
+
+Example:
+
+```toml
+system_prompt = """
+{{ read_file("${CONFIG_DIR}/roles/reviewer.md") }}
+
+{% if file_exists("${PROJECT_DIR}/.qodeassist-style.md") %}
+Project conventions:
+{{ read_file("${PROJECT_DIR}/.qodeassist-style.md") }}
+{% endif %}
+"""
+```
+
+Reads fail **loudly**: a `read_file` whose path lies outside those roots, or
+whose target is missing, aborts the request with a clear error instead of
+silently rendering an empty prompt. For a genuinely optional file, guard the
+read with `file_exists`, which returns `false` for an allowed-but-absent
+path. A path *outside* the roots is still treated as an authoring error and
+rejected outright.
+
+The persona is simply what `system_prompt` renders to — inline the text or pull
+shared text from a markdown file with `read_file`. The bundled chat agents do
+exactly this: their `system_prompt` is `{{ read_file(":/roles/qt-cpp-developer.md") }}`,
+reading the shipped role from the plugin resources. To switch personas in the
+chat, switch agents: a persona variant is a thin `extends` child that overrides
+only `system_prompt` (e.g. pointing `read_file` at any file of your own under
+`${CONFIG_DIR}/…` or `${PROJECT_DIR}/…`). `read_file` reads exactly the path
+you give it — there is no override convention that swaps a bundled file for a
+same-named user file.
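+
+For example, a reviewer persona could be a child like the following sketch.
+The agent name and the `roles/reviewer.md` path are illustrative and assume
+you have created that file yourself.
+
+```toml
+# Hypothetical persona variant; only system_prompt differs from the parent.
+name          = "Claude Sonnet Reviewer"
+extends       = "Claude Sonnet Chat"
+system_prompt = """{{ read_file("${CONFIG_DIR}/roles/reviewer.md") }}"""
+```
+
+Everything else, including the model and `[body]`, comes from the parent.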
+
+## Routing — `[match]` and the completion roster
+
+`[match]` drives **code completion** routing only. Completion has an ordered
+roster of agents; for the current file the **first roster entry whose `[match]`
+accepts** wins. The other pipelines don't route: chat shows an allow-list of
+agents and you pick one in the panel; quick refactor and chat compression each
+use a single configured agent (set in QodeAssist → General).
+
+```toml
+[match]
+file_patterns = ["*.qml", "*.js"]
+path_patterns = ["*/tests/*"]
+project_names = ["MyProject"]
+```
+
+- Dimensions are ANDed; an empty dimension is unconstrained; an entirely
+  empty/absent `[match]` is a catch-all.
+- `file_patterns` are case-insensitive globs tested against the file name
+  and the full path; `path_patterns` against the full path only.
+- `project_names` are exact, case-sensitive project names.
+
+Typical completion setup: a specialized agent (e.g. an `Ollama FIM` variant
+with `*.qml`) first, a catch-all agent last.
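+
+A specialized entry in that setup might look like this sketch. The agent name
+is a placeholder, and the catch-all entry is simply an agent without a
+`[match]` table placed last in the roster.
+
+```toml
+# Hypothetical QML-only completion agent, listed first in the roster.
+name    = "Ollama FIM (QML)"
+extends = "Ollama FIM"        # bundled concrete agent; the model is inherited
+
+[match]
+file_patterns = ["*.qml"]     # omitted dimensions stay unconstrained
+```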
+
+## Models
+
+The TOML `model` is only the default. The settings UI can set a per-agent
+override (stored in `agent_models.json`); the resolved model is also
+substituted into `${MODEL}` in `endpoint` before sending.
+
+## Contributing your agent to QodeAssist
+
+The bundled agent set grows through contributions — if you've made an agent
+for a provider or model that others could use, please send it upstream
+instead of keeping it local. No C++ is needed:
+
+1. Develop and verify the agent locally in the user agents directory.
+2. In a fork, copy the TOML to `sources/agents/` and register the file in
+   `sources/agents/agents.qrc`.
+3. Keep it a thin delta: extend the matching provider base and set only
+   `name`, `description`, `model`, `tags` (and `[body]` keys that genuinely
+   differ). Look at `claude_chat.toml` or `ollama_fim.toml` for the expected
+   shape.
+4. Run the tests (`QodeAssistTest`): `BundledAgentsTest` automatically
+   loads every bundled agent, resolves its `extends` chain, and dry-renders
+   its `[body]` — if your TOML passes, it works.
+5. Open a pull request.
+
+Conventions:
+
+- File name: `<provider>_<model_or_purpose>_<kind>.toml`
+  (e.g. `openrouter_deepseek_chat.toml`).
+- `name` is user-visible and must be unique; include the provider and model
+  (e.g. `OpenRouter DeepSeek Chat`).
+- Specialized completion agents should carry a `[match]` block so routing
+  can pick them automatically (e.g. `file_patterns = ["*.qml"]`).
+- A new OpenAI-compatible provider is TOML-only: add a provider instance file
+  in `sources/providersConfig/`, then a concrete agent that `extends`
+  `OpenAI Base Chat` and overrides `provider_instance`. A genuinely new
+  request/response *format* (a new wire base) is the only thing that needs C++.
+
+## Troubleshooting
+
+- **Agent missing from the list** — check General Messages for `[Agents]
+  error:` lines; the file failed to parse, resolve, or validate.
+- **`… has the same name as a bundled agent — bundled agents cannot be
+  replaced`** — pick a different `name`; use `extends` to inherit from the
+  bundled agent instead.
+- **`Unknown key 'x' … ignored (typo?)`** — the key isn't part of the
+  schema; compare with the table above.
+- **`Agent 'X' extends unknown agent 'Y'`** — the parent's `name` (not file
+  name) must match exactly; the parent must be bundled or in the same
+  directory.
+- **`[body] failed to render to valid JSON`** — the dry run failed; the log
+  contains the rendered snippet. Usually a missing `tojson(...)` around an
+  interpolated string.
+- **Edits not picked up** — agents are loaded at startup; restart
+  Qt Creator.
--- a/docs/file-context.md
+++ b/docs/file-context.md
@@ -5,8 +5,10 @@ QodeAssist provides two powerful ways to include source code files in your chat
 ## Attached Files

 Attachments are designed for one-time code analysis and specific queries:
-- Files are included only in the current message
-- Content is discarded after the message is processed
+- Files are sent as part of the current message
+- The content is a snapshot taken at send time: it is stored with the chat
+  and stays in the conversation history exactly as sent, even if the file
+  changes on disk later
 - Ideal for:
  - Getting specific feedback on code changes
  - Code review requests
@@ -20,8 +22,11 @@ Attachments are designed for one-time code analysis and specific queries:
 Linked files provide persistent context throughout the conversation:

 - Files remain accessible for the entire chat session
-- Content is included in every message exchange
-- Files are automatically refreshed - always using latest content from disk
+- Files are automatically refreshed — every request re-reads them and sends
+  the latest content from disk
+- The snapshot travels next to your latest message and is never duplicated
+  into the conversation history, so linked files do not bloat the chat as it
+  grows
 - Perfect for:
  - Long-term refactoring discussions
  - Complex architectural changes
--- a/docs/project-rules.md
+++ b/docs/project-rules.md
@@ -1,35 +0,0 @@
-# Project Rules Configuration
-
-QodeAssist supports project-specific rules to customize AI behavior for your codebase. Create a `.qodeassist/rules/` directory in your project root.
-
-## Quick Start
-
-```bash
-mkdir -p .qodeassist/rules/{common,completion,chat,quickrefactor}
-```
-
-## Directory Structure
-
-```
-.qodeassist/
-└── rules/
-    ├── common/           # Applied to all contexts
-    ├── completion/       # Code completion only
-    ├── chat/            # Chat assistant only
-    └── quickrefactor/   # Quick refactor only
-```
-
-All `.md` files in each directory are automatically loaded and added to the system prompt.
-
-## Example
-
-Create `.qodeassist/rules/common/general.md`:
-
-```markdown
-# Project Guidelines
-- Use snake_case for private members
-- Prefix interfaces with 'I'
-- Always document public APIs
-- Prefer Qt containers over STL
-```
-
--- a/docs/quick-refactoring.md
+++ b/docs/quick-refactoring.md
@@ -206,7 +206,6 @@ The LLM receives:
 - **Cursor Position**: Marked with `<cursor>` tag
 - **Selection Markers**: `<selection_start>` and `<selection_end>` tags
 - **Your Instructions**: Built-in, custom, or typed
-- **Project Rules**: If configured (see [Project Rules](project-rules.md))

 ### Context Configuration

@@ -270,7 +269,6 @@ Fully local setup for offline or secure environments.

 ## Related Documentation

-- [Project Rules](project-rules.md) - Project-specific AI behavior customization
 - [File Context](file-context.md) - Attaching files to chat context
 - [Ignoring Files](ignoring-files.md) - Exclude files from AI context
 - [Provider Configuration](../README.md#configuration) - Setting up LLM providers
--- a/docs/target-architecture.md
+++ b/docs/target-architecture.md
@@ -0,0 +1,654 @@
+# QodeAssist — Target Architecture (v1.0)
+
+Status: design baseline, derived from the fixed use-case inventory below.
+Scope: the complete plugin, designed "from scratch" — what the architecture
+should be if nothing legacy constrained it. The current code (see
+`architecture.md`) already converges on this; §10 lists the remaining deltas.
+
+---
+
+## 1. Use-case inventory (requirements baseline)
+
+Every architectural decision below is justified by one of these. Features not
+on this list (Rules system, legacy provider/model/template pickers, Stack A)
+are intentionally out of scope.
+
+| # | Use case | What the user gets |
+|---|----------|--------------------|
+| U1 | **Code completion** | Inline FIM/instruct suggestions via LSP; auto + manual trigger, multiline, smart-context suppression, accept full / word-by-word |
+| U2 | **Chat assistant** | 4 placements (sidebar, bottom pane, editor tab, floating window); streaming text + thinking blocks + tool blocks + file-edit blocks (apply/undo); attachments, linked files, @-mentions, open-files sync; token counter; persisted history; one-click summarization; runtime agent picker |
+| U3 | **Quick refactor** | Selection + instruction by hotkey; custom-instructions library; separate agent; optional tools; streamed result inserted into the editor |
+| U4 | **Tools** | read/create/edit file, search, find, list, build, diagnostics, terminal, todo, load_skill; per-tool enable |
+| U5 | **Skills** | discovery from `.qodeassist/skills`, `.claude/skills`, `~/.claude/skills`; auto-injection, explicit `/` picker, always-on |
+| U6 | **MCP** | server mode (expose plugin tools, HTTP/SSE + stdio bridge) and client hub (consume external tools in chat/refactor) |
+| U7 | **Providers** | 13 `client_api` types over one GenericProvider; secrets store; local-server autostart; model listing |
+| U8 | **Agents** | TOML profiles: abstract wire-base + thin concrete via `extends`, `[body]` table 1:1 with the wire request (message serialization inlined per base), `match` rules (completion routing), `cache_breakpoints`, per-agent model override, per-pipeline agent selection |
+| U9 | **Personas** | persona = the agent's `system_prompt`; shared text lives in plain files pulled in via `read_file` — bundled defaults under `:/roles/…`, or any file the user points at under `${PROJECT_DIR}` / `${CONFIG_DIR}` (your QodeAssist user directory); `read_file` reads the literal path given (no override/fallback resolution); switching persona = switching agent (no separate Roles subsystem) |
+| U10 | **Configuration UI** | settings pages for everything above; per-project settings; updater + status widget |
+
+---
+
+## 2. Design principles
+
+1. **One stack.** Every LLM byte — completion, chat, compression, refactor —
+   flows through the same `Session` pipeline. No parallel legacy path.
+2. **Hexagonal core.** The runtime (agents, sessions, providers, templates,
+   prompt rendering) has zero Qt Creator dependencies. The IDE host composes
+   that core; IDE-specific facts enter only through ports (document reading,
+   project scanning, secrets, tool hosting).
+3. **Configuration is declarative, code is mechanism.** What is sent (request
+   `[body]`, system prompt, endpoint, model) lives in TOML/JSON/Jinja and is
+   user-overridable; *how* it is sent (streaming, retries, tool loop, event
+   routing) lives in C++ and is identical for all providers.
+4. **Agent-driven behavior.** The agent's TOML declares what a conversation
+   uses (`enable_tools`, `enable_thinking`); features and UI adapt to the
+   agent config instead of switching on provider names or provider-declared
+   capability flags.
+5. **Single source of truth for conversation state.** `ConversationHistory`
+   owns the messages; `ChatModel` and persistence are projections of it, never
+   independent copies.
+6. **Per-feature composition roots, no singletons.** Each feature constructs
+   and owns its dependencies (`new` + parent); shared services are passed
+   explicitly (constructor/setter, QML context properties for the chat).
+7. **Streaming-first event model.** One typed `ResponseEvent` stream is the
+   only contract between the core and every consumer. Deltas exist for live
+   UI (chat); one-shot pipelines (completion, refactor) ignore them,
+   wait for `finished`, and read the final assistant message from history.
+8. **Fail at load, not mid-conversation.** Agent profiles are validated when
+   loaded (partials resolve, assembled body parses as JSON against a synthetic
+   context), so a config error never surfaces as a silent runtime drop.
+
+---
+
+## 3. Layered model
+
+```mermaid
+flowchart TB
+    subgraph HOSTS["Hosts — composition roots"]
+        PLUGIN["Qt Creator plugin<br/>qodeassist.cpp"]
+    end
+
+    subgraph L5["L5 · Presentation"]
+        LSP["LSP bridge<br/>inline suggestions"]
+        QMLUI["ChatView QML<br/>4 placements"]
+        RW["Refactor widgets"]
+        SUI["Settings pages"]
+    end
+
+    subgraph L4["L4 · Features"]
+        FCOMP["CompletionFeature"]
+        FCHAT["ChatFeature"]
+        FREF["RefactorFeature"]
+    end
+
+    subgraph L3["L3 · Capabilities"]
+        CTX["ContextEngine<br/>ports + QtC adapters"]
+        TOOLS["ToolKit"]
+        SKILLS["SkillsEngine"]
+        MCPH["McpHub<br/>client + server"]
+    end
+
+    subgraph L2["L2 · Core runtime — IDE-independent"]
+        SM["SessionManager"]
+        SESS["Session"]
+        AGF["AgentFactory + AgentRouter"]
+        AG["Agent"]
+        PROV["GenericProvider"]
+        TPL["JsonPromptTemplate"]
+    end
+
+    subgraph L1["L1 · Declarative config"]
+        PCONF["providers/*.toml"]
+        ACONF["agents/*.toml + partials/*.jinja"]
+        ROST["rosters / pipelines"]
+        PERS["personas/*.md"]
+        SKCONF["skills/*.md"]
+        SEC["SecretsStore"]
+    end
+
+    subgraph L0["L0 · Wire — LLMQore"]
+        CLIENTS["*Client — SSE streaming"]
+        TOOLFW["Tool framework"]
+        MCPT["MCP transports"]
+    end
+
+    PLUGIN --> L4
+    PLUGIN --> SUI
+    LSP --> FCOMP
+    QMLUI --> FCHAT
+    RW --> FREF
+    FCOMP --> SM
+    FCHAT --> SM
+    FREF --> SM
+    FCOMP --> CTX
+    FCHAT --> CTX
+    FREF --> CTX
+    FCHAT --> SKILLS
+    FCHAT --> TOOLS
+    FREF --> TOOLS
+    TOOLS --> TOOLFW
+    MCPH --> MCPT
+    SM --> SESS
+    SESS --> AG
+    AGF --> AG
+    AG --> PROV
+    AG --> TPL
+    AGF --> ACONF
+    AGF --> PCONF
+    AGF --> SEC
+    AGF --> ROST
+    TPL --> PERS
+    PROV --> CLIENTS
+    SKILLS --> SKCONF
+```
+
+### Layer contracts
+
+| Layer | Contains | May depend on | Must NOT depend on |
+|-------|----------|---------------|--------------------|
+| **L0 Wire** | LLMQore clients (one per wire protocol: Claude, OpenAI Chat, OpenAI Responses, Google, Ollama, Mistral, llama.cpp), tool framework, MCP transports | Qt Network | anything above |
+| **L1 Config** | `ProviderInstance`, `AgentProfile` (+ loader/validator), rosters, personas, skills, secrets port | toml++, inja | Qt Creator, L2+ |
+| **L2 Core** | `Agent`, `AgentFactory`, `AgentRouter`, `Provider`/`GenericProvider`, `JsonPromptTemplate`, `Session`, `SessionManager`, `ConversationHistory`, `SystemPromptBuilder`, `ResponseRouter`, `ToolContributorRegistry` | L0, L1 | Qt Creator, QML, features |
+| **L3 Capabilities** | `ContextEngine` (ports + QtC adapters), `ToolKit` (built-in tools), `SkillsEngine`, `McpHub` | L0–L2, QtC APIs *only in adapters* | features, UI |
+| **L4 Features** | `CompletionFeature`, `ChatFeature` (send/stream, compression, token counting, file edits), `RefactorFeature` | L2, L3 | each other |
+| **L5 Presentation** | LSP bridge, ChatView QML, refactor widgets, settings pages | its feature | core internals |
+| **Hosts** | plugin shell | everything (composition only) | — |
+
+The hard rule that makes testability free: **L0–L2 build into
+targets with no Qt Creator linkage.** Tests link L0–L2 directly;
+the plugin adds L3 adapters, L4, L5.
+
+---
+
+## 4. Core domain model
+
+Rendered copy: [core-class-diagram.svg](core-class-diagram.svg) (regenerate
+when the diagram below changes).
+
+```mermaid
+classDiagram
+    direction TB
+    class SessionManager {
+        +acquire(agentName) Session
+        +release(session)
+        +toolContributors() ToolContributorRegistry
+    }
+    class Session {
+        +send(blocks)
+        +cancel()
+        +history() ConversationHistory
+        +systemPrompt() SystemPromptBuilder
+        +event(ResponseEvent)
+        +finished(id, stopReason)
+        +failed(id, ErrorInfo)
+        +cancelled(id)
+    }
+    class ConversationHistory {
+        +messages() vector~Message~
+        +lastAssistantText() string
+        +append(Message)
+        +reset(vector~Message~)
+    }
+    class Message {
+        +role Role
+        +blocks vector~ContentBlock~
+    }
+    class SystemPromptBuilder {
+        +setLayer(id, text, priority)
+        +removeLayer(id)
+        +compose() string
+    }
+    class ResponseRouter {
+        +attach(BaseClient)
+        +event(ResponseEvent)
+    }
+    class Agent {
+        +config() AgentConfig
+        +provider() Provider
+        +promptTemplate() PromptTemplate
+    }
+    class AgentFactory {
+        +create(name) Agent
+        +configByName(name) AgentConfig
+        +effectiveModel(name) string
+    }
+    class AgentRouter {
+        +pickAgent(roster, fileCtx) string
+    }
+    class Provider {
+        <<interface>>
+        +prepareRequest(request, ctx)
+        +sendRequest(json) RequestID
+        +cancelRequest(RequestID)
+    }
+    class GenericProvider {
+        -client BaseClient
+    }
+    class PromptTemplate {
+        <<interface>>
+        +buildFullRequest(request, ctx)
+    }
+    class JsonPromptTemplate {
+        -bodySpec QJsonObject
+        -env InjaEnvironment
+    }
+    class ToolContributorRegistry {
+        +registerContributor(fn)
+        +applyTo(ToolsManager)
+    }
+
+    SessionManager o-- Session : pools
+    SessionManager --> AgentFactory : builds via
+    SessionManager --> ToolContributorRegistry
+    Session *-- ConversationHistory
+    Session *-- SystemPromptBuilder
+    Session *-- ResponseRouter
+    Session --> Agent
+    ConversationHistory o-- Message
+    Agent *-- Provider
+    Agent *-- PromptTemplate
+    AgentFactory ..> Agent : creates
+    AgentFactory --> AgentRouter
+    GenericProvider --|> Provider
+    JsonPromptTemplate --|> PromptTemplate
+```
+
+Responsibilities, one line each:
+
+- **Agent** — immutable bundle of *what to call*: resolved config + provider +
+  compiled prompt template. No request state.
+- **Session** — one conversation's runtime: owns history, system-prompt
+  layers, pinned context providers, response routing, the in-flight request,
+  and the content of each dispatched request (tool continuations replay it
+  inside LLMQore; see `context-architecture.md` §4.3).
+  `send(blocks)` is the *only* entry point: every pipeline appends a user
+  message and dispatches; there are no per-pipeline send variants. What
+  differs between completion, chat, and refactor is the agent's template and
+  the consumption mode (deltas vs final message), never the Session API.
+- **SessionManager** — creates/pools sessions per agent; the single place
+  features go to get one. Pooling (instead of per-message construction)
+  avoids paying the "fresh agent + provider + secrets read per request"
+  latency cost. It reuses only the expensive parts (agent, provider, compiled
+  template, secrets read): `acquire` hands out a session with cleared history
+  and system-prompt layers, so one-shot pipelines never see a previous
+  exchange.
+- **AgentRouter** — the agent picker for *auto-routed* pipelines. Only code
+  completion routes by context: `pickAgent(roster.codeCompletion, {file,
+  project})` walks the ordered roster and returns the first agent whose match
+  rules fit. Chat is user-driven (the picker filters to the `chatAssistant`
+  allow-list; the user chooses); compression and quick refactor each use a
+  single configured agent. No feature-local routing logic beyond these.
+- **GenericProvider** — one class for all 13 client APIs; varies only by
+  LLMQore client factory + metadata. Request *shape* belongs to the template,
+  never to the provider.
+- **JsonPromptTemplate** — compiles the agent's `[body]` table; renders
+  Jinja-bearing string values, splices raw JSON, drops empty keys; validated
+  at load time.
+- **SystemPromptBuilder** — ordered named layers (`agent.system`,
+  `chat.context`, `refactor`, `compression`); features mutate only their own
+  layer.
+- **ResponseRouter / ResponseEvent** — adapts LLMQore client signals into one
+  typed stream: `TextDelta`, `ThinkingDelta`, `ToolCallStart/End`,
+  `ToolResult`, `Usage`, `Error`, `MessageStop`.
+- **ToolContributorRegistry** — contributors (built-in ToolKit, SkillTool,
+  McpHub) register once; `SessionManager` applies them to every new session's
+  `ToolsManager`. This is how MCP tools reach chat *and* refactor (U6) without
+  feature code knowing about MCP.
+
+---
+
+## 5. Runtime flows
+
+### 5.1 Chat (U2) — the richest path
+
+```mermaid
+sequenceDiagram
+    autonumber
+    actor U as User
+    participant V as ChatView QML
+    participant F as ChatFeature
+    participant SM as SessionManager
+    participant S as Session
+    participant T as JsonPromptTemplate
+    participant P as GenericProvider
+    participant C as LLMQore Client
+    participant R as ResponseRouter
+
+    U->>V: message + attachments
+    V->>F: sendMessage(text, files, images)
+    F->>SM: acquire(activeAgent)
+    SM-->>F: Session (pooled)
+    F->>S: systemPrompt().setLayer("chat.context", project + skills + linked files)
+    F->>S: send(userBlocks)
+    S->>T: buildFullRequest(history, system, ctx)
+    T-->>S: request JSON (body is 1:1 with the API)
+    S->>P: sendRequest(json)
+    P->>C: HTTP POST, SSE stream
+    loop streaming
+        C-->>R: chunk / thinking / tool_use / usage
+        R-->>S: ResponseEvent
+        S-->>F: event(ResponseEvent)
+        F-->>V: ChatModel projection update
+    end
+    opt tool call requested
+        S->>S: execute tool via ToolsManager
+        S->>P: continue with tool_result
+    end
+    C-->>R: finalized
+    R-->>S: MessageStop + Usage
+    S-->>F: finished()
+    F->>SM: release(session)
+```
+
+State ownership in chat: `Session.history()` is the truth. `ChatModel` is a
+QML projection built from history events (`messageAdded`, `messageUpdated`);
+`ChatSerializer`/`ChatHistoryStore` persist *history*, and restoring a chat
+seeds a new session's history — never the other way around. File-edit blocks,
+apply/undo, and the token counter are ChatFeature concerns layered on the
+event stream.
+
+### 5.2 Completion (U1)
+
+```
+LSP getCompletionsCycling
+  → CompletionFeature
+      agent   = AgentRouter.pickAgent(roster.codeCompletion, {file, project})
+      session = SessionManager.acquire(agent)
+      ctx     = ContextEngine: prefix/suffix + open-files context (policy from
+                CodeCompletionSettings — editor policy, not agent config)
+      session.send(blocks{completion context})
+  on finished → history().lastAssistantText()
+      → CodeHandler (output-mode post-processing) → LSP items
+```
+
+No special Session method: the completion context travels as the content of
+an ordinary user message (a structured block carrying prefix/suffix + file
+context), and the template context exposes it as `ctx.prefix` / `ctx.suffix`.
+FIM vs instruct is *agent config* (template + body), not feature code: a FIM
+agent's body renders `prefix`/`suffix` into FIM fields; an instruct agent's
+body renders the same exchange as a chat-shaped request. The feature is
+identical for both — and since completion has no incremental UI, it never
+touches the delta stream: it waits for `finished` and reads the last message.
+
+### 5.3 Quick refactor (U3)
+
+```
+Hotkey → RefactorFeature
+  agent   = pipelines.quickRefactor (single configured agent)
+  session = SessionManager.acquire(agent)
+  session.systemPrompt().setLayer("refactor", tagged selection + output rules)
+  session.send(blocks{instruction})
+  on finished → history().lastAssistantText()
+      → ResponseCleaner → RefactorResult → editor insert (accept/reject)
+```
+
+Same consumption mode as completion: the feature listens to
+`Session::finished`/`failed` only (events at most drive a progress spinner
+and cancel) and reads the result from history — it never connects to raw
+client signals. Tool calls during refactor run inside the session's tool
+loop; history's last assistant message is whatever the model produced after
+the final tool round.
+
+### 5.4 Compression (U2)
+
+Compression is ChatFeature reusing the same path with the single
+`pipelines.chatCompression` agent and a `"compression"` system layer; the
+summary starts a new history.
+
+---
+
+## 6. Configuration model
+
+```mermaid
+erDiagram
+    AGENT_PROFILE ||--o| AGENT_PROFILE : extends
+    AGENT_PROFILE }o--|| PROVIDER_INSTANCE : provider_instance
+    AGENT_PROFILE }o--o{ PARTIAL : includes
+    AGENT_PROFILE }o--o{ PERSONA : read_file
+    ROSTER }o--o{ AGENT_PROFILE : ranks
+    MODEL_OVERRIDE |o--|| AGENT_PROFILE : overrides_model
+    PROVIDER_INSTANCE }o--|| CLIENT_API : client_api
+    PROVIDER_INSTANCE }o--o| SECRET : api_key_ref
+    PROVIDER_INSTANCE ||--o| LAUNCH_CONFIG : autostarts
+
+    AGENT_PROFILE {
+        string name
+        bool abstract
+        string system_prompt "jinja; inline text or read_file()"
+        json body "request body, 1:1 with API"
+        string endpoint "may contain MODEL placeholder"
+        string model "default; override wins"
+        bool enable_tools "capability hint"
+        bool enable_thinking "capability hint"
+        json match "file, path, project patterns"
+    }
+    PROVIDER_INSTANCE {
+        string name
+        string client_api
+        string url
+        string api_key_ref
+    }
+    PERSONA {
+        string path "plain markdown file"
+    }
+    ROSTER {
+        string pipeline "completion, chat, compression, refactor"
+        list agents "ordered candidates"
+    }
+```
+
+Rules of the config layer (full spec: `agent-templates-design.md`):
+
+- `[body]` **is** the request body — field-by-field, deep-mergeable through
+  `extends`; Jinja-bearing strings render and splice as raw JSON, literals
+  pass through. No separate sampling/thinking merge machinery.
+- Message serialization is inlined in each abstract **wire base**; there are no
+  bundled partials. `{% include %}` still resolves sandboxed roots (bundled
+  `:/agents/`, then the user agent's dir) for user-supplied partials; a missing
+  partial is a load-time error.
+- Two-level hierarchy: an abstract **wire base** per provider (provider +
+  endpoint + serialization only — no model/persona/tags/sampling) and a thin
+  concrete agent carrying all policy.
+- Per-agent model override lives in `agent_models.json` and is applied by
+  `AgentFactory`; `${MODEL}` in `endpoint` covers URL-model providers.
+- Personas are not a subsystem: the profile's `system_prompt` is the persona.
+  Shared text lives in plain markdown under the sandboxed roots and is pulled
+  in with `{{ read_file(...) }}`; a persona-switch is an agent-switch — the
+  only system-prompt edit point is the profile.
+- Secrets never appear in TOML; `api_key_ref` resolves through the
+  `SecretsStore` port (QtC keychain in the plugin).
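The model-override rule above can be sketched in a few lines. This is a minimal illustration using only the standard library; `effectiveModel` and `expandEndpoint` are hypothetical helper names, not functions of `AgentFactory`, and the endpoint string in the usage note is invented for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>

// Hypothetical helper: the per-agent override wins over the profile default.
inline std::string effectiveModel(const std::string &profileDefault,
                                  const std::optional<std::string> &overrideModel)
{
    return overrideModel.value_or(profileDefault);
}

// Hypothetical helper: substitute every ${MODEL} occurrence in the endpoint.
inline std::string expandEndpoint(std::string endpoint, const std::string &model)
{
    const std::string placeholder = "${MODEL}";
    std::size_t pos = endpoint.find(placeholder);
    while (pos != std::string::npos) {
        endpoint.replace(pos, placeholder.size(), model);
        // Continue after the inserted text so the model name is never rescanned.
        pos = endpoint.find(placeholder, pos + model.size());
    }
    return endpoint;
}
```

For a URL-model provider, `expandEndpoint("/models/${MODEL}/generate", effectiveModel("base-model", std::nullopt))` yields `/models/base-model/generate`; an endpoint without the placeholder passes through unchanged.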
+
+---
+
+## 7. Capabilities layer
+
+**ContextEngine** replaces the monolithic ContextManager with three focused
+services behind IDE-agnostic ports:
+
+| Service | Port (L2-visible) | QtC adapter |
+|---------|-------------------|-------------|
+| `EditorContext` — current doc, selection, prefix/suffix | `IDocumentReader` | TextEditor API |
+| `ProjectContext` — root, file listing, ignore filtering (`.qodeassistignore`), open files, changes | `IProjectScanner` | ProjectExplorer API |
+| `TokenEstimator` — input estimates, calibrated by server usage | — (pure) | — |
+
+**ToolKit** registers the built-in tools (U4) with the
+`ToolContributorRegistry`; each tool declares a permission class (read /
+write / execute) so per-tool enablement (settings) and confirmation policy
+(terminal commands) live in one place.
+
+**SkillsEngine** (U5): discovery + watching of the three skill roots; exposes
+`catalogText()` (names + descriptions for the system prompt),
+`alwaysOnBodies()`, and the `load_skill` tool; the `/` picker injects a
+skill's body into a single message.
+
+**McpHub** (U6): client side connects configured servers and contributes
+their tools through the same registry (tools reach every session uniformly);
+server side exposes ToolKit over HTTP/SSE + stdio bridge.
+
+---
+
+## 8. Cross-cutting policies
+
+Architecture is the rules as much as the boxes. These policies bind every
+layer and are part of the contract:
+
+### 8.1 Threading
+
+The core runs on the GUI thread; concurrency is the Qt event loop plus async
+network I/O — no shared-state threading anywhere in L1–L4. Work that can
+block (project scans, token estimation over large trees) hides behind L3
+ports; an adapter may use worker threads internally but delivers results as
+queued signals. Core types are therefore deliberately not thread-safe.
+
+### 8.2 Request lifecycle
+
+A session has at most one in-flight request; `send()` while in flight cancels
+the previous request first. Every request terminates in exactly one of three
+states — `finished(stopReason)`, `failed(error)`, `cancelled()` — and
+cancellation is *not* an error: no consumer may string-match a message to
+tell them apart.
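One way to make the three terminal states impossible to confuse is a tagged union. The sketch below is an assumption about shape, not the `Session` API: `Finished`, `Failed`, `Cancelled` and `Outcome` are hypothetical types built on `std::variant` instead of Qt signals.

```cpp
#include <cassert>
#include <string>
#include <variant>

// Hypothetical stand-ins for the three terminal states named above.
struct Finished  { std::string stopReason; };
struct Failed    { std::string error; };
struct Cancelled {};

using Outcome = std::variant<Finished, Failed, Cancelled>;

// Consumers branch on the alternative's type, never on message text.
inline bool isError(const Outcome &outcome)
{
    return std::holds_alternative<Failed>(outcome);
}

inline std::string describe(const Outcome &outcome)
{
    struct Visitor
    {
        std::string operator()(const Finished &f) const { return "finished: " + f.stopReason; }
        std::string operator()(const Failed &f) const { return "failed: " + f.error; }
        std::string operator()(const Cancelled &) const { return "cancelled"; }
    };
    return std::visit(Visitor{}, outcome);
}
```

With this shape a cancelled request is never an error: `isError(Outcome{Cancelled{}})` is false regardless of any text attached elsewhere.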
+
+### 8.3 Errors
+
+Runtime errors are typed, not strings: `ErrorInfo { category, message,
+providerDetail }` with categories `Config | Auth | Network | Provider |
+Validation | Tool`. The category drives UI affordances (Auth → open provider
+settings, Network → offer retry); free text is for logs only. Load-time
+errors (principle 8) surface in the agents settings page, never as a failed
+send.
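A minimal sketch of how a category could drive the UI, using the standard library in place of Qt types. The field and category names follow the text; the `Affordance` enum and `affordanceFor` are hypothetical, and only the Auth and Network mappings come from this section.

```cpp
#include <cassert>
#include <string>

enum class ErrorCategory { Config, Auth, Network, Provider, Validation, Tool };

struct ErrorInfo
{
    ErrorCategory category;
    std::string message;        // free text, for logs only
    std::string providerDetail; // raw provider payload, if any
};

// Hypothetical affordance set; everything except Auth and Network
// falls back to None in this sketch.
enum class Affordance { None, OpenProviderSettings, OfferRetry };

inline Affordance affordanceFor(const ErrorInfo &error)
{
    switch (error.category) {
    case ErrorCategory::Auth:
        return Affordance::OpenProviderSettings;
    case ErrorCategory::Network:
        return Affordance::OfferRetry;
    default:
        return Affordance::None;
    }
}
```

The UI decision depends only on `category`, so changing the wording of `message` can never change behavior.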
+
+### 8.4 Timeouts and retries
+
+Transfer timeouts are per-pipeline policy (completion short, chat/refactor
+from settings), applied by the feature — never baked into agent profiles. A
+streaming request is never silently retried after the first byte; automatic
+retry with capped backoff is allowed only for connection-phase failures.
+Anything beyond that is an explicit user action.
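The retry rule can be written as a single pure function. This is a sketch under stated assumptions: `RetryPolicy`, its default values, and `nextRetryDelay` are hypothetical names and numbers chosen for illustration; the source only fixes the rule itself (connection-phase failures only, capped backoff).

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <optional>

// Streaming means at least one byte of the response has arrived.
enum class FailurePhase { Connecting, Streaming };

struct RetryPolicy
{
    int maxAttempts = 3;                   // assumed default
    std::chrono::milliseconds base{250};   // assumed default
    std::chrono::milliseconds cap{4000};   // assumed default
};

// Returns the delay before the next automatic attempt, or nullopt when
// any further retry must be an explicit user action.
inline std::optional<std::chrono::milliseconds>
nextRetryDelay(const RetryPolicy &policy, FailurePhase phase, int attemptsSoFar)
{
    if (phase != FailurePhase::Connecting)
        return std::nullopt; // never retry silently after the first byte
    if (attemptsSoFar >= policy.maxAttempts)
        return std::nullopt;
    std::chrono::milliseconds delay = policy.base;
    for (int i = 1; i < attemptsSoFar; ++i)
        delay = std::min<std::chrono::milliseconds>(delay * 2, policy.cap);
    return std::min(delay, policy.cap);
}
```

Keeping the policy outside the agent profile matches the rule above: the feature constructs a `RetryPolicy` per pipeline and the profile never sees it.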
+
+### 8.5 Observability
+
+One `RequestID` correlates feature → session → provider → client → events →
+logs. Each layer logs under its own category (`qodeassist.session`,
+`qodeassist.provider`, `qodeassist.tools`, …); request bodies are logged only
+at debug level, and secrets are redacted unconditionally. `Usage` events are
+the single source feeding the token counter, `TokenEstimator` calibration,
+and the performance log.
+
+### 8.6 Config compatibility
+
+Agent profiles carry a `schema_version`; the loader migrates old user
+configs forward or rejects them with an actionable message — silent
+reinterpretation is forbidden. Bundled profiles are read-only resources that
+user profiles shadow by name. Persisted chat history is versioned the same
+way.
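The "migrate or reject, never reinterpret" rule might look like the following gate. The version numbers, `LoadAction`, `LoadDecision` and `checkSchemaVersion` are assumptions made for this sketch; the real loader's types are not shown in this document.

```cpp
#include <cassert>
#include <string>

constexpr int kCurrentSchema = 3;    // assumed value for illustration
constexpr int kOldestMigratable = 1; // assumed value for illustration

enum class LoadAction { Accept, Migrate, Reject };

struct LoadDecision
{
    LoadAction action;
    std::string message; // actionable text for the agents settings page
};

inline LoadDecision checkSchemaVersion(int found)
{
    if (found == kCurrentSchema)
        return {LoadAction::Accept, ""};
    if (found >= kOldestMigratable && found < kCurrentSchema)
        return {LoadAction::Migrate,
                "migrating profile from schema " + std::to_string(found) + " to "
                    + std::to_string(kCurrentSchema)};
    // Too old or newer than this build: refuse instead of guessing.
    return {LoadAction::Reject,
            "unsupported schema_version " + std::to_string(found)
                + "; this build reads versions " + std::to_string(kOldestMigratable)
                + " to " + std::to_string(kCurrentSchema)};
}
```

A profile written by a newer build is rejected with a message that names the supported range, which gives the user something concrete to act on.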
+
+### 8.7 Security
+
+Secrets exist only behind the `SecretsStore` port; they never reach TOML,
+logs, or persisted chats. Tool permission classes (read / write / execute)
+centralize the confirmation policy. The MCP server is opt-in and binds
+loopback by default; skill and partial roots are sandboxed — nothing resolves
+outside its declared directory.
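A sandbox check of the kind described for skill and partial roots can be sketched with `std::filesystem`. `resolvesInside` is a hypothetical helper, and the check is purely lexical: it does not follow symlinks, which a real implementation would likely need to handle as well.

```cpp
#include <cassert>
#include <filesystem>

namespace fs = std::filesystem;

// Hypothetical helper: true when `requested`, resolved against `root`,
// stays inside `root`. Lexical only; symlinks are not resolved.
inline bool resolvesInside(const fs::path &root, const fs::path &requested)
{
    const fs::path base = root.lexically_normal();
    // An absolute `requested` replaces `root` entirely and is caught below.
    const fs::path full = (root / requested).lexically_normal();
    const fs::path relative = full.lexically_relative(base);
    return !relative.empty() && *relative.begin() != fs::path("..");
}
```

With a root of `/agents`, the path `partials/base.md` is accepted, while `../secrets.toml` and `/etc/passwd` are both refused.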
+
+### 8.8 Testing
+
+The test pyramid follows the layers:
+
+| Layer | Strategy |
+|-------|----------|
+| L1 | loader/validator unit tests; golden-file snapshots of every bundled profile's rendered body against a synthetic context — the same check as load-time validation, run in CI |
+| L2 | `Session` / `ResponseRouter` replay tests over recorded SSE fixtures per provider; fake `BaseClient`, no network |
+| L3 | contract tests against the ports; QtC adapters covered only by plugin integration |
+
+Layering is enforced mechanically, not by review: each layer is its own
+CMake target, and the core targets do not link Qt Creator — a violating
+include fails the build.
+
+---
+
+## 9. Module / target layout
+
+```
+core/                       # no Qt Creator linkage — tests link this
+  config/                   # L1: ProviderInstance, AgentProfile, loaders,
+                            #     validators, rosters, personas, secrets port
+  providers/                # L2: Provider, GenericProvider, ProviderFactory,
+                            #     ClaudeCacheControl
+  prompt/                   # L2: JsonPromptTemplate, ContextRenderer, partials
+  agents/                   # L2: Agent, AgentFactory, AgentRouter
+  session/                  # L2: Session, SessionManager, ConversationHistory,
+                            #     SystemPromptBuilder, ResponseRouter, events
+  skills/                   # L3 (IDE-free part): SkillsEngine, loaders
+ide/                        # Qt Creator adapters only
+  context/                  # EditorContext, ProjectContext adapters, ignore
+  tools/                    # built-in ToolKit (build, issues, editor edits…)
+  mcp/                      # McpHub managers
+features/
+  completion/               # LSP bridge + CompletionFeature + CodeHandler
+  chat/                     # ChatFeature: ClientInterface, ChatModel(projection),
+                            #   Compressor, TokenCounter, FileEditController,
+                            #   serializer/store
+  refactor/                 # RefactorFeature + custom instructions
+ui/
+  ChatView qml/, widgets/, settings pages
+hosts/
+  plugin/                   # qodeassist.cpp — composition root, actions, panes
+tests/
+  config/                   # loader cases + golden rendered-body snapshots
+  session/                  # SSE replay fixtures per provider, fake client
+external/
+  llmqore/ inja/ tomlplusplus/
+```
+
+Dependency direction is strictly downward in the table of §3; `features/*`
+never include each other; `ui/*` talks only to its feature; `hosts/*` are the
+only places allowed to know about everything.
+
+---
+
+## 10. Deltas from the current working tree
+
+What "from scratch" changes relative to today's code. This is the migration
+checklist that must be complete before the architecture counts as done:
+
+1. **Stack A physical teardown** — delete root `providers/*`,
+   `pluginllmcore/*`, `ConfigurationManager`, legacy provider/model/template
+   settings pages, and the Stack A registration + MCP loop in
+   `qodeassist.cpp`. Runtime already has no consumers.
+2. **Single history owner** — make `ChatModel` a projection of
+   `Session::history()` (subscribe to history signals) instead of a parallel
+   message store with seed-on-send; `ChatCompressor` reads history, not the
+   model.
+3. **Single send path** — delete `Session::sendCompletion(ContextData)`;
+   the completion context becomes user-message content sent through the one
+   `send()` (the completion handler already reads its result from history's
+   last message). Move `QuickRefactorHandler` off raw `BaseClient` signals
+   (`requestCompleted`/`requestFinalized`/`requestFailed`) onto
+   `Session::finished`/`failed` + `history().lastAssistantText()`.
+4. **Three-state request lifecycle** — add `cancelled` to `Session`; today
+   `cancel()` emits `failed(id, "Cancelled by user")` and consumers must
+   string-match to tell cancellation from failure (§8.2).
+5. **Typed errors** — replace `lastError` strings and the `failed(QString)`
+   payload with `ErrorInfo` categories (§8.3).
+6. **Agent selection by pipeline shape** — completion is the only context-routed
+   pipeline (`AgentRouter.pickAgent(roster.codeCompletion, {file, project})`);
+   chat picker filters to the `chatAssistant` allow-list; quick refactor and
+   compression each read a single configured agent (no routing).
+7. **MCP tools on session clients** — register MCP-contributed tools through
+   `ToolContributorRegistry` so chat/refactor sessions get them (today they
+   are registered only on dead Stack A providers).
+8. **Session pooling** — `SessionManager.acquire/release` with a small pool
+   per agent, replacing per-message agent + provider + secrets construction.
+9. **ContextManager split** — extract `EditorContext` / `ProjectContext` /
+   `TokenEstimator` behind ports; move QtC API use into `ide/context`.
+10. **`[body]` model completion** — finish `agent-templates-design.md`
+    (body-table rendering, sandboxed `include`, load-time validation, model
+    override + `${MODEL}`, `schema_version` gate), delete sampling/thinking
+    merge machinery.
+11. **Message type unification** — one `Message`/`ContentBlock` shape from
+    history to QML (roles, text, thinking, tool use/result, images); delete
+    the parallel `ChatModel::Message` struct.
+12. **Test scaffolding** — golden rendered-body snapshots + SSE replay
+    fixtures (§8.8); CI builds the core targets without Qt Creator so a
+    layering violation fails the build.
+13. **Stale docs cleanup** — `project-rules.md` describes the removed Rules
+    system; mark or delete.
--- a/docs/tool-loop-runner-plan.md
+++ b/docs/tool-loop-runner-plan.md
@@ -0,0 +1,192 @@
+# ToolLoopRunner — implementation plan
+
+Status: plan for "variant C" (2026-06-13). Supersedes step 5 of
+`context-architecture.md` §6.
+
+Context that shapes this plan:
+- The tool loop STAYS in LLMQore — the library remains a complete standalone
+  agentic client. Variant C changes its *shape*, not its home: the loop
+  becomes a named class, `BaseClient` slims toward transport.
+- On 2026-06-12 the variant-A hook (`setContinuationPayloadBuilder` + Session
+  feeding assembler-built continuation bodies) was implemented and then
+  REVERTED by the project owner: the frozen-replay problem was judged
+  contrived (replay carries the full filtered history of its base payload;
+  mid-loop file changes reach the model via tool results; growth is bounded
+  by `maxToolContinuations`). The reverted llmqore diff is saved at
+  `/tmp/llmqore-continuation-builder.patch`.
+- Therefore this plan has two tracks. **Track 1 (the actual ask): the
+  structural refactoring.** Track 2 (host payload source) is OPTIONAL,
+  parked, and only happens if the 2026-06-12 verdict is explicitly reversed.
+- The context-architecture steps 1–4 implementation (ContextAssembler,
+  content cache, pinned providers, EnvBlockFormatter, ~1200 lines incl.
+  tests) is parked in `stash@{0}` ("new context refactor") on
+  `dev-release-1-0`. It is NOT required for track 1.
+
+---
+
+## 1. Current anatomy (llmqore @ 0348ac8)
+
+- `BaseClient` mixes two responsibilities:
+  - **transport** — HTTP/SSE per request, `ActiveRequest { stream, buffers,
+    url, mode, usage, … }`, accumulation in protocol subclasses;
+  - **loop policy** — `ActiveRequest.originalPayload`,
+    `ActiveRequest.continuationCount`, `m_maxToolContinuations`,
+    `checkContinuationLimit`, `handleToolContinuation`.
+- Loop entry: protocol clients call `executeToolsFromMessage(id)` at their
+  message-end detection points (11 call sites across 7 clients); it forwards
+  `tool_use` blocks to `ToolsManager::executeToolCall`.
+- `BaseClient::tools()` wires `ToolsManager::toolExecutionComplete(id,
+  results)` → `handleToolContinuation`: round-limit check → continuation
+  body via the protocol-virtual `buildContinuationPayload(originalPayload,
+  message, toolResults)` → `finalizeTurn` → `sendRequest(id, storedUrl,
+  payload, storedMode)`.
+
+## 2. Target design
+
+### 2.1 ToolLoopRunner (new, llmqore)
+
+```cpp
+class LLMQORE_EXPORT ToolLoopRunner : public QObject
+{
+    Q_OBJECT
+public:
+    explicit ToolLoopRunner(BaseClient *client);
+
+    int maxRounds() const noexcept;
+    void setMaxRounds(int limit) noexcept;
+
+private:
+    void onToolsCompleted(const RequestID &id,
+                          const QHash<QString, ToolResult> &results);
+    void onRequestClosed(const RequestID &id);
+
+    struct LoopState
+    {
+        int rounds = 0;
+    };
+
+    BaseClient *m_client = nullptr;
+    QHash<RequestID, LoopState> m_loops;
+    int m_maxRounds = 10;
+};
+```
+
+The whole loop policy on one screen:
+
+```cpp
+void ToolLoopRunner::onToolsCompleted(const RequestID &id,
+                                      const QHash<QString, ToolResult> &results)
+{
+    auto &loop = m_loops[id];
+    if (++loop.rounds > m_maxRounds) {
+        m_client->abortRequest(id, "Tool continuation limit reached");
+        m_loops.remove(id);
+        return;
+    }
+
+    const QJsonObject payload = m_client->buildReplayContinuation(id, results);
+    if (payload.isEmpty()) {
+        m_client->abortRequest(id, "Failed to build continuation payload");
+        m_loops.remove(id);
+        return;
+    }
+
+    m_client->continueRequest(id, payload);
+}
+```
+
+- `LoopState` is keyed by request id — several concurrent requests on one
+  client (two chat panels on one provider) never collide.
+- Cleanup: `onRequestClosed` (connected to `requestFailed` +
+  `requestFinalized`) drops the state.
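The per-request bookkeeping can be exercised without Qt or a network. The sketch below is a standard-library stand-in, not the runner itself: `RoundTracker` is a hypothetical class that mirrors only the counting and cleanup, with `std::string` in place of `RequestID`.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

// Standard-library stand-in for the runner's round bookkeeping.
class RoundTracker
{
public:
    explicit RoundTracker(int maxRounds) : m_maxRounds(maxRounds) {}

    // Returns true if another round may start for this id. On false the
    // limit was exceeded and the state for this id has been dropped.
    bool beginRound(const std::string &id)
    {
        int &rounds = m_loops[id];
        if (++rounds > m_maxRounds) {
            m_loops.erase(id);
            return false;
        }
        return true;
    }

    // Mirrors onRequestClosed: forget the id once the request ends.
    void close(const std::string &id) { m_loops.erase(id); }

    bool tracks(const std::string &id) const { return m_loops.count(id) != 0; }

private:
    std::unordered_map<std::string, int> m_loops;
    int m_maxRounds;
};
```

Two interleaved ids advance independently: with a limit of 2, the third round for `"a"` is refused while `"b"` is still tracked after its first round.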
+
+### 2.2 BaseClient becomes transport + tool dispatch
+
+Gains (transport primitives). `continueRequest` is public because it is also
+the seam any future host-driven mode would use; the runner reaches the
+failure path through friendship:
+
+```cpp
+ToolLoopRunner *toolLoop();                       // owned, created with tools()
+void continueRequest(const RequestID &id, const QJsonObject &payload);
+                                                  // finalizeTurn + resend stored url/mode
+QJsonObject buildReplayContinuation(const RequestID &id,
+                                    const QHash<QString, ToolResult> &results);
+                                                  // originalPayload + protocol virtual
+```
+
+Loses (moves to the runner): `handleToolContinuation`,
+`checkContinuationLimit`, `m_maxToolContinuations`,
+`ActiveRequest::continuationCount`. The `toolExecutionComplete` connection
+in `tools()` retargets to the runner.
+
+Keeps: `executeToolsFromMessage` (the 11 protocol call sites stay
+untouched), the protocol-virtual `buildContinuationPayload` (it IS the
+replay serialization), `originalPayload` storage,
+`setMaxToolContinuations`/`maxToolContinuations` as thin forwarders to
+`toolLoop()` — existing consumers (QodeAssist `ClientInterface`,
+`QuickRefactorHandler`, third parties) compile unchanged.
+
+## 3. Track 1 — structural refactoring (the plan)
+
+Bit-identical behavior throughout; QodeAssist only needs a submodule bump.
+
+**Phase 1 — transport primitives.** Add `continueRequest` +
+`buildReplayContinuation` + public `abortRequest` (now also the body of
+`cancelRequest`). — DONE 2026-06-13.
+
+**Phase 2 — extract the runner.** New `ToolLoopRunner` class; move round
+state + limit; retarget the `toolExecutionComplete` connection; delete
+`handleToolContinuation` / `checkContinuationLimit` /
+`ActiveRequest::continuationCount`; forwarders for
+`setMaxToolContinuations`. — DONE 2026-06-13
+(`include/LLMQore/ToolLoopRunner.hpp`, `source/core/ToolLoopRunner.cpp`,
+`tests/tst_ToolLoopRunner.cpp` — 7 cases: replay flow, round limit, missing
+replay data, two interleaved ids, cleanup on finalize/cancel, forwarders;
+`continueRequest` is virtual as the test seam; llmqore architecture docs
+updated: overview, request-lifecycle diagram, tools).
+
+Deliberate behavior delta (an improvement, worth knowing while testing): an
+empty payload from the protocol's `buildContinuationPayload` now aborts the
+request with "Missing data for tool continuation" instead of silently
+sending an empty body.
+
+**Phase 3 — submodule bump (after the user runs llmqore tests).**
+QodeAssist: bump the submodule pointer, verify live in the plugin (Ollama +
+tools, Claude + tools); update `context-architecture.md`
+§4.3/§6.5 to point here; update project memory.
+
+## 4. Track 2 — host payload source (PARKED)
+
+Only if the 2026-06-12 "the problem is contrived" verdict is explicitly reversed.
+Variant C makes it a ~40-line addition, so nothing is lost by parking:
+
+- `ToolLoopRunner::setPayloadSource(id, std::function<QJsonObject(const
+  RequestID &)>)`; registered source is authoritative for its id (empty
+  result → abort, never silent fallback to replay).
+- Host prerequisite: restore the context work from `stash@{0}`
+  (ContextAssembler + `Session::makePayload`); expect conflicts in
+  `Session.cpp` with the newer `dev-release-1-0` refactor commits
+  ("Remove override tools in Session send" etc.).
+- Session registers the source after `provider->sendRequest` (same-thread,
+  race-free; `QPointer` guard).
+- Assembler continuation rules: pinned blocks anchor to the turn's TYPED
+  user message (recorded 2026-06-12), manifest per round.
+
+## 5. Risks
+
+| Risk | Mitigation |
+|---|---|
+| Behavior drift while moving the loop | phases are mechanical; same `buildContinuationPayload` virtuals; llmqore tests + plugin smoke before/after |
+| Two sessions, one client | `LoopState` keyed by request id |
+| Qt 5 compatibility (0348ac8) | runner uses only signals/`QHash`/`std::function` — no Qt 6-only API |
+| Cancel mid-tool-execution | unchanged: `cancelRequest` → `failRequest` → `onRequestClosed` clears state; `ToolsManager::cleanupRequest` handles in-flight tools |
+| Google (model in URL) | `continueRequest` reuses stored per-request url/mode — same as today |
+
+## 6. Deliberately not doing
+
+- Not moving the loop or tool execution out of llmqore
+  (`feedback_llmqore_boundary`).
+- Not touching the 11 `executeToolsFromMessage` call sites or the protocol
+  `buildContinuationPayload` implementations.
+- No Auto/Manual mode flags.
+- Track 2 is not started without an explicit decision.