refactor: Move to agent architecture

Petr Mironychev
2026-05-30 14:50:49 +02:00
parent 34ce787320
commit ccc2ec2e80
364 changed files with 10801 additions and 19020 deletions


@@ -1,174 +0,0 @@
# Agent Roles
Agent Roles allow you to define different AI personas with specialized system prompts for various tasks. Switch between roles instantly in the chat interface to adapt the AI's behavior to your current needs.
## Overview
Agent Roles are reusable system prompt configurations that modify how the AI assistant responds. Instead of manually changing system prompts, you can create roles like "Developer", "Code Reviewer", or "Documentation Writer" and switch between them with a single click.
**Key Features:**
- **Quick Switching**: Change roles from the chat toolbar dropdown
- **Custom Prompts**: Each role has its own specialized system prompt
- **Built-in Roles**: Pre-configured Developer, Code Reviewer, and Researcher roles
- **Persistent**: Roles are saved locally and loaded on startup
- **Extensible**: Create unlimited custom roles for different tasks
## Default Roles
QodeAssist comes with three built-in roles:
### Developer
Experienced Qt/C++ developer with a structured workflow: analyze the problem, propose a solution, wait for approval, then implement. Best for implementation tasks where you want thoughtful, minimal code changes.
### Code Reviewer
Expert C++/QML code reviewer specializing in C++20 and Qt6. Checks for bugs, memory leaks, thread safety, Qt patterns, and production readiness. Provides direct, specific feedback with code examples.
### Researcher
Research-oriented developer who investigates problems and explores solutions. Analyzes problems, presents multiple approaches with trade-offs, and recommends the best option. Does not write implementation code — focuses on helping you make informed decisions.
## Using Agent Roles
### Switching Roles in Chat
1. Open the Chat Assistant (side panel, bottom panel, or popup window)
2. Locate the **Role selector** dropdown in the top toolbar (next to the configuration selector)
3. Select a role from the dropdown
4. The AI will now use the selected role's system prompt
**Note**: Selecting "No Role" uses only the base system prompt without role specialization.
### Viewing Active Role
Click the **Context** button (📋) in the chat toolbar to view:
- Base system prompt
- Current agent role and its system prompt
- Active project rules
## Managing Agent Roles
### Opening the Role Manager
Navigate to: `Qt Creator → Preferences → QodeAssist → Chat Assistant`
Scroll down to the **Agent Roles** section where you can manage all your roles.
### Creating a New Role
1. Click the **Add...** button
2. Fill in the role details:
- **Name**: Display name shown in the dropdown (e.g., "Documentation Writer")
- **ID**: Unique identifier for the role file (e.g., "doc_writer")
- **Description**: Brief explanation of the role's purpose
- **System Prompt**: The specialized instructions for this role
3. Click **OK** to save
### Editing a Role
1. Select a role from the list
2. Click **Edit...** or double-click the role
3. Modify the fields as needed
4. Click **OK** to save changes
**Note**: Built-in roles cannot be edited directly. Duplicate them to create a modifiable copy.
### Duplicating a Role
1. Select a role to duplicate
2. Click **Duplicate...**
3. Modify the copy as needed
4. Click **OK** to save as a new role
### Deleting a Role
1. Select a custom role (built-in roles cannot be deleted)
2. Click **Delete**
3. Confirm deletion
## Creating Effective Roles
### System Prompt Tips
- **Be specific**: Clearly define the role's expertise and focus areas
- **Set expectations**: Describe the desired response format and style
- **Include guidelines**: Add specific rules or constraints for responses
- **Use structured prompts**: Break down complex roles into bullet points
## Storage Location
Agent roles are stored as JSON files in:
```
~/.config/QtProject/qtcreator/qodeassist/agent_roles/
```
**On different platforms:**
- **Linux**: `~/.config/QtProject/qtcreator/qodeassist/agent_roles/`
- **macOS**: `~/Library/Application Support/QtProject/Qt Creator/qodeassist/agent_roles/`
- **Windows**: `%APPDATA%\QtProject\qtcreator\qodeassist\agent_roles\`
### File Format
Each role is stored as a JSON file named `{id}.json`:
```json
{
  "id": "doc_writer",
  "name": "Documentation Writer",
  "description": "Technical documentation and code comments",
  "systemPrompt": "You are a technical documentation specialist...",
  "isBuiltin": false
}
```
### Manual Editing
You can:
- Edit JSON files directly in any text editor
- Copy role files between machines
- Share roles with team members
- Version control your roles
- Click **Open Roles Folder...** to quickly access the directory
## How Roles Work
When a role is selected, the final system prompt is composed as:
```
┌─────────────────────────────────────────────────┐
│ Final System Prompt = Base Prompt + Role Prompt │
├─────────────────────────────────────────────────┤
│ 1. Base System Prompt (from Chat Settings) │
│ 2. Agent Role System Prompt │
│ 3. Project Rules (common/ + chat/) │
│ 4. Linked Files Context │
└─────────────────────────────────────────────────┘
```
This allows roles to augment rather than replace your base configuration.
## Best Practices
1. **Keep roles focused**: Each role should have a clear, specific purpose
2. **Use descriptive names**: Make it easy to identify roles at a glance
3. **Test your prompts**: Verify roles produce the expected behavior
4. **Iterate and improve**: Refine prompts based on AI responses
5. **Share with team**: Export and share useful roles with colleagues
## Troubleshooting
### Role Not Appearing in Dropdown
- Restart Qt Creator after adding roles manually
- Check that the JSON file is valid
- Verify that the file is in the correct directory
### Role Behavior Not as Expected
- Review the system prompt for clarity
- Check whether the base system prompt conflicts with the role prompt
- Try a more specific or detailed prompt
## Related Documentation
- [Project Rules](project-rules.md) - Project-specific AI behavior customization
- [Chat Assistant Features](../README.md#chat-assistant) - Overview of chat functionality
- [File Context](file-context.md) - Attaching files to chat context


@@ -0,0 +1,401 @@
# Agent Templates — Design Note (body model, include, extends)
Status: IMPLEMENTED, then partially superseded. The `[body]` table + `extends`
model shipped; the **bundled partials described below were removed** — each wire
base now inlines its message serialization, and bases were split into a
wire-only abstract base (provider + endpoint + serialization) plus a thin
concrete agent that carries all policy (model, persona, tags, caching, thinking,
sampling). `{% include %}` survives only for user-supplied partials. Treat the
partials sections here as historical record; the current user-facing guide is
`creating-agents.md`. Dev-facing (not end-user docs).
Scope: how agent TOML profiles describe the request and share structure.
## Problem this replaces
The model this design replaces has each agent embed a `[template].message_format` jinja string
that hand-builds the **whole** request body as text, plus `[template.sampling]` and
`[template.thinking.*]` blocks merged in by `applySampling`. Pains:
- Massive copy-paste: 9 OpenAI-compatible agents share a byte-identical ~50-line
`message_format`; 4 Claude agents share another; `role` + README `context` are
identical across 18 files.
- `[template.sampling]` / `[template.thinking.overrides]` /
`[template.thinking.request_block.*]` describe **merge machinery**, not the request
body — they don't look like the actual API call. The `overrides` vs `request_block`
split is meaningless (both are deep-merged into the request identically).
- Manual JSON-by-string-concatenation: trailing-comma bookkeeping
(`{% if not loop.is_last %},{% endif %}`) everywhere; a missing comma fails
silently at runtime (`renderBody` returns nullopt, only a `qWarning`).
- `include` is hard-disabled, so there is no way to share a sub-fragment.
## Agreed model
### 1. `[body]` is a deep-mergeable table = the request body, 1:1 with the API
Replace the `message_format` string and the `sampling`/`thinking` blocks with a
single `[body]` TOML table whose keys are the **literal request-body fields**.
Because it is a table (not a string), `extends` / `deepMerge` can override it
field-by-field — variants become a 2-line delta instead of a copied body.
Field-value rules at build time (per key in `[body]`, applied recursively):
- **string containing jinja** (`{{` or `{%`) → render through inja, splice the
output as **raw JSON** (array / object / string). Empty render → key omitted.
- **string without jinja** (e.g. `"high"`) → literal JSON string, as-is.
- **number / bool / inline-table** → as-is.
So `messages` / `contents` and `system` / `system_instruction` are just **string
fields holding jinja**; everything else (`max_tokens`, `temperature`, `stream`,
`thinking`, `output_config`, `generationConfig`, …) is a literal value that reads
exactly like the curl body.
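The string rules above hinge on one check: does the value contain an inja opener? A minimal sketch of that check follows. `BodyValueKind` and `classifyBodyString` are hypothetical names used only for illustration, and the sketch works on `std::string` rather than the Qt JSON types the loader uses.

```cpp
#include <string>

// How a string value found in [body] is treated at build time.
enum class BodyValueKind { Literal, Template };

// Hypothetical helper: a string counts as a template when it contains an
// inja expression opener ("{{") or statement opener ("{%").
inline BodyValueKind classifyBodyString(const std::string &value)
{
    const bool hasJinja = value.find("{{") != std::string::npos
                          || value.find("{%") != std::string::npos;
    return hasJinja ? BodyValueKind::Template : BodyValueKind::Literal;
}
```

Under this sketch `"high"` classifies as a literal, while the `messages` and `system` values from the worked examples classify as templates.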
No runtime toggles: thinking / tools / streaming are **fixed per agent**. A thinking
agent literally carries the `thinking` fields; a non-thinking variant is a separate
file. There is no `{% if thinking %}` in the body. `system` uses
`{% if existsIn(ctx, "system_prompt") %}` only because that is about *presence of
data*, not a mode toggle. `enable_thinking` / `enable_tools` are **capability hints**
(used for UI badges and to decide tool-definition injection) — the body is the source
of truth for what is actually sent, so a thinking agent's body must carry the thinking
fields regardless of the flag.
Outside the body:
- `model` — the TOML `model` is the **default**; a per-agent override chosen in
QodeAssist settings wins. Overrides are stored in `agent_models.json`
(agentName → model) and applied by `AgentFactory` when it builds the agent
(`AgentFactory::effectiveModel`/`setModelOverride`); `Session` still seeds the
payload `model` from the resolved `cfg.model`. URL-model providers (Google) put a
`${MODEL}` placeholder in `endpoint`; `Session` substitutes the resolved model into
the endpoint before sending (same substitution style as `${PROJECT_DIR}`/`${CONFIG_DIR}`),
so the override drives the URL too.
- `tools` — injected by the **provider** when `enable_tools` is set (tool
definitions are dynamic, from `ToolsManager`; they can't be authored in TOML).
- `stream` — always on. Literal `"stream": true` in the body for OpenAI / Claude /
Mistral / Responses / Ollama; encoded in the `endpoint` URL for Google.
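The `${MODEL}` endpoint substitution can be pictured as a plain text replacement. The sketch below assumes exactly that; `substitutePlaceholder` is a hypothetical helper and `example-model` is a made-up model name, neither taken from the QodeAssist sources.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: replace every occurrence of `placeholder` in `text`
// with `value`. The search resumes after the inserted value, so a value that
// itself contains the placeholder cannot cause an endless loop.
inline std::string substitutePlaceholder(std::string text,
                                         const std::string &placeholder,
                                         const std::string &value)
{
    if (placeholder.empty())
        return text;
    std::size_t pos = 0;
    while ((pos = text.find(placeholder, pos)) != std::string::npos) {
        text.replace(pos, placeholder.size(), value);
        pos += value.size();
    }
    return text;
}
```

For example, `substitutePlaceholder("/models/${MODEL}:streamGenerateContent?alt=sse", "${MODEL}", "example-model")` yields `/models/example-model:streamGenerateContent?alt=sse`, while an endpoint without the placeholder, such as `/chat/completions`, comes back unchanged.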
### 2. `include` re-enabled as whitelisted partials
The message-array rendering (the complex, comma-heavy part) lives in
`sources/agents/partials/*.jinja`, shared via `{% include %}`. The throwing include
callback is replaced by a sandboxed resolver that:
- rejects names containing `..`, a leading `/`, or a scheme/drive;
- resolves only against known roots: bundled `:/agents/partials/` then the user
`partials/` dir;
- parses/caches the partial in the same `inja::Environment`.
A missing/typo'd partial is a **load-time** error.
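The name checks listed for the sandboxed resolver could look roughly like this. It is a sketch under assumptions: `isSafePartialName` is a hypothetical name, any `:` is treated as a scheme or drive marker, and a leading backslash is rejected as well, which goes slightly beyond the rules stated above.

```cpp
#include <string>

// Hypothetical helper: accept only relative partial names that cannot escape
// the whitelisted roots.
inline bool isSafePartialName(const std::string &name)
{
    if (name.empty())
        return false;
    if (name.front() == '/' || name.front() == '\\')
        return false; // absolute path (the backslash case is an assumption)
    if (name.find("..") != std::string::npos)
        return false; // parent-directory traversal
    if (name.find(':') != std::string::npos)
        return false; // scheme ("file:") or drive ("C:")
    return true;
}
```

A name that passes this check would still have to resolve against one of the known roots; the check only filters out names that should never be resolved at all.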
### 3. `extends` shares config down a hierarchy
`extends` already exists (`resolveExtends` + `deepMerge` + `abstract`/`hidden`); it
keeps doing what it does, now over the structured `[body]` too. Each API-shape base
carries the default developer persona inline in `system_prompt` (the Roles
subsystem was removed 2026-06-12; see below). No shared root base. Between the
API-shape base and the concrete agents sits one thin abstract base **per provider**
(provider_instance + endpoint only) — the designated extension point for user
agents, so a custom agent is `extends` + `name` + `model`:
```
openai_base (abstract) → system_prompt + [body] (API shape)
├─ mistral_base (abstract) → provider, endpoint (per-provider)
│ ├─ mistral_chat → name, model
│ └─ mistral_reasoning → name, model + enable_thinking
├─ openrouter_base (abstract) ...
└─ openai_chat → name, model (own provider = no mid layer)
anthropic_base (abstract) → system_prompt + provider/endpoint + [body]
└─ claude_sonnet46 → name, model + [body] thinking / output_config
google_base (abstract) → system_prompt + provider + [body]
└─ gemini_chat → endpoint (${MODEL}) + [body.generationConfig] thinkingConfig
```
Bundled agents are read-only: the loader rejects a user file that reuses a bundled
`name`. Customisation = a user agent under a new name extending a bundled base (or a
concrete bundled agent); the per-agent model override in settings covers the
model-only case without any file.
Notes:
- `[body]` is shared whole when identical (the 8 OpenAI-compatible providers); a
variant overrides only the differing field — no duplicated body.
- Arrays (`tags`) are **replaced** on override, not appended (`deepMerge` recurses
objects only). A child that wants base tags + extras restates the full list.
- Division of labour: **include** shares the message-rendering fragment across
unrelated families; **extends** shares config (system_prompt / endpoint / body)
down one inheritance chain.
- With the per-agent model override in settings, per-model files collapse: agents that
previously differed only by `model` become one agent (the model is picked in settings).
A separate file is only needed when the body genuinely differs (effort, no-thinking, …).
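The merge behaviour described in these notes (tables recurse, everything else is replaced) can be sketched with a small stand-in data model. `Node`, `Table`, `makeScalar`, `makeTable` and this `deepMerge` are illustrative only; the real loader merges Qt JSON objects, and arrays are kept here as opaque text purely to show that they are replaced wholesale.

```cpp
#include <map>
#include <memory>
#include <string>
#include <utility>

struct Node;
using Table = std::map<std::string, Node>;

// Stand-in for a TOML/JSON value: either a table or an opaque scalar/array.
struct Node {
    std::string text;                // used when `children` is null
    std::shared_ptr<Table> children; // non-null means this node is a table
};

inline Node makeScalar(std::string text) { return Node{std::move(text), nullptr}; }
inline Node makeTable(Table entries)
{
    return Node{"", std::make_shared<Table>(std::move(entries))};
}

// Child values win. Two tables under the same key are merged recursively;
// scalars and arrays from the child replace the base value outright.
inline Table deepMerge(const Table &base, const Table &overlay)
{
    Table result = base;
    for (const auto &[key, over] : overlay) {
        const auto it = result.find(key);
        const bool bothTables = it != result.end() && it->second.children && over.children;
        if (bothTables)
            it->second = makeTable(deepMerge(*it->second.children, *over.children));
        else
            result.insert_or_assign(key, over);
    }
    return result;
}
```

Merging a child that sets only `generationConfig.temperature` and a new `tags` list keeps the base `maxOutputTokens`, overrides `temperature`, and drops the base tags entirely, which is why a child that wants base tags plus extras has to restate the full list.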
### System prompt — a composable template with building blocks
The old `role` (static text) and `context` (jinja) layers collapse into one
`agent.system` layer in `Session`, rendered through `ContextRenderer`. The agent's
`system_prompt` field IS that template — the persona is whatever it renders to.
Building blocks:
- `{{ read_file("...") }}` / `file_exists` / `${PROJECT_DIR}` / `${CONFIG_DIR}` — existing
`ContextRenderer` helpers, composable in the same template. Shared persona text
lives in plain markdown under the sandboxed roots (e.g.
`${CONFIG_DIR}/personas/reviewer.md`) and is pulled in with `read_file`.
So a profile can do `system_prompt = """{{ read_file("${CONFIG_DIR}/personas/reviewer.md") }}"""`,
or just inline the text. A persona-switch is an agent-switch (thin `extends` variant).
The former Roles subsystem (`agent_roles/*.json`, `{{ agent_role(id) }}`, the Roles
settings page, the chat role picker) was removed on 2026-06-12 — the chat bases now
inline the developer persona text directly. There is NO per-agent settings override —
the edit point is the profile's `system_prompt`. Code-completion/FIM agents set no
`system_prompt`.
## Worked examples
OpenAI base:
```toml
abstract = true
system_prompt = """<inline developer persona text>"""
provider_instance = "OpenAI (Chat Completions)"
endpoint = "/chat/completions"
enable_tools = true
[body]
max_tokens = 8192
temperature = 0.7
stream = true
messages = """
[ {% include "partials/openai_messages.jinja" %} ]
"""
```
Mistral reasoning child (delta only):
```toml
extends = "OpenAI Base Chat"
name = "Mistral Reasoning Chat"
provider_instance = "Mistral AI"
endpoint = "/v1/chat/completions"
enable_thinking = true
[body]
reasoning_effort = "medium"
```
Claude base (literally the curl body):
```toml
abstract = true
system_prompt = """<inline developer persona text>"""
provider_instance = "Claude"
endpoint = "/v1/messages"
enable_thinking = true
enable_tools = true
[body]
max_tokens = 16000
temperature = 1
stream = true
thinking = { type = "adaptive", display = "summarized" }
output_config = { effort = "high" }
system = """{% if existsIn(ctx, "system_prompt") %}{{ tojson(ctx.system_prompt) }}{% endif %}"""
messages = """
[ {% include "partials/anthropic_messages.jinja" %} ]
"""
```
Sonnet child (delta only):
```toml
extends = "Anthropic Base Chat"
name = "Claude Sonnet"
[body.output_config]
effort = "medium"
```
Google base (`${MODEL}` in endpoint; streaming in the URL):
```toml
abstract = true
system_prompt = """<inline developer persona text>"""
provider_instance = "Google AI"
endpoint = "/models/${MODEL}:streamGenerateContent?alt=sse"
enable_thinking = true
enable_tools = true
[body]
system_instruction = """{% if existsIn(ctx, "system_prompt") %}{ "parts": [ { "text": {{ tojson(ctx.system_prompt) }} } ] }{% endif %}"""
contents = """
[ {% include "partials/google_contents.jinja" %} ]
"""
[body.generationConfig]
maxOutputTokens = 16000
temperature = 1
thinkingConfig = { includeThoughts = true, thinkingBudget = 8192 }
```
### Partials
`partials/openai_messages.jinja` dispatches per message:
```jinja
{% if existsIn(ctx, "system_prompt") %}
{ "role": "system", "content": {{ tojson(ctx.system_prompt) }} },
{% endif %}
{% for msg in ctx.history %}
{% if msg.role == "assistant" %}{% include "partials/openai_assistant.jinja" %}
{% else if length(filter_by_type(msg.content_blocks, "tool_result")) > 0 %}{% include "partials/openai_tool_results.jinja" %}
{% else %}{% include "partials/openai_user.jinja" %}
{% endif %}
{% endfor %}
```
`partials/openai_assistant.jinja`:
```jinja
{% set tcalls = filter_by_type(msg.content_blocks, "tool_use") %}
{
"role": "assistant",
"content": {{ tojson(msg.content) }}
{% if length(tcalls) > 0 %}
, "tool_calls": [
{% for b in tcalls %}
{ "id": {{ tojson(b.id) }}, "type": "function",
"function": { "name": {{ tojson(b.name) }}, "arguments": {{ tojson(tojson(b.input)) }} } },
{% endfor %}
]
{% endif %}
},
```
`partials/openai_tool_results.jinja`:
```jinja
{% for b in filter_by_type(msg.content_blocks, "tool_result") %}
{ "role": "tool", "tool_call_id": {{ tojson(b.tool_use_id) }}, "content": {{ tojson(b.content) }} },
{% endfor %}
```
`partials/openai_user.jinja`:
```jinja
{% if existsIn(msg, "images") %}
{ "role": "user", "content": {% include "partials/openai_image_content.jinja" %} },
{% else %}
{ "role": "user", "content": {{ tojson(msg.content) }} },
{% endif %}
```
`partials/openai_image_content.jinja`:
```jinja
[
{ "type": "text", "text": {{ tojson(msg.content) }} }
{% for img in msg.images %}
,
{% if img.is_url %}
{ "type": "image_url", "image_url": { "url": {{ tojson(img.data) }} } }
{% else %}
{ "type": "image_url", "image_url": { "url": "data:{{ img.media_type }};base64,{{ img.data }}" } }
{% endif %}
{% endfor %}
]
```
`partials/anthropic_messages.jinja`:
```jinja
{% for msg in ctx.history %}
{
"role": {{ tojson(msg.role) }},
"content": [
{% for b in msg.content_blocks %}
{% if b.type == "image" %}{% include "partials/anthropic_image.jinja" %}
{% else %}{{ tojson(b) }},
{% endif %}
{% endfor %}
]
},
{% endfor %}
```
`partials/anthropic_image.jinja`:
```jinja
{
"type": "image",
"source":
{% if b.is_url %}
{ "type": "url", "url": {{ tojson(b.data) }} }
{% else %}
{ "type": "base64", "media_type": {{ tojson(b.media_type) }}, "data": {{ tojson(b.data) }} }
{% endif %}
},
```
`partials/google_contents.jinja`:
```jinja
{% for msg in ctx.history %}
{
"role": {% if msg.role == "assistant" %}"model"{% else %}"user"{% endif %},
"parts": [ {% for b in msg.content_blocks %}{% include "partials/google_part.jinja" %}{% endfor %} ]
},
{% endfor %}
```
`partials/google_part.jinja`:
```jinja
{% if b.type == "text" %}
{ "text": {{ tojson(b.text) }} },
{% else if b.type == "thinking" %}
{ "text": {{ tojson(b.thinking) }}, "thought": true, "thoughtSignature": {{ tojson(b.signature) }} },
{% else if b.type == "tool_use" %}
{ "functionCall": { "name": {{ tojson(b.name) }}, "args": {{ tojson(b.input) }} } },
{% else if b.type == "tool_result" %}
{ "functionResponse": { "name": {{ tojson(b.name) }}, "response": { "result": {{ tojson(b.content) }} } } },
{% else if b.type == "image" %}
{% if b.is_url %}
{ "file_data": { "mime_type": {{ tojson(b.media_type) }}, "file_uri": {{ tojson(b.data) }} } },
{% else %}
{ "inline_data": { "mime_type": {{ tojson(b.media_type) }}, "data": {{ tojson(b.data) }} } },
{% endif %}
{% else %}
{ "text": "" },
{% endif %}
```
## C++ work
In `JsonPromptTemplate`:
- Parse `[body]` as a `QJsonObject` (not a string). Walk it recursively and build the
request: render jinja-bearing string values via inja and splice the parsed JSON;
pass literal strings / scalars / inline-tables through; drop keys whose render is
empty.
- **Delete** `m_sampling`, `m_thinking`, and `applySampling` entirely — the body is
the request; there is no separate sampling/thinking merge.
- Drop the `thinkingEnabled` parameter from `buildFullRequest` /
`Provider::prepareRequest` / `Session` — it no longer affects rendering.
- Add a **JSON-aware** trailing-comma stripper before `QJsonDocument::fromJson`
(tracks string/escape state so `,}` / `,]` inside string values are not touched).
This is what lets partials emit an unconditional `,` after every element and drop
all `loop.is_last` bookkeeping.
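A JSON-aware trailing-comma stripper of the kind described in the last bullet might be sketched as follows. `stripTrailingCommas` is a hypothetical name and the function works on `std::string` rather than Qt types; it drops a comma only when the next non-whitespace character outside a string is `}` or `]`.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper: remove commas that directly precede a closing brace or
// bracket (ignoring whitespace), leaving string literals untouched.
inline std::string stripTrailingCommas(const std::string &json)
{
    std::string out;
    out.reserve(json.size());
    bool inString = false; // inside a "..." literal
    bool escaped = false;  // previous character was a backslash inside a string
    for (std::size_t i = 0; i < json.size(); ++i) {
        const char c = json[i];
        if (inString) {
            out.push_back(c);
            if (escaped)
                escaped = false;
            else if (c == '\\')
                escaped = true;
            else if (c == '"')
                inString = false;
            continue;
        }
        if (c == '"') {
            inString = true;
        } else if (c == ',') {
            std::size_t j = i + 1;
            while (j < json.size() && std::isspace(static_cast<unsigned char>(json[j])))
                ++j;
            if (j < json.size() && (json[j] == '}' || json[j] == ']'))
                continue; // trailing comma: drop it
        }
        out.push_back(c);
    }
    return out;
}
```

With a pass like this in front of the JSON parser, a partial can emit `{ ... },` after every element and the array wrapper still parses.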
In `AgentConfig` / `AgentLoader`:
- Replace `messageFormat` (string) with `body` (`QJsonObject`); merge `role` +
`context` into `system_prompt`. `[template].sampling` / `[template].thinking` are
removed.
- `extends` / `deepMerge` are unchanged; they now also merge `[body]`.
- Validate at load: a referenced partial must resolve; the assembled body must parse
as JSON (render once against a synthetic context with tool_use / tool_result /
image). Catches breakage at startup, not mid-conversation.
Model selection (per-agent override):
- `AgentFactory` owns an agentName → model map loaded from `agent_models.json`
(`loadModelOverrides`/`saveModelOverrides`). `create()`/`createFromFile()` apply the
override into the built `AgentConfig`; `effectiveModel()` exposes the resolved value;
`setModelOverride()` persists. The settings UI (`AgentDetailPane`) edits it via an
editable Model field; list/roster widgets display `effectiveModel`.
- `Session` substitutes `${MODEL}` in `cfg.endpoint` with the resolved model before
`sendRequest` (covers Google, whose model lives in the URL), and still seeds the
payload `model` from `cfg.model`. The provider keeps injecting `tools` when
`enable_tools` is set.
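The override lookup amounts to "settings win, TOML is the fallback". A minimal sketch, assuming the map from `agent_models.json` has already been loaded; the free function below only mirrors the name `effectiveModel`, its signature is an assumption, and treating an empty override as "not set" is a choice made for this sketch.

```cpp
#include <map>
#include <string>

// agentName -> model, as described for agent_models.json.
using ModelOverrides = std::map<std::string, std::string>;

// Returns the per-agent override when one is present and non-empty,
// otherwise the default model from the agent's TOML profile.
inline std::string effectiveModel(const ModelOverrides &overrides,
                                  const std::string &agentName,
                                  const std::string &tomlDefault)
{
    const auto it = overrides.find(agentName);
    if (it != overrides.end() && !it->second.empty())
        return it->second;
    return tomlDefault;
}
```

The resolved value would then feed both the payload `model` and the `${MODEL}` endpoint substitution.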
In `Session`:
- Collapse the `agent.role` + `agent.context` system-prompt layers into one rendered
`system_prompt` layer.
## Implementation order
1. JSON-aware trailing-comma stripper + whitelisted `include` resolver (enables
readable partials).
2. `[body]`-table model in `JsonPromptTemplate` + loader; delete
sampling/thinking/`applySampling`; drop `thinkingEnabled`.
3. `system_prompt` merge in loader + `Session`.
4. Per-agent model override in `AgentFactory` (`agent_models.json`) + `${MODEL}`
endpoint substitution in `Session`; editable Model field in settings; convert
bundled agents to the base/partials/`extends` layout.
5. Load-time validation (partial resolves, body parses).

docs/architecture.md Normal file

@@ -0,0 +1,321 @@
# QodeAssist Architecture
This document describes the **current** runtime architecture, after the §10
rework in `target-architecture.md` was completed. Every runtime LLM path —
code completion, chat (send/stream + compression + token counting), and quick
refactor — flows through one stack: agents, `Session`, and the
`Providers::GenericProvider` layer. There is no legacy parallel path; the old
"Stack A" (root `providers/*`, `pluginllmcore/*`, `ConfigurationManager`, the
provider/model/template settings pages) has been removed.
For the design rationale, layering contract, and cross-cutting policies, see
[`target-architecture.md`](target-architecture.md). This file documents how the
code is wired today.
---
## 1. Top level: ownership and dependency injection
The plugin (`qodeassist.cpp`) owns everything via `new` + parent — no
plugin-wide singletons; each feature receives its dependencies explicitly.
```
QodeAssistPlugin
• Providers::registerBuiltinProviders() — client_api → provider table
• ProviderInstanceFactory — provider instances from TOML
• ProviderSecretsStore — secrets behind a port
• AgentFactory — agents from TOML + agent_models.json
• SessionManager(agentFactory) — owns the ToolContributorRegistry
toolContributors().add(registerQodeAssistTools)
toolContributors().add(registerSkillTool)
toolContributors().add(McpClientsManager::registerToolsOn)
• m_engine (QQmlEngine)
rootContext: "agentFactory", "sessionManager" — DI for chat (QML)
Wired into consumers:
• QodeAssistClient ← LLMClientInterface(generalSettings, completeSettings,
agentFactory, sessionManager, documentReader,
performanceLogger)
← setSessionManager / setAgentFactory (quick refactor)
```
Chat lives in QML (`ChatRootView` is a `QML_ELEMENT`), so `AgentFactory` and
`SessionManager` are exposed as **context properties on the engine's root
context** and resolved in `ChatRootView` via
`qmlEngine(this)->rootContext()->contextProperty(...)`.
---
## 2. Core (agent / Session)
```
AgentFactory.create(name)
configByName(name) → AgentConfig (TOML, [body] table; model override from
agent_models.json applied here)
buildProviderForAgent:
instance = ProviderInstanceFactory.instanceByName(cfg.providerInstance)
provider = ProviderFactory::create(instance.clientApi)
provider.setUrl(instance.url)
provider.setApiKey(secrets.read(instance.apiKeyRef))
Agent(config, provider)
promptTemplate = JsonPromptTemplate::fromConfig(cfg) — compiles [body] (inja),
validated at load against a synthetic context
provider.setPromptCaching(cfg.cachePrompt, cfg.cacheTtl == "1h")
SessionManager — two ways to obtain a Session:
• createSession(agentName, externalHistory?) — chat: attaches a persistent,
externally-owned history
• acquire(agentName) / release(session) — one-shot pipelines: a small
per-agent pool of internal-history
sessions; acquire hands out a
session with cleared history,
cleared system-prompt layers and
cleared client tools
Session(agent[, externalHistory])
├─ ConversationHistory — messages as polymorphic ContentBlocks
├─ SystemPromptBuilder — ordered named layers (priority-sorted)
└─ ResponseRouter(client) — adapts client signals → typed ResponseEvent
Session API:
• send(blocks) — the ONLY dispatch entry point: append a user
message and dispatch. Completion/chat/refactor
differ only in block content + template; tools
on/off comes from the agent's enable_tools.
• cancel() — tears down in-flight; emits cancelled(id)
• history() / systemPrompt() / client()
• setContentLoader(loader) — resolves Stored* attachment/image blocks
• lastError() → ErrorInfo — typed synchronous start-failure detail
Session signals (three-state, mutually exclusive per request):
• finished(id, stopReason)
• failed(id, ErrorInfo{category, message, providerDetail})
• cancelled(id)
+ event(ResponseEvent) — live delta stream for the chat UI
```
`Session::dispatch` renders the agent's `system_prompt` into the `agent.system`
layer, composes all `SystemPromptBuilder` layers into the request system prompt,
and substitutes `${MODEL}` in the endpoint before sending.
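Composing priority-sorted named layers into one system prompt can be sketched as below. The `PromptLayer` struct, the rule that lower priority values come first, and the blank-line separator are all assumptions made for illustration; the document only states that `SystemPromptBuilder` keeps ordered, named, priority-sorted layers.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical layer record, e.g. name "agent.system" or "chat.context".
struct PromptLayer {
    std::string name;
    int priority; // assumption: lower values are emitted first
    std::string text;
};

// Sort layers by priority (stable, so equal priorities keep insertion order),
// skip empty layers, and join the rest with a blank line.
inline std::string composeSystemPrompt(std::vector<PromptLayer> layers)
{
    std::stable_sort(layers.begin(), layers.end(),
                     [](const PromptLayer &a, const PromptLayer &b) {
                         return a.priority < b.priority;
                     });
    std::string out;
    for (const auto &layer : layers) {
        if (layer.text.empty())
            continue;
        if (!out.empty())
            out += "\n\n";
        out += layer.text;
    }
    return out;
}
```

Keeping layers named means a feature can replace its own layer (for example `completion.context`) without touching the agent's rendered `system_prompt`.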
---
## 3. Provider layer
One configuration-driven `GenericProvider` covers every API; it varies only by
the LLMQore client factory and metadata. Request *shape* belongs to the agent's
`JsonPromptTemplate` (the `[body]` table), never to the provider.
```
ProviderFactory (sources/providers, namespace functions)
registerType(name, fn) / create(name, parent) / knownNames()
▲ registerBuiltinProviders() — client_api → provider table
GenericProvider : Providers::Provider
• owns an LLMQore::BaseClient (created by a ClientFactory)
• prepareRequest → PromptTemplate::buildFullRequest; injects tools when
enable_tools; applies ClaudeCacheControl when prompt caching is on
• client() / providerID() / getInstalledModels()
```
### client_api → provider table
| client_api | LLMQore client | ProviderID |
|------------------------------|-----------------------|------------------|
| Claude | ClaudeClient | Claude |
| Google AI | GoogleAIClient | GoogleAI |
| llama.cpp | LlamaCppClient | LlamaCpp |
| Mistral AI | MistralClient | MistralAI |
| Codestral | MistralClient | MistralAI |
| Ollama (Native) | OllamaClient | Ollama |
| Ollama (OpenAI-compatible) | OpenAIClient | OpenAICompatible |
| OpenAI (Chat Completions) | OpenAIClient | OpenAI |
| OpenAI (Responses API) | OpenAIResponsesClient | OpenAIResponses |
| OpenAI Compatible | OpenAIClient | OpenAICompatible |
| OpenRouter | OpenAIClient | OpenRouter |
| LM Studio (Chat Completions) | OpenAIClient | LMStudio |
| LM Studio (Responses API) | OpenAIResponsesClient | OpenAIResponses |
---
## 4. Configuration model
```
~/.config/.../qodeassist/config/
providers/*.toml → ProviderInstance { name, client_api, url, api_key_ref }
agents/*.toml → AgentConfig { schema_version, provider_instance, model,
endpoint, system_prompt, [body], match,
enable_tools, enable_thinking, cache_prompt,
extends, abstract, hidden, tags }
agent_models.json → per-agent model override (applied by AgentFactory)
pipelines → codeCompletion (ordered roster, routed by AgentRouter.pickAgent
on {filePath, projectName}); chatAssistant (allow-list for the
chat picker); chatCompression / quickRefactor (single agent each)
Editor policy (NOT agent config):
CodeCompletionSettings — triggers, modelOutputHandler, context extraction,
useOpenFilesContext
```
`[body]` **is** the request body (deep-mergeable through `extends`; Jinja-bearing
string values render and splice as raw JSON, literals pass through, empty renders
drop the key). `include` resolves only sandboxed partial roots. Profiles validate
at load: a referenced partial must resolve and the assembled body must parse as
JSON against a synthetic context — config errors surface in the agents settings
page, never as a silent runtime drop. The loader also lints: unknown top-level /
`[match]` keys and same-layer duplicate names are warnings; a user file that
reuses a bundled agent's name is rejected (bundled agents cannot be replaced —
users extend them, or the per-provider abstract bases, under a new name);
`abstract` and `hidden` are never inherited through `extends`. Full spec:
[`agent-templates-design.md`](agent-templates-design.md); user-facing guide:
[`creating-agents.md`](creating-agents.md).
---
## 5. Runtime paths
Agent selection depends on the pipeline. Code completion is the only
context-routed one: `AgentRouter.pickAgent(roster.codeCompletion, {file,
project})` walks the ordered roster and returns the first agent whose `[match]`
fits. Chat filters to the `chatAssistant` allow-list and the user picks; quick
refactor and compression each use a single configured agent.
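First-match routing over an ordered roster can be sketched as follows. The contents of `[match]` are not spelled out here, so the `pathSuffix` and `projectName` fields (with an empty field acting as a wildcard) are purely hypothetical, as are the struct names and the free-function form of `pickAgent`.

```cpp
#include <optional>
#include <string>
#include <vector>

// Hypothetical match rule: an empty field matches anything.
struct MatchRule {
    std::string pathSuffix;
    std::string projectName;
};
struct RosterEntry {
    std::string agentName;
    MatchRule match;
};
struct RoutingContext {
    std::string filePath;
    std::string projectName;
};

inline bool endsWith(const std::string &s, const std::string &suffix)
{
    return s.size() >= suffix.size()
           && s.compare(s.size() - suffix.size(), suffix.size(), suffix) == 0;
}

inline bool matches(const MatchRule &rule, const RoutingContext &ctx)
{
    const bool pathOk = rule.pathSuffix.empty() || endsWith(ctx.filePath, rule.pathSuffix);
    const bool projectOk = rule.projectName.empty() || rule.projectName == ctx.projectName;
    return pathOk && projectOk;
}

// Walk the roster in order and return the first agent whose rule fits.
inline std::optional<std::string> pickAgent(const std::vector<RosterEntry> &roster,
                                            const RoutingContext &ctx)
{
    for (const auto &entry : roster) {
        if (matches(entry.match, ctx))
            return entry.agentName;
    }
    return std::nullopt;
}
```

Because the walk stops at the first fit, roster order is significant: a catch-all entry belongs at the end, after the more specific ones.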
### 5a. Code completion
```
Qt Creator LSP (getCompletionsCycling)
LLMClientInterface
agent = AgentRouter.pickAgent(roster.codeCompletion, {file, project})
session = sessionManager.acquire(agent) — pooled
systemPrompt layer "completion.context" = fileContext + open-files context
session.send( blocks{ CompletionContent(prefix, suffix) } )
▼ on Session::finished:
history().lastAssistantText() → CodeHandler (output-mode) → LSP items
→ sessionManager.release(session)
```
The completion context travels as a `CompletionContent` block; the template
exposes it as `ctx.prefix` / `ctx.suffix`. FIM vs instruct is purely agent
config (the body), not feature code. Completion never touches the delta stream —
it waits for `finished` and reads the last message.
### 5b. Chat
`ChatRootView` owns one persistent `ConversationHistory` for the whole chat view
and injects it into every collaborator. **History is the single source of truth.**
```
ChatRootView (QML) — owns ConversationHistory m_history
ChatModel.setHistory(m_history) — ChatModel is a PROJECTION:
subscribes to messageAdded/Updated/cleared/reset, flattens blocks→rows,
overlays file-edit status from ChangesManager, holds a per-message usage map
ChatAgentController — picker filtered to the
chatAssistant allow-list; active agent persisted
▼ dispatchSend
ClientInterface
session = sessionManager.createSession(activeAgent, m_history)
sessionManager.toolContributors().contribute(client.tools()) — builtin+skills+MCP
session.setContentLoader(ChatSerializer::loadContentFromStorage)
systemPrompt layer "chat.context" = project info + skills + linked files
session.send( blocks{ TextContent + StoredAttachmentContent + StoredImageContent } )
▼ consumes Session signals (NOT raw client signals):
event(Usage) → ChatModel.setMessageUsage + token-counter calibration
finished(id) → ChangesManager.applyPendingEditsForRequest + persist;
removeSession (the persistent history survives)
failed(id, ErrorInfo) → surface error; removeSession
ChatCompressor → acquire(chatCompression agent — single configured) → seed history
from the chat's messages → "compression" layer → send → read summary
from the compression session's own history → release
InputTokenCounter → estimates over ConversationHistory (calibrated by Usage events)
ChatSerializer → persists ConversationHistory via MessageSerializer (v0.3);
imports legacy v0.1/v0.2 files
```
`ChatModel`'s QML role surface (roleType / content / attachments / images /
isRedacted / token roles) is unchanged, so the QML delegates were untouched. The
projection's incremental updates avoid model resets on the streaming hot path.
### 5c. Quick refactor
```
QodeAssistClient.requestQuickRefactor → QuickRefactorHandler
agent = pipelines.quickRefactor (single configured agent)
session = sessionManager.acquire(agent)
if useTools: sessionManager.toolContributors().contribute(client.tools())
systemPrompt layer "refactor" = tagged selection + output + indentation rules
session.send(blocks{instructions})
▼ on Session::finished:
history().lastAssistantText() → ResponseCleaner → RefactorResult → editor insert
→ sessionManager.release(session)
on Session::failed(ErrorInfo) → RefactorResult{error}
```
---
## 6. Context layer
The context services sit behind IDE-agnostic ports; Qt Creator API use lives in
the adapters.
```
EditorContext — IDocumentReader (port) ← DocumentReaderQtCreator (TextEditor API)
ProjectContext — IProjectScanner (port) ← ProjectScannerQtCreator (ProjectExplorer
+ Core::DocumentModel + the IgnoreManager for .qodeassistignore)
TokenEstimator — TokenUtils (pure) ← InputTokenCounter (thin UI consumer)
```
`ContextManager` is now Qt-Creator-free: it delegates open-file enumeration and
ignore filtering to an injected `IProjectScanner` (defaulting to the QtC adapter),
and keeps only filesystem reads + formatting. `ContextManager::shouldIgnore(path)`
replaced the previously exposed `ignoreManager()`.
---
## 7. Cross-cutting
- **Request lifecycle** — a session has at most one in-flight request; `send()`
while in flight cancels the previous. Every request ends in exactly one of
`finished` / `failed` / `cancelled`. Cancellation is not an error; no consumer
string-matches a message to tell them apart.
- **Typed errors** — `ErrorInfo { category ∈ {Config, Auth, Network, Provider,
Validation, Tool}, message, providerDetail }`. `ResponseRouter` categorizes wire
errors (best-effort) at the boundary; `Session::failed` carries the typed value.
- **Tools** — `SessionManager` owns a `ToolContributorRegistry`; built-in ToolKit,
the skill tool, and MCP client tools register once and are contributed to chat
and quick-refactor session clients uniformly.
- **Threading** — the core runs on the GUI thread; concurrency is the Qt event
loop plus async network I/O. Blocking work hides behind L3 ports.
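
The request-lifecycle rule above (one in-flight request, exactly one terminal outcome) can be modelled in a few lines. The sketch below is a toy model in Python under stated assumptions: the `Session` class, `complete()` and the `outcomes` map are illustrative names, not the plugin's API.

```python
from enum import Enum
from typing import Optional


class Outcome(Enum):
    FINISHED = "finished"
    FAILED = "failed"
    CANCELLED = "cancelled"


class Session:
    """Toy model: at most one in-flight request, one terminal outcome per request."""

    def __init__(self) -> None:
        self._next_id = 0
        self._in_flight: Optional[int] = None
        self.outcomes: dict[int, Outcome] = {}

    def send(self) -> int:
        # A send() while another request is in flight cancels the previous one.
        if self._in_flight is not None:
            self._end(self._in_flight, Outcome.CANCELLED)
        self._next_id += 1
        self._in_flight = self._next_id
        return self._in_flight

    def complete(self, request_id: int, ok: bool = True) -> None:
        # Late results for a request that already ended are ignored.
        if request_id != self._in_flight:
            return
        self._end(request_id, Outcome.FINISHED if ok else Outcome.FAILED)

    def _end(self, request_id: int, outcome: Outcome) -> None:
        assert request_id not in self.outcomes, "a request ends exactly once"
        self.outcomes[request_id] = outcome
        self._in_flight = None
```

Keeping `CANCELLED` as its own outcome is what lets consumers branch on a value instead of string-matching an error message.
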
---
## 8. Tests
`test/` (GTest + Qt::Test) covers the two engines most affected by the rework, plus load-time validation of the bundled agents:
- `JsonPromptTemplateTest` — the `[body]` engine: jinja render + JSON splice,
literal passthrough, empty-render key drop, nested literals, and load-time
rejection of bodies that render invalid JSON.
- `ResponseRouterTest` — a fake `BaseClient` replays a recorded provider stream;
asserts the assistant message is stamped with the request id, history is built
correctly (thinking + text + tool use/result), the typed event stream is emitted,
and wire errors are categorized.
- `BundledAgentsTest` — loads every bundled agent through the real loader (extends
+ partials resolved from the qrc) and renders each `[body]` against the synthetic
validation context. This is the load-time validation guarantee run in CI: a broken
bundled body, partial, or `extends` chain fails the test instead of surfacing as a
silent runtime drop.
---
## 9. Remaining follow-ups (optional)
1. **Qt-Creator-free core build + CI** — `AgentFactory` / `ContextRenderer` still
call `Core::ICore::userResourcePath`, so the core targets link `QtCreator::Core`.
A `ResourcePaths` port + adapter would let the core build without Qt Creator and
enable a CI job that fails on a layering-violating include. (The bundled-agent
render check already runs in the QtC-linked test binary — see §8.)
2. **§9 target module layout** — the `core/ ide/ features/ hosts/` physical target
split in `target-architecture.md` is not yet reflected in the directory layout.
```


@@ -110,6 +110,4 @@ No additional configuration is required.
## Related Documentation
- [Agent Roles](agent-roles.md) - Switch between AI personas
- [File Context](file-context.md) - Attach files to chat
- [Project Rules](project-rules.md) - Customize AI behavior


@@ -0,0 +1,347 @@
# QodeAssist — Context Architecture (v1.0)
Status: design proposal, extends `target-architecture.md` (§7 ContextEngine,
delta #9) and `agent-templates-design.md` (the `ctx.*` template contract).
Scope: everything between "facts exist in the IDE / on disk / in the
conversation" and "bytes leave in the request body" — what context each
pipeline needs, who acquires it, where it lands in the prompt. One assembly
runs per `send()`; tool continuations stay inside LLMQore (§4.3).
---
## 1. Taxonomy — the five kinds of context
Every piece of context the model ever sees falls into one of five categories.
The categories differ in *acquisition mode*, *volatility*, and therefore
*placement* — conflating them is the root cause of today's problems (§3).
| # | Category | What it answers | Examples | Volatility |
|---|----------|-----------------|----------|------------|
| C1 | **Identity** | who is the assistant | agent `system_prompt` (persona inline or via `read_file()`), always-on skills, skills catalog | per agent change |
| C2 | **Environment** | where is it working | project name + source root, build dir, language/file info, recent changes | per project / slow |
| C3 | **Task** | what is asked *now* | chat message, attachments, images, invoked-skill body, completion prefix/suffix, refactor selection + instruction | every turn |
| C4 | **Conversation** | what happened so far | history (text, thinking, tool use/results), compression summary | grows every turn |
| C5 | **Pulled** | what the model asked for | tool results (read file, search, build, diagnostics), MCP tool results | inside the turn |
Two acquisition modes cut across the categories:
- **Push** — we inject proactively (C1–C3, C4). Push is a *per-pipeline
policy*: completion must push everything (no latency budget for tools);
chat should push little and let the model pull.
- **Pull** — the model requests through tools (C5). Pull needs no assembly
policy at all, but its *results* become C4 and therefore must flow through
the same budget and serialization rules as everything else.
One more orthogonal property drives placement: **stability**. Provider prompt
caches (Claude `cache_control`) reward byte-stable prefixes. Stable content
belongs early (system), volatile content belongs late (near the last user
message). This single rule decides almost every placement question below.
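
A toy illustration of the stability rule, assuming a block-granular cache model (real provider caches work on token prefixes, so this only shows the direction of the effect). The block labels are made up for the example.

```python
def stable_prefix_len(previous: list[str], current: list[str]) -> int:
    """Number of leading blocks that are identical in two consecutive requests."""
    n = 0
    for a, b in zip(previous, current):
        if a != b:
            break
        n += 1
    return n


# Volatile file content placed first: one edit invalidates the whole request.
early_before = ["file contents v1", "persona", "project info", "user message"]
early_after = ["file contents v2", "persona", "project info", "user message"]

# The same content placed late: the stable blocks in front of it stay reusable.
late_before = ["persona", "project info", "file contents v1", "user message"]
late_after = ["persona", "project info", "file contents v2", "user message"]
```

In this model the early placement reuses nothing after the edit, while the late placement keeps the persona and project-info blocks cacheable.
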
---
## 2. Context inventory per pipeline
What each use case (numbering from `target-architecture.md` §1) actually
needs, against the taxonomy:
| Context item | Cat | U1 completion | U2 chat | U3 refactor | compression | Source port |
|---|---|---|---|---|---|---|
| agent `system_prompt` (persona) | C1 | ✓ | ✓ (persona switch = agent switch) | ✓ | ✓ | AgentProfile + ContextRenderer |
| skills catalog + always-on | C1 | — | ✓ | — | — | SkillsEngine |
| project root / build dir | C2 | — | ✓ | — | — | `IProjectScanner` |
| language + file info | C2 | ✓ | — | ✓ | — | `IDocumentReader` |
| recent project changes | C2 | optional (setting) | — | optional | — | ChangesManager |
| prefix / suffix (FIM) | C3 | ✓ | — | — | — | `IDocumentReader` |
| selection + position markers | C3 | — | — | ✓ | — | `IDocumentReader` |
| user message text | C3 | — | ✓ | ✓ (instruction) | ✓ (directive) | UI |
| attachments / images | C3 | — | ✓ | — | — | chat storage (loader) |
| invoked skill body (`/cmd`) | C3 | — | ✓ | — | — | SkillsEngine |
| linked files (pinned) | C3/C2 | — | ✓ | — | — | `IProjectScanner` + fs |
| open-files sync | C3/C2 | — | ✓ | — | — | `IProjectScanner` |
| history | C4 | — (fresh session) | ✓ | — (fresh) | ✓ (read-only input) | ConversationHistory |
| tool results | C5 | — | ✓ | ✓ (optional) | — | ToolsManager / McpHub |
---
## 3. Problems in the current code this design removes
1. ~~**Two assembly paths.**~~ — RECLASSIFIED 2026-06-12 as by-design, not a
problem: the first request renders from `ConversationHistory`; tool
continuations are LLMQore's replay of that payload plus appended tool
results. The replay carries the full filtered history of its base payload,
so the feared filter divergence does not materialize in practice (§4.3).
2. **No budget.** History is never trimmed, estimated, or compacted; every
send ships everything, forever.
3. **Volatile content in system.** Linked-file contents live in the
`chat.context` system layer; any file edit between turns invalidates the
provider prompt cache for the whole request.
4. **Invoked skills evaporate.** A `/skill` body is injected into the system
layer for one send only — the next turn the model has lost the skill's
instructions, although the conversation continues to rely on them.
5. **Silent loss.** A failed attachment load drops the block with no trace —
neither the model nor the user learns the image is gone.
6. **Repeated materialization.** Every send re-reads and re-base64s every
stored image/attachment of the whole history from disk.
7. **Placement decided ad hoc.** Each feature hand-formats markdown and picks
a system layer by habit (`completion.context`, `refactor`, `chat.context`);
there is no shared rule for what goes where, and the project-info block is
formatted three different ways.
---
## 4. Architecture — Acquire → Assemble → Shape
Three stages with hard ownership boundaries:
```mermaid
flowchart LR
subgraph L3["Acquire — ContextEngine (L3, ports + QtC adapters)"]
EC["EditorContext<br/>prefix/suffix, selection,<br/>language, copyright strip"]
PC["ProjectContext<br/>root, ignore filter,<br/>open files, changes"]
TE["TokenEstimator<br/>calibrated by Usage"]
end
subgraph L4["Features (L4) — decide WHAT"]
F["chat / completion / refactor<br/>set layers, pin providers,<br/>build user blocks"]
end
subgraph L2["Assemble — Session (L2) — decide WHERE & HOW MUCH"]
SPB["SystemPromptBuilder<br/>stable layers only"]
PIN["Pinned providers<br/>re-materialized every dispatch"]
CA["ContextAssembler<br/>history + layers + pinned<br/>+ loader + budget → ctx"]
end
subgraph L1["Shape — JsonPromptTemplate (L1/L2)"]
TPL["[body] jinja over ctx.*"]
end
EC --> F
PC --> F
F --> SPB
F --> PIN
F --> CA
SPB --> CA
PIN --> CA
TE --> CA
CA --> TPL
```
- **Acquire (L3)** — `ContextEngine` services behind IDE-agnostic ports read
facts from the IDE/fs. No prompt text, no placement decisions. One shared
`EnvBlockFormatter` renders the project/file info block so it is identical
in every pipeline.
- **Features (L4)** decide *what* context a turn needs: they set their system
layer, pin refreshable providers, and compose user blocks. They never
decide request shape and never concatenate history.
- **Assemble (L2)** — `ContextAssembler` (successor of
`Session::toLegacyContext`) is the **only** producer of the template
context, once per `send()` dispatch; tool continuations replay that payload
inside LLMQore (§4.3). It owns placement policy, budget enforcement,
materialization, and the manifest.
- **Shape (L1)** — the agent's `[body]` table renders `ctx.*` into the wire
request. Templates own *shape per provider*, never content.
### 4.1 The three injection mechanisms
| Mechanism | For | Lifetime | Refresh | Persisted |
|---|---|---|---|---|
| **System layers** (`SystemPromptBuilder`) | stable C1/C2: `agent.system`, `env.project`, `skills.catalog`, `refactor`, `compression` | conversation | on send | no |
| **Pinned providers** (new) | refreshable C3/C2: linked files, open-files sync | until unpinned | **every `send()`** | as reference only |
| **User blocks** (`send(blocks)`) | one-shot C3: message, attachments, images, invoked-skill body, completion content | that turn | never (history is immutable) | yes |
Pinned providers are the new piece:
```
session->pinContext(id, [](){ return materialized blocks; });
session->unpinContext(id);
```
The assembler calls every pinned provider at **every `send()`** and splices
the result as text blocks
**prepended to the turn's typed user message** — the last user-role wire
message that does not carry tool results (falling back to the tool-result
carrier, after its leading `tool_result` blocks, and to a synthetic user
message when the history has no user message at all). Prepending into an
existing message rather than inserting a separate one keeps strict
user/assistant alternation, which some provider APIs enforce.
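
The anchor selection described above can be sketched as follows. This is an illustrative Python model, not the `ContextAssembler` code: messages are plain dicts with `role` and `blocks`, and two details are assumptions, namely that the fallback carrier is the last user message and that a synthetic user message is appended at the end.

```python
def splice_pinned(messages: list[dict], pinned: list[dict]) -> list[dict]:
    """Return a copy of messages with pinned blocks spliced into the anchor message."""
    out = [dict(m, blocks=list(m["blocks"])) for m in messages]
    if not pinned:
        return out
    users = [m for m in out if m["role"] == "user"]

    def carries_tool_results(message: dict) -> bool:
        return any(b["type"] == "tool_result" for b in message["blocks"])

    # 1. Preferred anchor: the last user message that carries no tool results.
    for message in reversed(users):
        if not carries_tool_results(message):
            message["blocks"][:0] = pinned
            return out

    # 2. Fallback: a tool-result carrier, after its leading tool_result blocks.
    if users:
        blocks = users[-1]["blocks"]
        i = 0
        while i < len(blocks) and blocks[i]["type"] == "tool_result":
            i += 1
        blocks[i:i] = pinned
        return out

    # 3. No user message at all: synthesize one (placement is an assumption).
    out.append({"role": "user", "blocks": list(pinned)})
    return out
```

The input list is left untouched, which mirrors the statement that pinned content is never written back into history.
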
The fixed anchor and the per-turn refresh split the cache cost fairly:
within a turn's tool loop the pinned blocks are byte-identical (continuations
replay the payload — pure appends, cache hits); the next `send()` re-reads
the files, and a change invalidates the cache only from the turn's anchor,
not from the system prefix. The materialized block's label states its capture
time ("content as of this turn") because a tool may mutate the file mid-loop;
the model sees such changes through the tool results themselves. Pinned
content is never stored in history and never persisted — never duplicated
turn-over-turn.
Invoked-skill bodies move the opposite way: out of the system layer into the
**user blocks of that turn** (a dedicated block type), so they persist in
history and survive the rest of the conversation (fixes problem 4).
### 4.2 Placement policy (single table, owned by the assembler)
| Content | Position in request | Why |
|---|---|---|
| `agent.system` (rendered TOML `system_prompt`) | system, first | static per agent → max cache reuse |
| `env.project`, `skills.catalog` | system, after agent | changes rarely |
| pipeline layers (`refactor`, `compression`, `completion.context`) | system, last | fresh session each time, ordering irrelevant |
| history | messages | as is |
| pinned materializations | text blocks prepended to the turn's typed user message, live content | fixed anchor keeps the prefix cache-stable; content refreshes because tools mutate files at any moment |
| task blocks | last user message | the turn itself |
`ClaudeCacheControl` breakpoints stay as they are (system / history tail);
this ordering is what makes them effective.
### 4.3 Tool continuations stay in LLMQore (replay)
The tool loop deliberately stays in LLMQore — the library is a complete,
standalone agentic client, and the loop (execute tools, count rounds,
schedule the next request, stream) is *mechanism*, which per
`target-architecture.md` design principle 3 belongs in C++ identically for
all providers. Continuation *content* is the library's default replay: the
base payload plus the assistant message and appended tool results.
An inversion hook (`setContinuationPayloadBuilder`, an optional per-request
callback letting `Session` re-assemble each continuation through
`ContextAssembler`) was implemented and **reverted 2026-06-12**: the problem
it solved was judged contrived. The replay already carries the full filtered
history of its base payload, mid-loop file changes reach the model through
the tool results themselves, and continuation growth within one turn is
bounded by `maxToolContinuations` — budget enforcement at `send()` time
covers the realistic cases. Consequences accepted with the revert: the
manifest logs one entry per `send()` (not per wire request), and pinned
content is byte-stable for the duration of a turn's tool loop (§4.1).
On 2026-06-13 the loop's *shape* inside LLMQore was refactored without changing
this decision (see `tool-loop-runner-plan.md`): the loop policy now lives in
`ToolLoopRunner` (per-request round state, limit, continuation decision), and
`BaseClient` is slimmed to transport + tool dispatch with public primitives
`continueRequest` / `buildReplayContinuation` / `abortRequest`. Continuation
content is still the replay. QodeAssist sets the round limit via
`client->toolLoop()->setMaxRounds(...)`; the old `setMaxToolContinuations`
stays as a forwarder for compatibility.
### 4.4 Budget
`ContextAssembler` consults a `BudgetPolicy` before producing the context:
```
input_estimate = TokenEstimator(system + history + pinned + task)
limit = agent context_window - body.max_tokens (output reserve)
```
`context_window` comes from provider/model metadata with an optional agent
TOML override. When the estimate exceeds the limit the policy returns a trim
plan executed in deterministic order:
1. elide bodies of tool results older than the last N rounds
(`[tool result elided — N tokens]` placeholder, pairing preserved);
2. elide materializations of old stored images/attachments (placeholder
block, reference kept in history);
3. below a hard floor — refuse with `ErrorCategory::Validation` and surface
"compress the conversation" (ChatCompressor) in the UI.
v1.0 ships in stages: **estimate + manifest + UI warning** first (no silent
trimming), then stage 1–2 elision, then auto-compression hooks. The
architecture fixes the *seam*; the policy can stay minimal.
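
The limit formula and the deterministic trim order can be sketched like this. It is a simplified Python model under stated assumptions: the savings per stage are passed in as plain token counts, placeholder tokens are ignored, and "below a hard floor" is interpreted as "still over the limit after both elision stages". `plan_trim` is a hypothetical name.

```python
def plan_trim(estimate: int, context_window: int, max_tokens: int,
              old_tool_tokens: int, old_media_tokens: int) -> tuple[list[str], int, int]:
    """Return (steps, estimate after trimming, limit) for one send()."""
    limit = context_window - max_tokens  # reserve room for the model's output
    steps: list[str] = []
    if estimate <= limit:
        return steps, estimate, limit

    # Stages run in a fixed order; each stops as soon as the estimate fits.
    stages = (
        ("elide_old_tool_results", old_tool_tokens),
        ("elide_old_media", old_media_tokens),
    )
    for name, saving in stages:
        estimate -= saving
        steps.append(name)
        if estimate <= limit:
            return steps, estimate, limit

    # Nothing left to elide: refuse and ask the user to compress the conversation.
    steps.append("refuse")
    return steps, estimate, limit
```

A warn-only first version would compute the same `estimate` and `limit` and report them without executing any step.
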
`TokenEstimator` is calibrated per provider/model from `Usage` events
(§8.5 of the target architecture) — chars-per-token ratio updated after every
response; the chat token counter and the budget share this one estimator.
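
One possible shape for the calibration loop, as a hedged Python sketch: the starting ratio of 4 characters per token and the exponential smoothing factor are assumptions chosen for the example, and the class is not the plugin's `TokenEstimator`.

```python
class RatioEstimator:
    """Estimates tokens from character counts; the ratio is refined from usage data."""

    def __init__(self, chars_per_token: float = 4.0, smoothing: float = 0.5) -> None:
        self.chars_per_token = chars_per_token  # assumed starting point
        self.smoothing = smoothing              # weight given to each new observation

    def estimate(self, text: str) -> int:
        return round(len(text) / self.chars_per_token)

    def calibrate(self, sent_chars: int, reported_input_tokens: int) -> None:
        # Ignore unusable reports rather than corrupting the ratio.
        if sent_chars <= 0 or reported_input_tokens <= 0:
            return
        observed = sent_chars / reported_input_tokens
        self.chars_per_token += self.smoothing * (observed - self.chars_per_token)
```

Sharing one instance between the token counter and the budget check is what keeps the number shown in the UI consistent with the number that triggers a warning.
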
### 4.5 Materialization and caching
Stored content (attachments, images) stays reference-only in history;
materialization happens in the assembler through the `ContentLoader`. Two
fixes over today:
- the loader result is cached per `(storedPath, mtime, size)` — no re-reading
the whole conversation's binaries on every send, and byte-identical turns
keep the provider prompt cache warm;
- a failed load produces an **explicit placeholder block**
(`[attachment unavailable: name.png]`) instead of silently vanishing —
the model can say so, the manifest records it (fixes problem 5).
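
Both fixes can be sketched together. The Python below is an illustration only: `ContentCacheSketch` is a hypothetical class, the block dictionaries are simplified, and entries are never evicted, which a real cache would need to handle.

```python
import os
from typing import Callable


class ContentCacheSketch:
    """Caches loaded content by (path, mtime, size); failed loads yield a placeholder."""

    def __init__(self, loader: Callable[[str], bytes]) -> None:
        self._loader = loader
        self._entries: dict[tuple[str, int, int], bytes] = {}

    def load(self, path: str) -> dict:
        try:
            st = os.stat(path)
            key = (path, st.st_mtime_ns, st.st_size)
            if key not in self._entries:
                self._entries[key] = self._loader(path)
            return {"type": "attachment", "data": self._entries[key]}
        except OSError:
            # Explicit placeholder instead of silently dropping the block.
            name = os.path.basename(path)
            return {"type": "text", "text": f"[attachment unavailable: {name}]"}
```

An unchanged file hits the cache and yields byte-identical content on every send; a rewritten file gets a new key and is read again.
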
### 4.6 Observability: the context manifest
Every `assemble()` emits one debug-category log entry and a struct on the
event stream:
```
manifest {
layers: { agent.system: ~1.9k tok, env.project: ~70, skills.catalog: ~640 }
history: 26 messages, ~14.2k tok (3 tool rounds)
pinned: { linked:src/main.cpp: ~2.1k }
task: ~310 tok, 1 image (cached)
elided: [ tool_result a4f1 (~8k) ]
estimate: ~19.3k / limit 32k
}
```
Nothing is dropped silently — every filter (unsigned thinking, orphaned tool
pairs, failed loads, budget elisions) leaves a manifest record. The token
counter UI reads the same struct.
---
## 5. Wire contract — `ctx.*` stays, gains one producer
`Templates::ContextData` (→ `ctx.system_prompt`, `ctx.history`,
`ctx.prefix/suffix`, `ctx.files_metadata`) remains the contract between the
core and `[body]` templates — it is not legacy, it is the template-facing
view of the assembled context. The change is that exactly one function
produces it (`ContextAssembler::assemble`), for every request, and
`toLegacyContext`/`buildLegacyContext` are renamed into it. Existing
serialization rules carry over unchanged: system messages never enter
history, unsigned thinking is dropped, orphaned tool_use/tool_result pairs
are filtered, `CompletionContent` becomes `prefix`/`suffix`.
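
The carried-over filtering rules can be illustrated with a small Python sketch. Messages are modelled as dicts with `role` and `blocks`; dropping a message that ends up with no blocks is an assumption of this sketch, not something the rules above state.

```python
def filter_history(messages: list[dict]) -> list[dict]:
    """Apply the history rules: no system messages, signed thinking only, paired tools only."""
    blocks = [b for m in messages for b in m["blocks"]]
    uses = {b["id"] for b in blocks if b["type"] == "tool_use"}
    results = {b["tool_use_id"] for b in blocks if b["type"] == "tool_result"}
    paired = uses & results

    def keep(block: dict) -> bool:
        kind = block["type"]
        if kind == "thinking":
            return bool(block.get("signature"))  # unsigned thinking is dropped
        if kind == "tool_use":
            return block["id"] in paired         # orphaned call
        if kind == "tool_result":
            return block["tool_use_id"] in paired  # orphaned result
        return True

    out = []
    for message in messages:
        if message["role"] == "system":
            continue  # system content travels as ctx.system_prompt, not history
        kept = [b for b in message["blocks"] if keep(b)]
        if kept:
            out.append({"role": message["role"], "blocks": kept})
    return out
```
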
---
## 6. Migration plan
Ordered so every step lands independently and shrinks risk:
1. **Extract `ContextAssembler`** from `buildLegacyContext` (pure, unit-tested
against fixture histories) + manifest logging + failed-load placeholder
blocks. No behavior change otherwise. — DONE 2026-06-12
(`sources/Session/ContextAssembler.{hpp,cpp}`, `test/ContextAssemblerTest.cpp`;
manifest logged under the `qodeassist.context` category).
2. **ContentLoader cache** keyed by `(path, mtime, size)`. — DONE 2026-06-12
(`StoredContentCache` in `ChatSerializer`, owned per-chat by
`ClientInterface`, cleared on chat switch).
3. **Pinned providers**: linked files and open-files sync move out of the
`chat.context` system layer; invoked-skill bodies move into the turn's
user blocks. `chat.context` shrinks to project info + skills catalog.
— DONE 2026-06-12 (`Session::pinContext/unpinContext`, pinned splice in
`ContextAssembler::assemble`; `SkillInvocationContent` block persisted via
`MessageSerializer`, invisible in the chat UI by design; open-files sync is
covered because `ChatRootView` merges open editors into the linked list).
4. **Shared `EnvBlockFormatter`** in ContextEngine; chat/refactor/completion
stop hand-formatting project/file info. — DONE 2026-06-12
(`context/EnvBlockFormatter.{hpp,cpp}`: pure `formatProject`/`formatFile`
+ the `currentProject()` QtC gatherer; chat project block, refactor file
header, and completion's `getLanguageAndFileInfo` all route through it).
5. ~~**Continuation payload callback**~~ — REVERTED 2026-06-12 (implemented,
then judged a solution to a contrived problem; see §4.3). Continuations
are LLMQore's default replay; `ContextAssembler` runs once per `send()`.
6. **TokenEstimator + BudgetPolicy seam** — estimate + warning first, then
elision stages.
7. **ContextEngine port split** (delta #9 of the target architecture) —
`EditorContext` / `ProjectContext` / `TokenEstimator` behind ports, QtC
API only in `ide/context` adapters.
---
## 7. Open questions
1. ~~**Pinned placement**~~ — RESOLVED 2026-06-12: text blocks prepended to
the last user-role wire message (synthetic user message only when there is
none). A separate synthetic message would break strict role alternation on
some provider APIs; cache behaviour of the two shapes is identical.
2. ~~**Tool-loop relocation cost**~~ — RESOLVED 2026-06-12: relocation
rejected (LLMQore is deliberately a standalone agentic client). The
follow-up `setContinuationPayloadBuilder` inversion hook was also
implemented and reverted the same day — replay is the accepted behaviour
(§4.3).
3. **Budget v1 scope** — warn-only vs. enabling tool-result elision
immediately. Elision changes what the model sees; needs live validation.
4. **Completion and open files** — should completion gain pinned open-files
context (cheap with this design), or stay prefix/suffix-only for latency?

File diff suppressed because one or more lines are too long


349
docs/creating-agents.md Normal file

@@ -0,0 +1,349 @@
# Creating and Extending Agents
An *agent* is a TOML profile that tells QodeAssist which provider to call,
which model to use, and exactly what request body to send. All bundled agents
(Settings → QodeAssist → Agents) are built from the same files described here —
anything a bundled agent does, a user agent can do too.
## Where user agents live
Drop `*.toml` files into the user agents directory:
| OS | Path |
|---|---|
| Linux / macOS | `~/.config/QtProject/qtcreator/qodeassist/config/agents/` |
| Windows | `%APPDATA%\QtProject\qtcreator\qodeassist\config\agents\` |
QodeAssist creates the directory on startup. Files are loaded at plugin
startup; after adding or editing a file, restart Qt Creator.
Two layers are loaded:
1. **Bundled** agents shipped inside the plugin — read-only.
2. **User** agents from the directory above (marked with a `user` pill).
Agent `name`s are global across both layers. A user file that reuses a
bundled agent's `name` is rejected with an error — bundled agents cannot be
replaced; create your own agent under a new name and `extends` what you want
to build on. Two *user* files with the same `name` produce a warning, and
the alphabetically later file wins.
Load errors and warnings (TOML syntax, unknown keys, missing `extends`
parents, bodies that don't render to valid JSON) are reported in Qt Creator's
**General Messages** pane, prefixed with `[Agents]`.
## Minimal example
A custom agent is a thin delta over a bundled **wire base**: extend it, set the
model, override only what differs. The base already carries the provider, the
endpoint and the request-body serialization — you add the policy.
```toml
schema_version = 1
extends = "Claude Base Chat"
name = "My Claude"
model = "claude-sonnet-4-6"
```
Override a body field or the persona:
```toml
schema_version = 1
extends = "Claude Base Chat"
name = "My Claude (low temp)"
model = "claude-sonnet-4-6"
system_prompt = """You are a terse code reviewer."""
[body]
temperature = 0.3
```
Point a base at a different OpenAI-compatible provider by overriding the
provider instance and model:
```toml
schema_version = 1
extends = "OpenAI Base Chat"
name = "My DeepSeek"
provider_instance = "OpenAI Compatible"
model = "deepseek-chat"
```
Bundled agents are read-only — vary a preset by creating your own under a new
name. If all you want is a different model, you don't even need a file: set the
per-agent model override in the settings UI.
## Key reference
| Key | Required | Meaning |
|---|---|---|
| `schema_version` | no (default 1) | Format version; the plugin refuses files newer than it supports. |
| `name` | yes | Unique identifier; shown in the UI, referenced by rosters and `extends`. |
| `description` | no | Tooltip text in the Agents list. |
| `provider_instance` | yes* | Name of a provider instance (see below). |
| `model` | yes* | Default model; can be overridden per agent in settings. |
| `endpoint` | yes* | Path appended to the provider instance URL. May contain `${MODEL}` (e.g. Google: `/models/${MODEL}:streamGenerateContent?alt=sse`). |
| `system_prompt` | no | Jinja template for the system prompt (see building blocks below). FIM agents usually omit it. |
| `tags` | no | Free-form strings shown as pills in the UI for discoverability. |
| `enable_thinking` | no | Capability hint (UI badge). The `[body]` is the source of truth for what is sent. |
| `enable_tools` | no | Lets the provider inject tool definitions into the request. |
| `cache_prompt` / `cache_ttl` | no | Prompt caching (Anthropic); `cache_ttl = "1h"` selects the long TTL. |
| `cache_breakpoints` | no | Which cache points to set when `cache_prompt` is on: any of `"system"`, `"tools"`, `"history"`. Empty/omitted = all three. |
| `extends` | no | Name of a parent agent to inherit from. |
| `abstract` | no | Mark as template-only: it can be extended but is never loaded as a usable agent. Not inherited. |
| `hidden` | no | Loaded and usable, but not listed in selection UIs. Not inherited. |
| `[match]` | no | Routing constraints (see Routing). |
| `[body]` | yes* | The literal request body (see below). |
\* required after `extends` resolution — a child inherits these from its
parent, so it only states what differs.
### Required keys checked at load
A concrete (non-abstract) agent must end up with `name`,
`provider_instance`, `model`, `endpoint`, and a non-empty `[body]`. Unknown
keys anywhere at the top level or in `[match]` produce a warning — this
catches typos like `enable_thinkin`.
## Provider instances
`provider_instance` refers to a provider configuration (URL + API key
reference + client API). Bundled instances:
`Claude`, `Codestral`, `Google AI`, `llama.cpp`,
`LM Studio (Chat Completions)`, `LM Studio (Responses API)`, `Mistral AI`,
`Ollama (Native)`, `Ollama (OpenAI-compatible)`, `OpenAI (Chat Completions)`,
`OpenAI (Responses API)`, `OpenAI Compatible`, `OpenRouter`.
User-defined instances live next to agents in
`…/qodeassist/config/providers/*.toml` and follow the same
override-by-name layering.
## `extends` — inheriting from another agent
A child deep-merges over its parent: scalar keys are replaced, tables (such
as `[body]` and `[body.options]`) merge key-by-key, and **arrays are replaced
whole** (a child that wants the parent's `tags` plus one more must restate
the full list). Chains can be deeper than one level; cycles and missing
parents are load errors.
`abstract` and `hidden` are never inherited — extending a hidden agent
yields a visible child unless the child says otherwise.
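
The merge rules above are easy to get wrong for arrays, so here is a small Python model of them. It is a sketch of the stated semantics only: agents are plain dicts keyed by name, and `resolve` and `deep_merge` are hypothetical names, not the loader's functions.

```python
NOT_INHERITED = {"abstract", "hidden"}


def deep_merge(parent: dict, child: dict) -> dict:
    """Tables merge key-by-key; scalars and arrays from the child replace the parent's."""
    merged = dict(parent)
    for key, value in child.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged


def resolve(name: str, agents: dict[str, dict], _chain: tuple = ()) -> dict:
    """Resolve an agent's `extends` chain; cycles and missing parents are errors."""
    if name in _chain:
        raise ValueError(f"extends cycle: {' -> '.join(_chain + (name,))}")
    if name not in agents:
        raise KeyError(f"missing parent: {name}")
    agent = agents[name]
    parent_name = agent.get("extends")
    if parent_name is None:
        return dict(agent)
    parent = resolve(parent_name, agents, _chain + (name,))
    inherited = {k: v for k, v in parent.items() if k not in NOT_INHERITED}
    return deep_merge(inherited, agent)
```

Note that a child restating `tags = ["mine"]` ends up with exactly that list, while a child setting only `[body.options] temperature` keeps every other body key from its parent.
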
Every provider ships an **abstract wire base** that carries only the provider
instance, endpoint and the request-body serialization — no model, persona,
tags or sampling. Extending one and setting `model` is all a custom agent
needs:
| Base | Provider / API |
|---|---|
| `Claude Base Chat` | Claude, Anthropic Messages (`/v1/messages`) |
| `OpenAI Base Chat` | OpenAI, Chat Completions (`/chat/completions`) |
| `OpenAI Responses Base` | OpenAI, Responses API (`/responses`) |
| `Google Base Chat` | Google AI, Gemini `generateContent` |
| `Ollama Base Chat` | Ollama, native `/api/chat` |
| `Ollama FIM Base` | Ollama, native `/api/generate` fill-in-the-middle |
For any OpenAI-compatible provider (Mistral, OpenRouter, LM Studio, llama.cpp,
DeepSeek, …) extend `OpenAI Base Chat` and override `provider_instance`.
Each bundled concrete agent (`Claude Sonnet Chat`, `Claude Code Completion`,
`OpenAI Chat Completions`, `OpenAI Responses Chat`, `Google Chat`,
`Ollama Chat`, `Ollama FIM`) is itself a thin delta over one of these bases and
works as a parent too — `extends = "Claude Sonnet Chat"` inherits everything including
the model.
## `[body]` — the request, literally
`[body]` is the request body, written exactly like the provider's curl
example. Per key, recursively:
- **string containing jinja** (`{{` or `{%`) — rendered, and the output is
spliced in as raw JSON. A render that produces nothing drops the key.
- **plain string / number / bool / table** — passed through unchanged.
```toml
[body]
max_tokens = 16000
stream = true
thinking = { type = "adaptive", display = "summarized" }
```
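
The per-key rules can be modelled in a few lines of Python. This is a sketch of the stated behaviour, not the plugin's engine: the template renderer is passed in as a plain callable so the example stays self-contained, arrays are treated as literals (an assumption), and the automatic trailing-comma stripping is left out.

```python
import json
from typing import Any, Callable

_DROP = object()  # sentinel: "this key rendered to nothing"


def is_template(value: Any) -> bool:
    return isinstance(value, str) and ("{{" in value or "{%" in value)


def build_body(node: Any, render: Callable[[str], str]) -> Any:
    """Walk a [body] table: render template strings, pass everything else through."""
    if isinstance(node, dict):
        out = {}
        for key, value in node.items():
            built = build_body(value, render)
            if built is not _DROP:
                out[key] = built
        return out
    if is_template(node):
        rendered = render(node).strip()
        if not rendered:
            return _DROP             # an empty render drops the key
        return json.loads(rendered)  # spliced as raw JSON; invalid JSON raises
    return node                      # plain string / number / bool: unchanged
```

Because the render output is parsed as JSON, a template must emit quoted strings itself, which is why the callbacks table recommends `tojson` for every interpolated value.
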
The message-array serialization (`messages` / `contents` / `input`, plus the
`system` renderer) lives in the **wire base**; a concrete agent that extends a
base inherits it and usually sets only scalar policy fields like the ones
above. A from-scratch agent (no `extends`) must carry the full serialization
itself — copy a bundled base's `[body]` as the starting point.
There are no runtime toggles: a thinking variant is a separate agent file
that carries the thinking fields in its body.
Every agent body is dry-run rendered at load against a synthetic
conversation (text, thinking, tool calls, tool results, images), so jinja
syntax errors, unknown callbacks, missing partials, and invalid JSON are
reported at startup — not mid-conversation. Trailing commas emitted by loops
are stripped automatically; don't bother with `loop.is_last` bookkeeping.
### Template data (`ctx`)
| Field | Content |
|---|---|
| `ctx.system_prompt` | Rendered system prompt (present only if the agent has one). |
| `ctx.prefix` / `ctx.suffix` | Code around the cursor (FIM/completion sessions). |
| `ctx.files_metadata` | Array of `{ file_path, content }` for attached files. |
| `ctx.history` | Array of messages: `{ role, content, content_blocks, images? }`. |
`content` is the message's flattened text; `content_blocks` is the
structured form:
| `type` | Fields |
|---|---|
| `text` | `text` |
| `thinking` | `thinking`, `signature` |
| `redacted_thinking` | `data` |
| `tool_use` | `id`, `name`, `input` (JSON object) |
| `tool_result` | `tool_use_id`, `content`, `name` |
| `image` | `data`, `media_type`, `is_url` |
### Callbacks available in `[body]`
| Callback | Purpose |
|---|---|
| `tojson(x)` | Serialize any value as JSON (correct quoting/escaping). Use it for every interpolated value. |
| `filter_by_type(blocks, "tool_use")` | Subset of `content_blocks` with the given type. |
| `filter_skip_role(history, "system")` | History without messages of a role. |
| `strip_signature_suffix(s)` | Remove a trailing `[Signature: …]` marker. |
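
For orientation, the helpers in the table behave roughly like the Python functions below. These are illustrative equivalents written for this document, not the plugin's implementations; in particular the exact format of the `[Signature: …]` marker is an assumption.

```python
import json
import re


def tojson(value) -> str:
    """Serialize a value as JSON, including quoting and escaping."""
    return json.dumps(value)


def filter_by_type(blocks: list[dict], block_type: str) -> list[dict]:
    """Subset of content blocks with the given type, in original order."""
    return [b for b in blocks if b.get("type") == block_type]


def filter_skip_role(history: list[dict], role: str) -> list[dict]:
    """History without the messages of one role."""
    return [m for m in history if m.get("role") != role]


def strip_signature_suffix(text: str) -> str:
    """Remove a trailing "[Signature: ...]" marker and surrounding whitespace."""
    return re.sub(r"\s*\[Signature:[^\]]*\]\s*$", "", text)
```
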
### Partials and `{% include %}`
The message-array serialization is **inlined directly in each bundled wire
base** — there are no bundled partials to include. The `{% include %}`
mechanism still works for *your own* partials: drop a `partials/*.jinja` next
to your agent TOML and include it with
`{% include "partials/my_messages.jinja" %}`. Includes resolve against the
bundled root first, then the user agent's own directory; paths with `..` or a
leading `/` are rejected.
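The two rejection rules for include paths can be sketched as a small check. This is a minimal sketch, not the loader's actual code: `isIncludePathAllowed` is a hypothetical name, it treats `..` as a whole path segment (the real check may be stricter, for example rejecting any `..` substring), and rejecting an empty path is an added assumption.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical helper mirroring only the two rules stated above:
// no leading '/', and no ".." path segment.
bool isIncludePathAllowed(std::string_view path)
{
    if (path.empty() || path.front() == '/')
        return false; // assumption: empty paths are rejected too

    // Walk the '/'-separated segments and reject any that is exactly "..".
    std::size_t start = 0;
    while (start <= path.size()) {
        const std::size_t end = path.find('/', start);
        const std::string_view segment = path.substr(
            start, end == std::string_view::npos ? std::string_view::npos : end - start);
        if (segment == "..")
            return false;
        if (end == std::string_view::npos)
            break;
        start = end + 1;
    }
    return true;
}
```

Under these assumptions `partials/my_messages.jinja` passes, while `/abs/file.jinja` and `partials/../other.jinja` are refused before any root is consulted.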
## `system_prompt` — composable building blocks
`system_prompt` is itself a jinja template, rendered with:
| Helper | Purpose |
|---|---|
| `{{ read_file("${PROJECT_DIR}/STYLE.md") }}` | Inline a file. Reads are restricted to the project directory, your QodeAssist user directory (`${CONFIG_DIR}`), and bundled `:/…` resources. |
| `{{ file_exists(p) }}` / `{{ read_dir(p) }}` | Existence check / directory listing (same root restrictions). |
| `{{ head_lines(s, n) }}` | First `n` lines of a string. |
| `basename`, `dirname`, `ext`, `lower`, `upper` | Path/string helpers. |
| `${PROJECT_DIR}`, `${CONFIG_DIR}` | Substituted before rendering. `${CONFIG_DIR}` is your QodeAssist user directory (where agent configs live). |
Example:
```toml
system_prompt = """
{{ read_file("${CONFIG_DIR}/roles/reviewer.md") }}
{% if file_exists("${PROJECT_DIR}/.qodeassist-style.md") %}
Project conventions:
{{ read_file("${PROJECT_DIR}/.qodeassist-style.md") }}
{% endif %}
"""
```
Reads fail **loud**: a path outside those roots — or a `read_file` whose target
is missing — aborts the request with a clear error instead of silently rendering
an empty prompt. For a genuinely optional file, guard it with `file_exists`,
which returns `false` for an allowed-but-absent path; only a path *outside* the
roots is treated as an authoring error and rejected outright.
The persona is simply what `system_prompt` renders to — inline the text or pull
shared text from a markdown file with `read_file`. The bundled chat agents do
exactly this: their `system_prompt` is `{{ read_file(":/roles/qt-cpp-developer.md") }}`,
reading the shipped role from the plugin resources. To switch personas in the
chat, switch agents: a persona variant is a thin `extends` child that overrides
only `system_prompt` (e.g. pointing `read_file` at any file of your own under
`${CONFIG_DIR}/…` or `${PROJECT_DIR}/…`). `read_file` reads exactly the path
you give it — there is no override convention that swaps a bundled file for a
same-named user file.
## Routing — `[match]` and the completion roster
`[match]` drives **code completion** routing only. Completion has an ordered
roster of agents; for the current file the **first roster entry whose `[match]`
accepts** wins. The other pipelines don't route: chat shows an allow-list of
agents and you pick one in the panel; quick refactor and chat compression each
use a single configured agent (set in QodeAssist → General).
```toml
[match]
file_patterns = ["*.qml", "*.js"]
path_patterns = ["*/tests/*"]
project_names = ["MyProject"]
```
- Dimensions are ANDed; an empty dimension is unconstrained; an entirely
empty/absent `[match]` is a catch-all.
- `file_patterns` are case-insensitive globs tested against the file name
and the full path; `path_patterns` against the full path only.
- `project_names` are exact, case-sensitive project names.
Typical completion setup: a specialized agent (e.g. an `Ollama FIM` variant
with `*.qml`) first, a catch-all agent last.
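The routing rules above can be sketched in plain C++. This is an illustration under stated assumptions, not the plugin's implementation: the type and function names (`MatchRules`, `FileContext`, `RosterEntry`, `accepts`, `pickAgent`) are hypothetical, the glob supports only `*` and `?`, several patterns inside one dimension are treated as "any of", and `path_patterns` are assumed to be case-insensitive like `file_patterns`.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Minimal glob: '*' matches any run of characters, '?' matches exactly one.
bool globMatch(const std::string &pattern, const std::string &text, bool ignoreCase)
{
    auto same = [ignoreCase](char a, char b) {
        if (!ignoreCase)
            return a == b;
        return std::tolower(static_cast<unsigned char>(a))
            == std::tolower(static_cast<unsigned char>(b));
    };
    std::size_t p = 0, t = 0, star = std::string::npos, mark = 0;
    while (t < text.size()) {
        if (p < pattern.size() && pattern[p] == '*') {
            star = p++;   // remember the star and try matching zero characters
            mark = t;
        } else if (p < pattern.size() && (pattern[p] == '?' || same(pattern[p], text[t]))) {
            ++p;
            ++t;
        } else if (star != std::string::npos) {
            p = star + 1; // backtrack: let the last star absorb one more character
            t = ++mark;
        } else {
            return false;
        }
    }
    while (p < pattern.size() && pattern[p] == '*')
        ++p;
    return p == pattern.size();
}

struct MatchRules {
    std::vector<std::string> filePatterns;
    std::vector<std::string> pathPatterns;
    std::vector<std::string> projectNames;
};

struct FileContext {
    std::string fileName;
    std::string fullPath;
    std::string projectName;
};

struct RosterEntry {
    std::string agentName;
    MatchRules match;
};

// An empty dimension is unconstrained; otherwise any pattern may hit any text.
bool anyGlob(const std::vector<std::string> &patterns, const std::vector<std::string> &texts)
{
    if (patterns.empty())
        return true;
    for (const auto &pattern : patterns)
        for (const auto &text : texts)
            if (globMatch(pattern, text, true))
                return true;
    return false;
}

// Dimensions are ANDed; project names compare exactly and case-sensitively.
bool accepts(const MatchRules &m, const FileContext &f)
{
    bool projectOk = m.projectNames.empty();
    for (const auto &name : m.projectNames)
        projectOk = projectOk || name == f.projectName;
    return anyGlob(m.filePatterns, {f.fileName, f.fullPath})
        && anyGlob(m.pathPatterns, {f.fullPath})
        && projectOk;
}

// First roster entry whose rules accept the file wins; empty string if none does.
std::string pickAgent(const std::vector<RosterEntry> &roster, const FileContext &f)
{
    for (const auto &entry : roster)
        if (accepts(entry.match, f))
            return entry.agentName;
    return {};
}
```

With a roster of a `*.qml` entry followed by an entry with empty rules, `Main.QML` goes to the first entry and `main.cpp` falls through to the catch-all, which is why the catch-all belongs last.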
## Models
The TOML `model` is only the default. The settings UI can set a per-agent
override (stored in `agent_models.json`); the resolved model is also
substituted into `${MODEL}` in `endpoint` before sending.
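As a rough sketch of that resolution order, assuming the overrides from `agent_models.json` have already been parsed into a map (the `ModelOverrides` alias and both function names are hypothetical, and treating an empty override as "no override" is an assumption):

```cpp
#include <cstddef>
#include <map>
#include <string>

// Hypothetical stand-in for the parsed contents of agent_models.json:
// agent name -> overriding model.
using ModelOverrides = std::map<std::string, std::string>;

// The UI override wins; the TOML `model` is only the fallback.
std::string effectiveModel(const std::string &agentName,
                           const std::string &tomlDefault,
                           const ModelOverrides &overrides)
{
    const auto it = overrides.find(agentName);
    return (it != overrides.end() && !it->second.empty()) ? it->second : tomlDefault;
}

// Replace every ${MODEL} occurrence in the endpoint with the resolved model.
std::string resolveEndpoint(std::string endpoint, const std::string &model)
{
    const std::string placeholder = "${MODEL}";
    for (std::size_t pos = endpoint.find(placeholder); pos != std::string::npos;
         pos = endpoint.find(placeholder, pos + model.size())) {
        endpoint.replace(pos, placeholder.size(), model);
    }
    return endpoint;
}
```

An endpoint without the placeholder is returned unchanged, so the same call is safe for providers that take the model in the request body instead of the URL.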
## Contributing your agent to QodeAssist
The bundled agent set grows through contributions — if you've made an agent
for a provider or model that others could use, please send it upstream
instead of keeping it local. No C++ is needed:
1. Develop and verify the agent locally in the user agents directory.
2. In a fork, copy the TOML to `sources/agents/` and register the file in
`sources/agents/agents.qrc`.
3. Keep it a thin delta: extend the matching provider base and set only
`name`, `description`, `model`, `tags` (and `[body]` keys that genuinely
differ). Look at `claude_chat.toml` or `ollama_fim.toml` for the expected
shape.
4. Run the tests (`QodeAssistTest`): `BundledAgentsTest` automatically
loads every bundled agent, resolves its `extends` chain, and dry-renders
its `[body]` — if your TOML passes, it works.
5. Open a pull request.
Conventions:
- File name: `<provider>_<model_or_purpose>_<kind>.toml`
(e.g. `openrouter_deepseek_chat.toml`).
- `name` is user-visible and must be unique; include the provider and model
(e.g. `OpenRouter DeepSeek Chat`).
- Specialized completion agents should carry a `[match]` block so routing
can pick them automatically (e.g. `file_patterns = ["*.qml"]`).
- A new OpenAI-compatible provider is TOML-only: add a provider instance file
in `sources/providersConfig/`, then a concrete agent that `extends`
`OpenAI Base Chat` and overrides `provider_instance`. A genuinely new
request/response *format* (a new wire base) is the only thing that needs C++.
## Troubleshooting
- **Agent missing from the list** — check General Messages for `[Agents]
error:` lines; the file failed to parse, resolve, or validate.
- **`… has the same name as a bundled agent — bundled agents cannot be
replaced`** — pick a different `name`; use `extends` to inherit from the
bundled agent instead.
- **`Unknown key 'x' … ignored (typo?)`** — the key isn't part of the
schema; compare with the table above.
- **`Agent 'X' extends unknown agent 'Y'`** — the parent's `name` (not file
name) must match exactly; the parent must be bundled or in the same
directory.
- **`[body] failed to render to valid JSON`** — the dry run failed; the log
contains the rendered snippet. Usually a missing `tojson(...)` around an
interpolated string.
- **Edits not picked up** — agents are loaded at startup; restart
Qt Creator.


@@ -5,8 +5,10 @@ QodeAssist provides two powerful ways to include source code files in your chat
## Attached Files
Attachments are designed for one-time code analysis and specific queries:
- Files are included only in the current message
- Content is discarded after the message is processed
- Files are sent as part of the current message
- The content is a snapshot taken at send time: it is stored with the chat
and stays in the conversation history exactly as sent, even if the file
changes on disk later
- Ideal for:
- Getting specific feedback on code changes
- Code review requests
@@ -20,8 +22,11 @@ Attachments are designed for one-time code analysis and specific queries:
Linked files provide persistent context throughout the conversation:
- Files remain accessible for the entire chat session
- Content is included in every message exchange
- Files are automatically refreshed - always using latest content from disk
- Files are automatically refreshed: every request re-reads them and sends
the latest content from disk
- The snapshot travels next to your latest message and is never duplicated
into the conversation history, so linked files do not bloat the chat as it
grows
- Perfect for:
- Long-term refactoring discussions
- Complex architectural changes


@@ -1,35 +0,0 @@
# Project Rules Configuration
QodeAssist supports project-specific rules to customize AI behavior for your codebase. Create a `.qodeassist/rules/` directory in your project root.
## Quick Start
```bash
mkdir -p .qodeassist/rules/{common,completion,chat,quickrefactor}
```
## Directory Structure
```
.qodeassist/
└── rules/
├── common/ # Applied to all contexts
├── completion/ # Code completion only
├── chat/ # Chat assistant only
└── quickrefactor/ # Quick refactor only
```
All `.md` files in each directory are automatically loaded and added to the system prompt.
## Example
Create `.qodeassist/rules/common/general.md`:
```markdown
# Project Guidelines
- Use snake_case for private members
- Prefix interfaces with 'I'
- Always document public APIs
- Prefer Qt containers over STL
```


@@ -206,7 +206,6 @@ The LLM receives:
- **Cursor Position**: Marked with `<cursor>` tag
- **Selection Markers**: `<selection_start>` and `<selection_end>` tags
- **Your Instructions**: Built-in, custom, or typed
- **Project Rules**: If configured (see [Project Rules](project-rules.md))
### Context Configuration
@@ -270,7 +269,6 @@ Fully local setup for offline or secure environments.
## Related Documentation
- [Project Rules](project-rules.md) - Project-specific AI behavior customization
- [File Context](file-context.md) - Attaching files to chat context
- [Ignoring Files](ignoring-files.md) - Exclude files from AI context
- [Provider Configuration](../README.md#configuration) - Setting up LLM providers

654
docs/target-architecture.md Normal file

@@ -0,0 +1,654 @@
# QodeAssist — Target Architecture (v1.0)
Status: design baseline, derived from the fixed use-case inventory below.
Scope: the complete plugin, designed "from scratch" — what the architecture
should be if nothing legacy constrained it. The current code (see
`architecture.md`) already converges on this; §10 lists the remaining deltas.
---
## 1. Use-case inventory (requirements baseline)
Every architectural decision below is justified by one of these. Features not
on this list (Rules system, legacy provider/model/template pickers, Stack A)
are intentionally out of scope.
| # | Use case | What the user gets |
|---|----------|--------------------|
| U1 | **Code completion** | Inline FIM/instruct suggestions via LSP; auto + manual trigger, multiline, smart-context suppression, accept full / word-by-word |
| U2 | **Chat assistant** | 4 placements (sidebar, bottom pane, editor tab, floating window); streaming text + thinking blocks + tool blocks + file-edit blocks (apply/undo); attachments, linked files, @-mentions, open-files sync; token counter; persisted history; one-click summarization; runtime agent picker |
| U3 | **Quick refactor** | Selection + instruction by hotkey; custom-instructions library; separate agent; optional tools; streamed result inserted into the editor |
| U4 | **Tools** | read/create/edit file, search, find, list, build, diagnostics, terminal, todo, load_skill; per-tool enable |
| U5 | **Skills** | discovery from `.qodeassist/skills`, `.claude/skills`, `~/.claude/skills`; auto-injection, explicit `/` picker, always-on |
| U6 | **MCP** | server mode (expose plugin tools, HTTP/SSE + stdio bridge) and client hub (consume external tools in chat/refactor) |
| U7 | **Providers** | 13 `client_api` types over one GenericProvider; secrets store; local-server autostart; model listing |
| U8 | **Agents** | TOML profiles: abstract wire-base + thin concrete via `extends`, `[body]` table 1:1 with the wire request (message serialization inlined per base), `match` rules (completion routing), `cache_breakpoints`, per-agent model override, per-pipeline agent selection |
| U9 | **Personas** | persona = the agent's `system_prompt`; shared text lives in plain files pulled in via `read_file` — bundled defaults under `:/roles/…`, or any file the user points at under `${PROJECT_DIR}` / `${CONFIG_DIR}` (your QodeAssist user directory); `read_file` reads the literal path given (no override/fallback resolution); switching persona = switching agent (no separate Roles subsystem) |
| U10 | **Configuration UI** | settings pages for everything above; per-project settings; updater + status widget |
---
## 2. Design principles
1. **One stack.** Every LLM byte — completion, chat, compression, refactor —
flows through the same `Session` pipeline. No parallel legacy path.
2. **Hexagonal core.** The runtime (agents, sessions, providers, templates,
prompt rendering) has zero Qt Creator dependencies. The IDE host composes
that core; IDE-specific facts enter only through ports (document reading,
project scanning, secrets, tool hosting).
3. **Configuration is declarative, code is mechanism.** What is sent (request
`[body]`, system prompt, endpoint, model) lives in TOML/JSON/Jinja and is
user-overridable; *how* it is sent (streaming, retries, tool loop, event
routing) lives in C++ and is identical for all providers.
4. **Agent-driven behavior.** The agent's TOML declares what a conversation
uses (`enable_tools`, `enable_thinking`); features and UI adapt to the
agent config instead of switching on provider names or provider-declared
capability flags.
5. **Single source of truth for conversation state.** `ConversationHistory`
owns the messages; `ChatModel` and persistence are projections of it, never
independent copies.
6. **Per-feature composition roots, no singletons.** Each feature constructs
and owns its dependencies (`new` + parent); shared services are passed
explicitly (constructor/setter, QML context properties for the chat).
7. **Streaming-first event model.** One typed `ResponseEvent` stream is the
only contract between the core and every consumer. Deltas exist for live
UI (chat); one-shot pipelines (completion, refactor) ignore them,
wait for `finished`, and read the final assistant message from history.
8. **Fail at load, not mid-conversation.** Agent profiles are validated when
loaded (partials resolve, assembled body parses as JSON against a synthetic
context), so a config error never surfaces as a silent runtime drop.
---
## 3. Layered model
```mermaid
flowchart TB
subgraph HOSTS["Hosts — composition roots"]
PLUGIN["Qt Creator plugin<br/>qodeassist.cpp"]
end
subgraph L5["L5 · Presentation"]
LSP["LSP bridge<br/>inline suggestions"]
QMLUI["ChatView QML<br/>4 placements"]
RW["Refactor widgets"]
SUI["Settings pages"]
end
subgraph L4["L4 · Features"]
FCOMP["CompletionFeature"]
FCHAT["ChatFeature"]
FREF["RefactorFeature"]
end
subgraph L3["L3 · Capabilities"]
CTX["ContextEngine<br/>ports + QtC adapters"]
TOOLS["ToolKit"]
SKILLS["SkillsEngine"]
MCPH["McpHub<br/>client + server"]
end
subgraph L2["L2 · Core runtime — IDE-independent"]
SM["SessionManager"]
SESS["Session"]
AGF["AgentFactory + AgentRouter"]
AG["Agent"]
PROV["GenericProvider"]
TPL["JsonPromptTemplate"]
end
subgraph L1["L1 · Declarative config"]
PCONF["providers/*.toml"]
ACONF["agents/*.toml + partials/*.jinja"]
ROST["rosters / pipelines"]
PERS["personas/*.md"]
SKCONF["skills/*.md"]
SEC["SecretsStore"]
end
subgraph L0["L0 · Wire — LLMQore"]
CLIENTS["*Client — SSE streaming"]
TOOLFW["Tool framework"]
MCPT["MCP transports"]
end
PLUGIN --> L4
PLUGIN --> SUI
LSP --> FCOMP
QMLUI --> FCHAT
RW --> FREF
FCOMP --> SM
FCHAT --> SM
FREF --> SM
FCOMP --> CTX
FCHAT --> CTX
FREF --> CTX
FCHAT --> SKILLS
FCHAT --> TOOLS
FREF --> TOOLS
TOOLS --> TOOLFW
MCPH --> MCPT
SM --> SESS
SESS --> AG
AGF --> AG
AG --> PROV
AG --> TPL
AGF --> ACONF
AGF --> PCONF
AGF --> SEC
AGF --> ROST
TPL --> PERS
PROV --> CLIENTS
SKILLS --> SKCONF
```
### Layer contracts
| Layer | Contains | May depend on | Must NOT depend on |
|-------|----------|---------------|--------------------|
| **L0 Wire** | LLMQore clients (one per wire protocol: Claude, OpenAI Chat, OpenAI Responses, Google, Ollama, Mistral, llama.cpp), tool framework, MCP transports | Qt Network | anything above |
| **L1 Config** | `ProviderInstance`, `AgentProfile` (+ loader/validator), rosters, personas, skills, secrets port | toml++, inja | Qt Creator, L2+ |
| **L2 Core** | `Agent`, `AgentFactory`, `AgentRouter`, `Provider`/`GenericProvider`, `JsonPromptTemplate`, `Session`, `SessionManager`, `ConversationHistory`, `SystemPromptBuilder`, `ResponseRouter`, `ToolContributorRegistry` | L0, L1 | Qt Creator, QML, features |
| **L3 Capabilities** | `ContextEngine` (ports + QtC adapters), `ToolKit` (built-in tools), `SkillsEngine`, `McpHub` | L0-L2, QtC APIs *only in adapters* | features, UI |
| **L4 Features** | `CompletionFeature`, `ChatFeature` (send/stream, compression, token counting, file edits), `RefactorFeature` | L2, L3 | each other |
| **L5 Presentation** | LSP bridge, ChatView QML, refactor widgets, settings pages | its feature | core internals |
| **Hosts** | plugin shell | everything (composition only) | — |
The hard rule that makes testability free: **L0-L2 build into
targets with no Qt Creator linkage.** Tests link L0-L2 directly;
the plugin adds L3 adapters, L4, L5.
---
## 4. Core domain model
Rendered copy: [core-class-diagram.svg](core-class-diagram.svg) (regenerate
when the diagram below changes).
```mermaid
classDiagram
direction TB
class SessionManager {
+acquire(agentName) Session
+release(session)
+toolContributors() ToolContributorRegistry
}
class Session {
+send(blocks)
+cancel()
+history() ConversationHistory
+systemPrompt() SystemPromptBuilder
+event(ResponseEvent)
+finished(id, stopReason)
+failed(id, ErrorInfo)
+cancelled(id)
}
class ConversationHistory {
+messages() vector~Message~
+lastAssistantText() string
+append(Message)
+reset(vector~Message~)
}
class Message {
+role Role
+blocks vector~ContentBlock~
}
class SystemPromptBuilder {
+setLayer(id, text, priority)
+removeLayer(id)
+compose() string
}
class ResponseRouter {
+attach(BaseClient)
+event(ResponseEvent)
}
class Agent {
+config() AgentConfig
+provider() Provider
+promptTemplate() PromptTemplate
}
class AgentFactory {
+create(name) Agent
+configByName(name) AgentConfig
+effectiveModel(name) string
}
class AgentRouter {
+pickAgent(roster, fileCtx) string
}
class Provider {
<<interface>>
+prepareRequest(request, ctx)
+sendRequest(json) RequestID
+cancelRequest(RequestID)
}
class GenericProvider {
-client BaseClient
}
class PromptTemplate {
<<interface>>
+buildFullRequest(request, ctx)
}
class JsonPromptTemplate {
-bodySpec QJsonObject
-env InjaEnvironment
}
class ToolContributorRegistry {
+registerContributor(fn)
+applyTo(ToolsManager)
}
SessionManager o-- Session : pools
SessionManager --> AgentFactory : builds via
SessionManager --> ToolContributorRegistry
Session *-- ConversationHistory
Session *-- SystemPromptBuilder
Session *-- ResponseRouter
Session --> Agent
ConversationHistory o-- Message
Agent *-- Provider
Agent *-- PromptTemplate
AgentFactory ..> Agent : creates
AgentFactory --> AgentRouter
GenericProvider --|> Provider
JsonPromptTemplate --|> PromptTemplate
```
Responsibilities, one line each:
- **Agent** — immutable bundle of *what to call*: resolved config + provider +
compiled prompt template. No request state.
- **Session** — one conversation's runtime: owns history, system-prompt
layers, pinned context providers, response routing, the in-flight request,
and the content of each dispatched request (tool continuations replay it
inside LLMQore; see `context-architecture.md` §4.3).
`send(blocks)` is the *only* entry point: every pipeline appends a user
message and dispatches; there are no per-pipeline send variants. What
differs between completion, chat, and refactor is the agent's template and
the consumption mode (deltas vs final message), never the Session API.
- **SessionManager** — creates/pools sessions per agent; the single place
features go to get one. Pooling (not per-message construction) avoids the
"fresh agent + provider + secrets read per request" latency cost. It reuses
only the expensive parts (agent, provider, compiled template, secrets read):
`acquire` hands out a session with cleared history and system-prompt
layers, so one-shot pipelines never see a previous exchange.
- **AgentRouter** — the agent picker for *auto-routed* pipelines. Only code
completion routes by context: `pickAgent(roster.codeCompletion, {file,
project})` walks the ordered roster and returns the first agent whose match
rules fit. Chat is user-driven (the picker filters to the `chatAssistant`
allow-list; the user chooses); compression and quick refactor each use a
single configured agent. No feature-local routing logic beyond these.
- **GenericProvider** — one class for all 13 client APIs; varies only by
LLMQore client factory + metadata. Request *shape* belongs to the template,
never to the provider.
- **JsonPromptTemplate** — compiles the agent's `[body]` table; renders
Jinja-bearing string values, splices raw JSON, drops empty keys; validated
at load time.
- **SystemPromptBuilder** — ordered named layers (`agent.system`,
`chat.context`, `refactor`, `compression`); features mutate only their own
layer.
- **ResponseRouter / ResponseEvent** — adapts LLMQore client signals into one
typed stream: `TextDelta`, `ThinkingDelta`, `ToolCallStart/End`,
`ToolResult`, `Usage`, `Error`, `MessageStop`.
- **ToolContributorRegistry** — contributors (built-in ToolKit, SkillTool,
McpHub) register once; `SessionManager` applies them to every new session's
`ToolsManager`. This is how MCP tools reach chat *and* refactor (U6) without
feature code knowing about MCP.
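To make the layer idea concrete, here is a minimal sketch of a `SystemPromptBuilder` with the three methods from the class diagram. The ordering rule (lower priority first), the blank-line separator, and skipping empty layers are assumptions made for illustration; the document only fixes the method names and the "ordered named layers" behavior.

```cpp
#include <algorithm>
#include <map>
#include <string>
#include <vector>

class SystemPromptBuilder
{
    struct Layer {
        int priority = 0;
        std::string text;
    };

public:
    // Creates the layer or overwrites it; a feature only touches its own id.
    void setLayer(const std::string &id, const std::string &text, int priority)
    {
        m_layers[id] = Layer{priority, text};
    }

    void removeLayer(const std::string &id) { m_layers.erase(id); }

    // Assumption: layers are emitted in ascending priority, separated by a
    // blank line, and empty layers contribute nothing.
    std::string compose() const
    {
        std::vector<const Layer *> ordered;
        for (const auto &entry : m_layers)
            if (!entry.second.text.empty())
                ordered.push_back(&entry.second);
        std::stable_sort(ordered.begin(), ordered.end(),
                         [](const Layer *a, const Layer *b) {
                             return a->priority < b->priority;
                         });
        std::string out;
        for (const Layer *layer : ordered) {
            if (!out.empty())
                out += "\n\n";
            out += layer->text;
        }
        return out;
    }

private:
    std::map<std::string, Layer> m_layers;
};
```

Because each feature addresses its layer by id, a chat feature can refresh `chat.context` on every send without reading or rewriting `agent.system`.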
---
## 5. Runtime flows
### 5.1 Chat (U2) — the richest path
```mermaid
sequenceDiagram
autonumber
actor U as User
participant V as ChatView QML
participant F as ChatFeature
participant SM as SessionManager
participant S as Session
participant T as JsonPromptTemplate
participant P as GenericProvider
participant C as LLMQore Client
participant R as ResponseRouter
U->>V: message + attachments
V->>F: sendMessage(text, files, images)
F->>SM: acquire(activeAgent)
SM-->>F: Session (pooled)
F->>S: systemPrompt().setLayer("chat.context", project + skills + linked files)
F->>S: send(userBlocks)
S->>T: buildFullRequest(history, system, ctx)
T-->>S: request JSON (body is 1:1 with the API)
S->>P: sendRequest(json)
P->>C: HTTP POST, SSE stream
loop streaming
C-->>R: chunk / thinking / tool_use / usage
R-->>S: ResponseEvent
S-->>F: event(ResponseEvent)
F-->>V: ChatModel projection update
end
opt tool call requested
S->>S: execute tool via ToolsManager
S->>P: continue with tool_result
end
C-->>R: finalized
R-->>S: MessageStop + Usage
S-->>F: finished()
F->>SM: release(session)
```
State ownership in chat: `Session.history()` is the truth. `ChatModel` is a
QML projection built from history events (`messageAdded`, `messageUpdated`);
`ChatSerializer`/`ChatHistoryStore` persist *history*, and restoring a chat
seeds a new session's history — never the other way around. File-edit blocks,
apply/undo, and the token counter are ChatFeature concerns layered on the
event stream.
### 5.2 Completion (U1)
```
LSP getCompletionsCycling
→ CompletionFeature
agent = AgentRouter.pickAgent(roster.codeCompletion, {file, project})
session = SessionManager.acquire(agent)
ctx = ContextEngine: prefix/suffix + open-files context (policy from
CodeCompletionSettings — editor policy, not agent config)
session.send(blocks{completion context})
on finished → history().lastAssistantText()
→ CodeHandler (output-mode post-processing) → LSP items
```
No special Session method: the completion context travels as the content of
an ordinary user message (a structured block carrying prefix/suffix + file
context), and the template context exposes it as `ctx.prefix` / `ctx.suffix`.
FIM vs instruct is *agent config* (template + body), not feature code: a FIM
agent's body renders `prefix`/`suffix` into FIM fields; an instruct agent's
body renders the same exchange as a chat-shaped request. The feature is
identical for both — and since completion has no incremental UI, it never
touches the delta stream: it waits for `finished` and reads the last message.
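The one-shot consumption mode can be sketched with a typed event stream. Everything here is a simplified stand-in: `FakeSession` and `completeOnce` are hypothetical names, and only three of the `ResponseEvent` alternatives are modeled. The point is that the session folds text into history, and the consumer only cares about the terminal event.

```cpp
#include <optional>
#include <string>
#include <variant>
#include <vector>

struct TextDelta { std::string text; };
struct ThinkingDelta { std::string text; };
struct MessageStop { std::string stopReason; };
using ResponseEvent = std::variant<TextDelta, ThinkingDelta, MessageStop>;

// Hypothetical stand-in for Session + ConversationHistory.
class FakeSession
{
public:
    // The session, not the consumer, accumulates assistant text.
    // Returns true once the request has finished.
    bool feed(const ResponseEvent &event)
    {
        if (const auto *delta = std::get_if<TextDelta>(&event))
            m_lastAssistantText += delta->text;
        return std::holds_alternative<MessageStop>(event);
    }

    const std::string &lastAssistantText() const { return m_lastAssistantText; }

private:
    std::string m_lastAssistantText;
};

// One-shot consumer: never inspects deltas, waits for the finish,
// then reads the final message from history.
std::optional<std::string> completeOnce(FakeSession &session,
                                        const std::vector<ResponseEvent> &stream)
{
    for (const auto &event : stream)
        if (session.feed(event))
            return session.lastAssistantText();
    return std::nullopt; // the stream ended without a terminal event
}
```

A chat-style consumer would instead react to each `TextDelta` as it arrives; the session side stays the same in both modes.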
### 5.3 Quick refactor (U3)
```
Hotkey → RefactorFeature
agent = pipelines.quickRefactor (single configured agent)
session = SessionManager.acquire(agent)
session.systemPrompt().setLayer("refactor", tagged selection + output rules)
session.send(blocks{instruction})
on finished → history().lastAssistantText()
→ ResponseCleaner → RefactorResult → editor insert (accept/reject)
```
Same consumption mode as completion: the feature listens to
`Session::finished`/`failed` only (events at most drive a progress spinner
and cancel) and reads the result from history — it never connects to raw
client signals. Tool calls during refactor run inside the session's tool
loop; history's last assistant message is whatever the model produced after
the final tool round.
### 5.4 Compression (U2)
Compression is ChatFeature reusing the same path with the single
`pipelines.chatCompression` agent and a `"compression"` system layer; the
summary starts a new history.
---
## 6. Configuration model
```mermaid
erDiagram
AGENT_PROFILE ||--o| AGENT_PROFILE : extends
AGENT_PROFILE }o--|| PROVIDER_INSTANCE : provider_instance
AGENT_PROFILE }o--o{ PARTIAL : includes
AGENT_PROFILE }o--o{ PERSONA : read_file
ROSTER }o--o{ AGENT_PROFILE : ranks
MODEL_OVERRIDE |o--|| AGENT_PROFILE : overrides_model
PROVIDER_INSTANCE }o--|| CLIENT_API : client_api
PROVIDER_INSTANCE }o--o| SECRET : api_key_ref
PROVIDER_INSTANCE ||--o| LAUNCH_CONFIG : autostarts
AGENT_PROFILE {
string name
bool abstract
string system_prompt "jinja; inline text or read_file()"
json body "request body, 1:1 with API"
string endpoint "may contain MODEL placeholder"
string model "default; override wins"
bool enable_tools "capability hint"
bool enable_thinking "capability hint"
json match "file, path, project patterns"
}
PROVIDER_INSTANCE {
string name
string client_api
string url
string api_key_ref
}
PERSONA {
string path "plain markdown file"
}
ROSTER {
string pipeline "completion, chat, compression, refactor"
list agents "ordered candidates"
}
```
Rules of the config layer (full spec: `agent-templates-design.md`):
- `[body]` **is** the request body — field-by-field, deep-mergeable through
`extends`; Jinja-bearing strings render and splice as raw JSON, literals
pass through. No separate sampling/thinking merge machinery.
- Message serialization is inlined in each abstract **wire base**; there are no
bundled partials. `{% include %}` still resolves against sandboxed roots (bundled
`:/agents/`, then the user agent's dir) for user-supplied partials; a missing
partial is a load-time error.
- Two-level hierarchy: an abstract **wire base** per provider (provider +
endpoint + serialization only — no model/persona/tags/sampling) and a thin
concrete agent carrying all policy.
- Per-agent model override lives in `agent_models.json` and is applied by
`AgentFactory`; `${MODEL}` in `endpoint` covers URL-model providers.
- Personas are not a subsystem: the profile's `system_prompt` is the persona.
Shared text lives in plain markdown under the sandboxed roots and is pulled
in with `{{ read_file(...) }}`; a persona-switch is an agent-switch — the
only system-prompt edit point is the profile.
- Secrets never appear in TOML; `api_key_ref` resolves through the
`SecretsStore` port (QtC keychain in the plugin).
---
## 7. Capabilities layer
**ContextEngine** replaces the monolithic ContextManager with three focused
services behind IDE-agnostic ports:
| Service | Port (L2-visible) | QtC adapter |
|---------|-------------------|-------------|
| `EditorContext` — current doc, selection, prefix/suffix | `IDocumentReader` | TextEditor API |
| `ProjectContext` — root, file listing, ignore filtering (`.qodeassistignore`), open files, changes | `IProjectScanner` | ProjectExplorer API |
| `TokenEstimator` — input estimates, calibrated by server usage | — (pure) | — |
**ToolKit** registers the built-in tools (U4) with the
`ToolContributorRegistry`; each tool declares a permission class (read /
write / execute) so per-tool enablement (settings) and confirmation policy
(terminal commands) live in one place.
**SkillsEngine** (U5): discovery + watching of the three skill roots; exposes
`catalogText()` (names + descriptions for the system prompt),
`alwaysOnBodies()`, and the `load_skill` tool; the `/` picker injects a
skill's body into a single message.
**McpHub** (U6): client side connects configured servers and contributes
their tools through the same registry (tools reach every session uniformly);
server side exposes ToolKit over HTTP/SSE + stdio bridge.
---
## 8. Cross-cutting policies
Architecture is the rules as much as the boxes. These policies bind every
layer and are part of the contract:
### 8.1 Threading
The core runs on the GUI thread; concurrency is the Qt event loop plus async
network I/O, with no shared-state threading anywhere in L1-L4. Work that can
block (project scans, token estimation over large trees) hides behind L3
ports; an adapter may use worker threads internally but delivers results as
queued signals. Core types are therefore deliberately not thread-safe.
### 8.2 Request lifecycle
A session has at most one in-flight request; `send()` while in flight cancels
the previous request first. Every request terminates in exactly one of three
states — `finished(stopReason)`, `failed(error)`, `cancelled()` — and
cancellation is *not* an error: no consumer may string-match a message to
tell them apart.
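A minimal sketch of this lifecycle, assuming a single callback in place of the three Qt signals (`RequestSlot` and `Outcome` are hypothetical names, not the plugin's types):

```cpp
#include <functional>
#include <optional>
#include <utility>

enum class Outcome { Finished, Failed, Cancelled };

// Tracks at most one in-flight request and guarantees exactly one
// terminal notification per request.
class RequestSlot
{
public:
    using Callback = std::function<void(int id, Outcome outcome)>;

    explicit RequestSlot(Callback notify) : m_notify(std::move(notify)) {}

    // send() while a request is in flight cancels the previous one first.
    int send()
    {
        cancel();
        m_inFlight = ++m_nextId;
        return *m_inFlight;
    }

    void cancel() { terminate(Outcome::Cancelled); }

    // Results for a stale id (already cancelled or finished) are ignored.
    void finish(int id) { if (m_inFlight == id) terminate(Outcome::Finished); }
    void fail(int id)   { if (m_inFlight == id) terminate(Outcome::Failed); }

    bool busy() const { return m_inFlight.has_value(); }

private:
    void terminate(Outcome outcome)
    {
        if (!m_inFlight)
            return;               // nothing in flight, nothing to report
        const int id = *m_inFlight;
        m_inFlight.reset();       // clear first so a request ends only once
        m_notify(id, outcome);
    }

    Callback m_notify;
    std::optional<int> m_inFlight;
    int m_nextId = 0;
};
```

Consumers branch on the `Outcome` value, so cancellation is distinguishable from failure without inspecting any message text.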
### 8.3 Errors
Runtime errors are typed, not strings: `ErrorInfo { category, message,
providerDetail }` with categories `Config | Auth | Network | Provider |
Validation | Tool`. The category drives UI affordances (Auth → open provider
settings, Network → offer retry); free text is for logs only. Load-time
errors (principle 8) surface in the agents settings page, never as a failed
send.
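The typed error could look roughly like the following. The field and category names come from the text above; the `Affordance` enum and `affordanceFor` are hypothetical, and mapping every category other than `Auth` and `Network` to "no affordance" is an assumption.

```cpp
#include <string>

enum class ErrorCategory { Config, Auth, Network, Provider, Validation, Tool };

struct ErrorInfo {
    ErrorCategory category;
    std::string message;        // free text, for logs only
    std::string providerDetail; // raw detail from the provider, if any
};

// Hypothetical UI hints derived from the category alone.
enum class Affordance { None, OpenProviderSettings, OfferRetry };

Affordance affordanceFor(const ErrorInfo &error)
{
    switch (error.category) {
    case ErrorCategory::Auth:
        return Affordance::OpenProviderSettings;
    case ErrorCategory::Network:
        return Affordance::OfferRetry;
    default:
        return Affordance::None; // assumption: no dedicated affordance yet
    }
}
```

The UI decision depends only on `category`, so rewording `message` can never change behavior.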
### 8.4 Timeouts and retries
Transfer timeouts are per-pipeline policy (completion short, chat/refactor
from settings), applied by the feature — never baked into agent profiles. A
streaming request is never silently retried after the first byte; automatic
retry with capped backoff is allowed only for connection-phase failures.
Anything beyond that is an explicit user action.
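One way to express the retry rule in code, as a sketch: the `RetryPolicy` fields and their default values are illustrative assumptions, not settings the plugin defines. Only the two constraints from the text are modeled: no automatic retry once streaming has started, and a capped backoff for connection-phase failures.

```cpp
#include <algorithm>
#include <chrono>

// Hypothetical policy values, chosen only for the example.
struct RetryPolicy {
    int maxAttempts = 3;
    std::chrono::milliseconds baseDelay{250};
    std::chrono::milliseconds maxDelay{4000};
};

enum class FailurePhase { Connecting, Streaming };

// Automatic retry is allowed only before the first byte arrived,
// and only while attempts remain.
bool mayAutoRetry(const RetryPolicy &policy, FailurePhase phase, int attemptsSoFar)
{
    return phase == FailurePhase::Connecting && attemptsSoFar < policy.maxAttempts;
}

// Exponential backoff: baseDelay after the first attempt, doubling per
// further attempt, never exceeding maxDelay.
std::chrono::milliseconds backoffDelay(const RetryPolicy &policy, int attemptsSoFar)
{
    auto delay = policy.baseDelay;
    for (int i = 1; i < attemptsSoFar && delay < policy.maxDelay; ++i)
        delay *= 2;
    return std::min(delay, policy.maxDelay);
}
```

A failure in the `Streaming` phase always returns `false` from `mayAutoRetry`, which leaves the retry decision to the user.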
### 8.5 Observability
One `RequestID` correlates feature → session → provider → client → events →
logs. Each layer logs under its own category (`qodeassist.session`,
`qodeassist.provider`, `qodeassist.tools`, …); request bodies are logged only
at debug level, and secrets are redacted unconditionally. `Usage` events are
the single source feeding the token counter, `TokenEstimator` calibration,
and the performance log.
### 8.6 Config compatibility
Agent profiles carry a `schema_version`; the loader migrates old user
configs forward or rejects them with an actionable message — silent
reinterpretation is forbidden. Bundled profiles are read-only resources that
user profiles shadow by name. Persisted chat history is versioned the same
way.
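The "migrate forward or reject with an actionable message" rule might be sketched as below. The function name, the `oldestMigratable` parameter, and the message wording are all hypothetical; the text above only requires that no version mismatch is silently reinterpreted.

```cpp
#include <string>

enum class LoadAction { Accept, Migrate, Reject };

struct LoadDecision {
    LoadAction action;
    std::string message; // empty unless the profile is rejected
};

// Decide what to do with a user profile based on its schema_version.
LoadDecision decideLoad(int fileVersion, int currentVersion, int oldestMigratable)
{
    if (fileVersion == currentVersion)
        return {LoadAction::Accept, ""};
    if (fileVersion >= oldestMigratable && fileVersion < currentVersion)
        return {LoadAction::Migrate, ""};
    if (fileVersion > currentVersion)
        return {LoadAction::Reject,
                "schema_version " + std::to_string(fileVersion)
                    + " is newer than this plugin supports ("
                    + std::to_string(currentVersion) + "); update QodeAssist"};
    return {LoadAction::Reject,
            "schema_version " + std::to_string(fileVersion)
                + " is too old to migrate; recreate the agent from a bundled base"};
}
```

Every path that is not an exact match or a supported migration ends in `Reject` with a message, so there is no branch in which an unknown version is loaded as if it were current.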
### 8.7 Security
Secrets exist only behind the `SecretsStore` port; they never reach TOML,
logs, or persisted chats. Tool permission classes (read / write / execute)
centralize the confirmation policy. The MCP server is opt-in and binds
loopback by default; skill and partial roots are sandboxed — nothing resolves
outside its declared directory.
### 8.8 Testing
The test pyramid follows the layers:
| Layer | Strategy |
|-------|----------|
| L1 | loader/validator unit tests; golden-file snapshots of every bundled profile's rendered body against a synthetic context — the same check as load-time validation, run in CI |
| L2 | `Session` / `ResponseRouter` replay tests over recorded SSE fixtures per provider; fake `BaseClient`, no network |
| L3 | contract tests against the ports; QtC adapters covered only by plugin integration |
Layering is enforced mechanically, not by review: each layer is its own
CMake target, and the core targets do not link Qt Creator — a violating
include fails the build.
---
## 9. Module / target layout
```
core/ # no Qt Creator linkage — tests link this
config/ # L1: ProviderInstance, AgentProfile, loaders,
# validators, rosters, personas, secrets port
providers/ # L2: Provider, GenericProvider, ProviderFactory,
# ClaudeCacheControl
prompt/ # L2: JsonPromptTemplate, ContextRenderer, partials
agents/ # L2: Agent, AgentFactory, AgentRouter
session/ # L2: Session, SessionManager, ConversationHistory,
# SystemPromptBuilder, ResponseRouter, events
skills/ # L3 (IDE-free part): SkillsEngine, loaders
ide/ # Qt Creator adapters only
context/ # EditorContext, ProjectContext adapters, ignore
tools/ # built-in ToolKit (build, issues, editor edits…)
mcp/ # McpHub managers
features/
completion/ # LSP bridge + CompletionFeature + CodeHandler
chat/ # ChatFeature: ClientInterface, ChatModel(projection),
# Compressor, TokenCounter, FileEditController,
# serializer/store
refactor/ # RefactorFeature + custom instructions
ui/
ChatView qml/, widgets/, settings pages
hosts/
plugin/ # qodeassist.cpp — composition root, actions, panes
tests/
config/ # loader cases + golden rendered-body snapshots
session/ # SSE replay fixtures per provider, fake client
external/
llmqore/ inja/ tomlplusplus/
```
Dependency direction is strictly downward in the table of §3; `features/*`
never include each other; `ui/*` talks only to its feature; `hosts/*` are the
only places allowed to know about everything.
---
## 10. Deltas from the current working tree
What "from scratch" changes relative to today's code — the migration
checklist to call the architecture done:
1. **Stack A physical teardown** — delete root `providers/*`,
`pluginllmcore/*`, `ConfigurationManager`, legacy provider/model/template
settings pages, and the Stack A registration + MCP loop in
`qodeassist.cpp`. Runtime already has no consumers.
2. **Single history owner** — make `ChatModel` a projection of
`Session::history()` (subscribe to history signals) instead of a parallel
message store with seed-on-send; `ChatCompressor` reads history, not the
model.
3. **Single send path** — delete `Session::sendCompletion(ContextData)`;
the completion context becomes user-message content sent through the one
`send()` (the completion handler already reads its result from history's
last message). Move `QuickRefactorHandler` off raw `BaseClient` signals
(`requestCompleted`/`requestFinalized`/`requestFailed`) onto
`Session::finished`/`failed` + `history().lastAssistantText()`.
4. **Three-state request lifecycle** — add `cancelled` to `Session`; today
`cancel()` emits `failed(id, "Cancelled by user")` and consumers must
string-match to tell cancellation from failure (§8.2).
5. **Typed errors** — replace `lastError` strings and the `failed(QString)`
payload with `ErrorInfo` categories (§8.3).
6. **Agent selection by pipeline shape** — completion is the only context-routed
pipeline (`AgentRouter.pickAgent(roster.codeCompletion, {file, project})`);
chat picker filters to the `chatAssistant` allow-list; quick refactor and
compression each read a single configured agent (no routing).
7. **MCP tools on session clients** — register MCP-contributed tools through
`ToolContributorRegistry` so chat/refactor sessions get them (today they
are registered only on dead Stack A providers).
8. **Session pooling** — `SessionManager.acquire/release` with a small pool
per agent, replacing per-message agent + provider + secrets construction.
9. **ContextManager split** — extract `EditorContext` / `ProjectContext` /
`TokenEstimator` behind ports; move QtC API use into `ide/context`.
10. **`[body]` model completion** — finish `agent-templates-design.md`
(body-table rendering, sandboxed `include`, load-time validation, model
override + `${MODEL}`, `schema_version` gate), delete sampling/thinking
merge machinery.
11. **Message type unification** — one `Message`/`ContentBlock` shape from
history to QML (roles, text, thinking, tool use/result, images); delete
the parallel `ChatModel::Message` struct.
12. **Test scaffolding** — golden rendered-body snapshots + SSE replay
fixtures (§8.8); CI builds the core targets without Qt Creator so a
layering violation fails the build.
13. **Stale docs cleanup** — `project-rules.md` describes the removed Rules
system; mark or delete.
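
Items 4 and 5 of the checklist can be pictured together: once cancellation is its own state and errors carry a category, consumers branch on types instead of matching strings. The sketch below uses plain C++17 rather than Qt signals. All names in it (`ErrorCategory` and its values, `RequestOutcome`, `describeOutcome`, `shouldOfferRetry`) are hypothetical; the real category set is whatever §8.3 defines.

```cpp
#include <string>
#include <variant>

// Hypothetical categories (assumption): the real list lives in section 8.3.
enum class ErrorCategory { Network, Auth, RateLimit, Provider, Internal };

struct ErrorInfo
{
    ErrorCategory category;
    std::string message;
};

// The three terminal states of a request.
struct Finished { std::string assistantText; };
struct Failed { ErrorInfo error; };
struct Cancelled {};

using RequestOutcome = std::variant<Finished, Failed, Cancelled>;

std::string describeOutcome(const RequestOutcome &outcome)
{
    struct Visitor
    {
        std::string operator()(const Finished &f) const { return "finished: " + f.assistantText; }
        std::string operator()(const Failed &f) const { return "failed: " + f.error.message; }
        std::string operator()(const Cancelled &) const { return "cancelled"; }
    };
    return std::visit(Visitor{}, outcome);
}

// A consumer decision that no longer depends on the wording of a message.
bool shouldOfferRetry(const RequestOutcome &outcome)
{
    if (const auto *failed = std::get_if<Failed>(&outcome)) {
        return failed->error.category == ErrorCategory::Network
               || failed->error.category == ErrorCategory::RateLimit;
    }
    return false; // finished and cancelled requests are never retried
}
```

With this shape, a user-initiated cancel can never be mistaken for a failure, whatever text the UI shows.
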


@@ -0,0 +1,192 @@
# ToolLoopRunner — implementation plan
Status: plan for "variant C" (2026-06-13). Supersedes step 5 of
`context-architecture.md` §6.
Context that shapes this plan:
- The tool loop STAYS in LLMQore — the library remains a complete standalone
agentic client. Variant C changes its *shape*, not its home: the loop
becomes a named class, `BaseClient` slims toward transport.
- On 2026-06-12 the variant-A hook (`setContinuationPayloadBuilder` + Session
feeding assembler-built continuation bodies) was implemented and then
REVERTED by the project owner: the frozen-replay problem was judged
contrived (replay carries the full filtered history of its base payload;
mid-loop file changes reach the model via tool results; growth is bounded
by `maxToolContinuations`). The reverted llmqore diff is saved at
`/tmp/llmqore-continuation-builder.patch`.
- Therefore this plan has two tracks. **Track 1 (the actual ask): the
structural refactoring.** Track 2 (host payload source) is OPTIONAL,
parked, and only happens if the 2026-06-12 verdict is explicitly reversed.
- The context-architecture steps 1-4 implementation (ContextAssembler,
content cache, pinned providers, EnvBlockFormatter, ~1200 lines incl.
tests) is parked in `stash@{0}` ("new context refactor") on
`dev-release-1-0`. It is NOT required for track 1.
---
## 1. Current anatomy (llmqore @ 0348ac8)
- `BaseClient` mixes two responsibilities:
- **transport** — HTTP/SSE per request, `ActiveRequest { stream, buffers,
url, mode, usage, … }`, accumulation in protocol subclasses;
- **loop policy** — `ActiveRequest.originalPayload`,
`ActiveRequest.continuationCount`, `m_maxToolContinuations`,
`checkContinuationLimit`, `handleToolContinuation`.
- Loop entry: protocol clients call `executeToolsFromMessage(id)` at their
message-end detection points (11 call sites across 7 clients); it forwards
`tool_use` blocks to `ToolsManager::executeToolCall`.
- `BaseClient::tools()` wires `ToolsManager::toolExecutionComplete(id,
results)` → `handleToolContinuation`: round-limit check → continuation
body via the protocol-virtual `buildContinuationPayload(originalPayload,
message, toolResults)` → `finalizeTurn` → `sendRequest(id, storedUrl,
payload, storedMode)`.
## 2. Target design
### 2.1 ToolLoopRunner (new, llmqore)
```cpp
class LLMQORE_EXPORT ToolLoopRunner : public QObject
{
Q_OBJECT
public:
explicit ToolLoopRunner(BaseClient *client);
int maxRounds() const noexcept;
void setMaxRounds(int limit) noexcept;
private:
void onToolsCompleted(const RequestID &id,
const QHash<QString, ToolResult> &results);
void onRequestClosed(const RequestID &id);
struct LoopState
{
int rounds = 0;
};
BaseClient *m_client = nullptr;
QHash<RequestID, LoopState> m_loops;
int m_maxRounds = 10;
};
```
The whole loop policy on one screen:
```cpp
void ToolLoopRunner::onToolsCompleted(const RequestID &id,
const QHash<QString, ToolResult> &results)
{
auto &loop = m_loops[id];
if (++loop.rounds > m_maxRounds) {
m_client->abortRequest(id, "Tool continuation limit reached");
m_loops.remove(id);
return;
}
const QJsonObject payload = m_client->buildReplayContinuation(id, results);
if (payload.isEmpty()) {
m_client->abortRequest(id, "Failed to build continuation payload");
m_loops.remove(id);
return;
}
m_client->continueRequest(id, payload);
}
```
- `LoopState` is keyed by request id — several concurrent requests on one
client (two chat panels on one provider) never collide.
- Cleanup: `onRequestClosed` (connected to `requestFailed` +
`requestFinalized`) drops the state.
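
The two properties above (per-request state, cleanup on close) can be checked in isolation with a Qt-free model of the runner. This is a sketch under stated assumptions: `FakeClient` and `LoopModel` are hypothetical stand-ins, `std::unordered_map` replaces `QHash`, and payload building is left out to keep the focus on the round counter.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

using RequestId = std::string;

// Stand-in for the transport side; it only records what the loop asked for.
struct FakeClient
{
    std::vector<std::string> log;
    void continueRequest(const RequestId &id) { log.push_back("continue " + id); }
    void abortRequest(const RequestId &id, const std::string &reason)
    {
        log.push_back("abort " + id + ": " + reason);
    }
};

// Hypothetical model of the runner's bookkeeping (not the real class).
class LoopModel
{
public:
    LoopModel(FakeClient &client, int maxRounds)
        : m_client(client), m_maxRounds(maxRounds)
    {}

    void onToolsCompleted(const RequestId &id)
    {
        int &rounds = m_rounds[id]; // starts at 0 the first time an id is seen
        if (++rounds > m_maxRounds) {
            m_client.abortRequest(id, "Tool continuation limit reached");
            m_rounds.erase(id);
            return;
        }
        m_client.continueRequest(id);
    }

    // Mirrors the requestFailed / requestFinalized cleanup.
    void onRequestClosed(const RequestId &id) { m_rounds.erase(id); }

    std::size_t trackedRequests() const { return m_rounds.size(); }

private:
    FakeClient &m_client;
    std::unordered_map<RequestId, int> m_rounds;
    int m_maxRounds;
};
```

Interleaving completions for two ids shows that one request hitting its limit leaves the other request's counter untouched.
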
### 2.2 BaseClient becomes transport + tool dispatch
Gains (transport primitives; `continueRequest` public — it is also the seam
any future host-driven mode would use; failure path via runner friendship):
```cpp
ToolLoopRunner *toolLoop(); // owned, created with tools()
void continueRequest(const RequestID &id, const QJsonObject &payload);
// finalizeTurn + resend stored url/mode
QJsonObject buildReplayContinuation(const RequestID &id,
const QHash<QString, ToolResult> &results);
// originalPayload + protocol virtual
```
Loses (moves to the runner): `handleToolContinuation`,
`checkContinuationLimit`, `m_maxToolContinuations`,
`ActiveRequest::continuationCount`. The `toolExecutionComplete` connection
in `tools()` retargets to the runner.
Keeps: `executeToolsFromMessage` (the 11 protocol call sites stay
untouched), the protocol-virtual `buildContinuationPayload` (it IS the
replay serialization), `originalPayload` storage,
`setMaxToolContinuations`/`maxToolContinuations` as thin forwarders to
`toolLoop()` — existing consumers (QodeAssist `ClientInterface`,
`QuickRefactorHandler`, third parties) compile unchanged.
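
The "thin forwarders" idea is small enough to show whole. The model below is a sketch with hypothetical class names (`ToolLoopModel`, `ClientModel`) and no Qt; it only illustrates that the legacy accessors and the runner read and write the same single value.

```cpp
// Hypothetical stand-in for the runner: it owns the limit.
class ToolLoopModel
{
public:
    int maxRounds() const noexcept { return m_maxRounds; }
    void setMaxRounds(int limit) noexcept { m_maxRounds = limit; }

private:
    int m_maxRounds = 10; // default taken from the class sketch in section 2.1
};

// Hypothetical stand-in for the client: it keeps the old API but stores nothing.
class ClientModel
{
public:
    ToolLoopModel *toolLoop() noexcept { return &m_loop; }

    // Legacy names, kept so existing callers compile unchanged.
    void setMaxToolContinuations(int limit) noexcept { m_loop.setMaxRounds(limit); }
    int maxToolContinuations() const noexcept { return m_loop.maxRounds(); }

private:
    ToolLoopModel m_loop;
};
```

Because the client holds no copy of the limit, the two APIs cannot drift apart.
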
## 3. Track 1 — structural refactoring (the plan)
Bit-identical behavior throughout; QodeAssist only needs a submodule bump.
**Phase 1 — transport primitives.** Add `continueRequest` +
`buildReplayContinuation` + public `abortRequest` (now also the body of
`cancelRequest`). — DONE 2026-06-13.
**Phase 2 — extract the runner.** New `ToolLoopRunner` class; move round
state + limit; retarget the `toolExecutionComplete` connection; delete
`handleToolContinuation` / `checkContinuationLimit` /
`ActiveRequest::continuationCount`; forwarders for
`setMaxToolContinuations`. — DONE 2026-06-13
(`include/LLMQore/ToolLoopRunner.hpp`, `source/core/ToolLoopRunner.cpp`,
`tests/tst_ToolLoopRunner.cpp` — 7 cases: replay flow, round limit, missing
replay data, two interleaved ids, cleanup on finalize/cancel, forwarders;
`continueRequest` is virtual as the test seam; llmqore architecture docs
updated: overview, request-lifecycle diagram, tools).
Deliberate behavior delta (an improvement, worth knowing while testing): an
empty payload from the protocol's `buildContinuationPayload` now aborts the
request with "Missing data for tool continuation" instead of silently
sending an empty body.
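
The test seam and the behavior delta can be illustrated together without Qt. In the sketch below, `TransportModel`, `RecordingTransport` and `continueOrAbort` are hypothetical names, and a `std::string` stands in for the JSON payload (an empty string plays the role of an empty object). It is a model of the idea, not the contents of `tst_ToolLoopRunner.cpp`.

```cpp
#include <map>
#include <string>
#include <vector>

using RequestId = std::string;
using Payload = std::string; // stand-in for a JSON object; "" means empty

class TransportModel
{
public:
    virtual ~TransportModel() = default;
    // Virtual so a test can intercept the resend instead of touching the network.
    virtual void continueRequest(const RequestId &id, const Payload &payload) = 0;
    virtual void abortRequest(const RequestId &id, const std::string &reason) = 0;
    virtual Payload buildReplayContinuation(const RequestId &id) = 0;
};

// One continuation step: an empty payload aborts instead of being sent.
void continueOrAbort(TransportModel &client, const RequestId &id)
{
    const Payload payload = client.buildReplayContinuation(id);
    if (payload.empty()) {
        client.abortRequest(id, "Missing data for tool continuation");
        return;
    }
    client.continueRequest(id, payload);
}

// Test double: records calls and serves canned replay data.
class RecordingTransport : public TransportModel
{
public:
    std::map<RequestId, Payload> replayData;
    std::vector<std::string> sent;
    std::vector<std::string> aborted;

    void continueRequest(const RequestId &id, const Payload &payload) override
    {
        sent.push_back(id + " <- " + payload);
    }
    void abortRequest(const RequestId &id, const std::string &reason) override
    {
        aborted.push_back(id + ": " + reason);
    }
    Payload buildReplayContinuation(const RequestId &id) override
    {
        const auto it = replayData.find(id);
        return it == replayData.end() ? Payload{} : it->second;
    }
};
```

A request with replay data is resent; one without is aborted, and nothing with an empty body ever reaches `continueRequest`.
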
**Phase 3 — submodule bump (after the user runs llmqore tests).**
QodeAssist: bump the submodule pointer, verify live in the plugin (Ollama +
tools, Claude + tools); update `context-architecture.md`
§4.3/§6.5 to point here; update project memory.
## 4. Track 2 — host payload source (PARKED)
Only if the 2026-06-12 "the problem is contrived" verdict is explicitly reversed.
Variant C makes it a ~40-line addition, so nothing is lost by parking:
- `ToolLoopRunner::setPayloadSource(id, std::function<QJsonObject(const
RequestID &)>)`; registered source is authoritative for its id (empty
result → abort, never silent fallback to replay).
- Host prerequisite: restore the context work from `stash@{0}`
(ContextAssembler + `Session::makePayload`); expect conflicts in
`Session.cpp` with the newer `dev-release-1-0` refactor commits
("Remove override tools in Session send" etc.).
- Session registers the source after `provider->sendRequest` (same-thread,
race-free; `QPointer` guard).
- Assembler continuation rules: pinned blocks anchor to the turn's TYPED
user message (recorded 2026-06-12), manifest per round.
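
The "authoritative source" rule from the first bullet is the part most easily gotten wrong, so a small model may help. This is a sketch only: track 2 is parked, `PayloadSelector` and `nextPayload` are hypothetical names, and a `std::string` again stands in for the JSON payload.

```cpp
#include <functional>
#include <optional>
#include <string>
#include <unordered_map>
#include <utility>

using RequestId = std::string;
using Payload = std::string; // stand-in for a JSON object; "" means empty
using PayloadSource = std::function<Payload(const RequestId &)>;

// Hypothetical model of the selection rule, not the real runner API.
class PayloadSelector
{
public:
    void setPayloadSource(const RequestId &id, PayloadSource source)
    {
        m_sources[id] = std::move(source);
    }

    void clear(const RequestId &id) { m_sources.erase(id); }

    // Returns the body to send, or std::nullopt when the request must abort.
    std::optional<Payload> nextPayload(const RequestId &id, const Payload &replay) const
    {
        const auto it = m_sources.find(id);
        if (it == m_sources.end()) {
            // No host source registered: plain replay behavior.
            if (replay.empty())
                return std::nullopt;
            return replay;
        }
        Payload fromHost = it->second(id);
        if (fromHost.empty())
            return std::nullopt; // authoritative: never fall back to replay
        return fromHost;
    }

private:
    std::unordered_map<RequestId, PayloadSource> m_sources;
};
```

The key design point is the second branch: once a source is registered for an id, a usable replay body is deliberately ignored even when the source returns nothing.
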
## 5. Risks
| Risk | Mitigation |
|---|---|
| Behavior drift while moving the loop | phases are mechanical; same `buildContinuationPayload` virtuals; llmqore tests + plugin smoke before/after |
| Two sessions, one client | `LoopState` keyed by request id |
| Qt 5 compatibility (0348ac8) | runner uses only signals/`QHash`/`std::function` — no Qt 6-only API |
| Cancel mid-tool-execution | unchanged: `cancelRequest` → `failRequest` → `onRequestClosed` clears state; `ToolsManager::cleanupRequest` handles in-flight tools |
| Google (model in URL) | `continueRequest` reuses stored per-request url/mode — same as today |
## 6. Deliberately not doing
- Not moving the loop or tool execution out of llmqore
(`feedback_llmqore_boundary`).
- Not touching the 11 `executeToolsFromMessage` call sites or the protocol
`buildContinuationPayload` implementations.
- No Auto/Manual mode flags.
- Track 2 is not started without an explicit decision.