doc: update architecture

2026-06-14 02:09:22 -04:00 · 2026-06-11 15:28:37 +02:00
parent 69672deb45
commit 231a6a0215
1 changed files with 224 additions and 175 deletions
--- a/docs/architecture.md
+++ b/docs/architecture.md
@@ -1,35 +1,42 @@
 # QodeAssist Architecture
-This document describes the runtime architecture of QodeAssist after the
+This document describes the **current** runtime architecture, after the §10
-migration of all LLM runtime paths onto the agent / `Session` stack
+rework in `target-architecture.md` was completed. Every runtime LLM path —
-("Stack B"). Every runtime LLM path — code completion, chat (send/stream +
+code completion, chat (send/stream + compression + token counting), and quick
-compression + token counting), and quick refactor — now goes through agents,
+refactor — flows through one stack: agents, `Session`, and the
-`Session`, and the `Providers::GenericProvider` layer.
+`Providers::GenericProvider` layer. There is no legacy parallel path; the old
 "Stack A" (root `providers/*`, `pluginllmcore/*`, `ConfigurationManager`, the
 provider/model/template settings pages) has been removed.
-> Legend: ✅ = on Stack B (active runtime), 🔴 = legacy Stack A (isolated, no
+For the design rationale, layering contract, and cross-cutting policies, see
-> runtime consumers left).
+[`target-architecture.md`](target-architecture.md). This file documents how the
 code is wired today.
 ---
 ## 1. Top level: ownership and dependency injection
-The plugin (`qodeassist.cpp`) owns everything via `new` + parent (no plugin-wide
+The plugin (`qodeassist.cpp`) owns everything via `new` + parent — no
-singletons; each feature receives its dependencies explicitly).
+plugin-wide singletons; each feature receives its dependencies explicitly.
 ```
 QodeAssistPlugin
-  Stack B infrastructure:
+    • Providers::registerBuiltinProviders()   — client_api → provider table
-    • Providers::registerBuiltinProviders()   — registers 13 client_api types
+    • ProviderInstanceFactory                 — provider instances from TOML
-    • ProviderInstanceFactory                 — 14 instances from TOML
+    • ProviderSecretsStore                    — secrets behind a port
-    • ProviderSecretsStore
+    • AgentFactory                            — agents from TOML + agent_models.json
-    • AgentFactory                            — agents from TOML
+    • SessionManager(agentFactory)            — owns the ToolContributorRegistry
-    • SessionManager(agentFactory)
+        toolContributors().add(registerQodeAssistTools)
        toolContributors().add(registerSkillTool)
        toolContributors().add(McpClientsManager::registerToolsOn)
    • m_engine (QQmlEngine)
        rootContext: "agentFactory", "sessionManager"   — DI for chat (QML)
  Wired into consumers:
-    • QodeAssistClient ← LLMClientInterface(*sessionManager, *agentFactory)
+    • QodeAssistClient ← LLMClientInterface(generalSettings, completeSettings,
-                       ← setSessionManager / setAgentFactory   (for quick refactor)
+                            agentFactory, sessionManager, documentReader,
                            performanceLogger)
                       ← setSessionManager / setAgentFactory   (quick refactor)
 ```
 Chat lives in QML (`ChatRootView` is a `QML_ELEMENT`), so `AgentFactory` and
@@ -39,220 +46,262 @@ context** and resolved in `ChatRootView` via
 ---
-## 2. Stack B core (agent / Session)
+## 2. Core (agent / Session)
 ```
 AgentFactory.create(name)
-  configByName(name) → AgentConfig (TOML)
+  configByName(name) → AgentConfig (TOML, [body] table; model override from
-     providerInstance, model, endpoint, role, messageFormat,
+                       agent_models.json applied here)
     sampling, enableTools, enableThinking, match{filePatterns,...}
  buildProviderForAgent:
     instance = ProviderInstanceFactory.instanceByName(cfg.providerInstance)
-     provider = ProviderFactory::create(instance.clientApi)        ◄── keystone
+     provider = ProviderFactory::create(instance.clientApi)
     provider.setUrl(instance.url)
     provider.setApiKey(secrets.read(instance.apiKeyRef))
  ▼
 Agent(config, provider)
-  promptTemplate = JsonPromptTemplate::fromConfig(cfg.messageFormat)   (inja)
+  promptTemplate = JsonPromptTemplate::fromConfig(cfg)   — compiles [body] (inja),
                   validated at load against a synthetic context
  provider.setPromptCaching(cfg.cachePrompt, cfg.cacheTtl == "1h")
  ▼
-SessionManager.createSession(agentName) → Session(agent)
+SessionManager — two ways to obtain a Session:
-  ├─ ConversationHistory     — messages as ContentBlocks
+  • createSession(agentName, externalHistory?)  — chat: attaches a persistent,
-  ├─ SystemPromptBuilder     — layers: agent.role + caller layers
+                                                  externally-owned history
-  └─ ResponseRouter(client)  — emits ResponseEvent
+  • acquire(agentName) / release(session)       — one-shot pipelines: a small
                                                  per-agent pool of internal-history
                                                  sessions; acquire hands out a
                                                  session with cleared history,
                                                  cleared system-prompt layers and
                                                  cleared client tools
  ▼
 Session(agent[, externalHistory])
  ├─ ConversationHistory     — messages as polymorphic ContentBlocks
  ├─ SystemPromptBuilder     — ordered named layers (priority-sorted)
  └─ ResponseRouter(client)  — adapts client signals → typed ResponseEvent
 Session API:
-  • send(blocks, toolsOverride)   — chat/refactor: append user msg + dispatch
+  • send(blocks, toolsOverride)   — the ONLY dispatch entry point: append a user
-  • sendCompletion(ContextData)   — completion: FIM prefix/suffix
+                                    message and dispatch. Completion/chat/refactor
-  • client()                      — agent's LLMQore::BaseClient (direct streaming)
+                                    differ only in block content + template.
-  • systemPrompt()->setLayer(...) — dynamic context layers
+  • cancel()                      — tears down in-flight; emits cancelled(id)
-  • supportsImages()              — provider Image capability
+  • history() / systemPrompt() / client() / supportsImages()
-  • history()                     — for seeding from ChatModel
+  • setContentLoader(loader)      — resolves Stored* attachment/image blocks
  • lastError() → ErrorInfo       — typed synchronous start-failure detail
 Session signals (three-state, mutually exclusive per request):
  • finished(id, stopReason)
  • failed(id, ErrorInfo{category, message, providerDetail})
  • cancelled(id)
  + event(ResponseEvent)          — live delta stream for the chat UI
 ```
-`Session::sendCompletion` and `dispatch` compose `SystemPromptBuilder` layers
+`Session::dispatch` renders the agent's `system_prompt` into the `agent.system`
-(`agent.role` + caller-provided) into the request system prompt.
+layer, composes all `SystemPromptBuilder` layers into the request system prompt,
 and substitutes `${MODEL}` in the endpoint before sending.
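As a rough illustration of the composition step, the sketch below models priority-ordered named layers and the `${MODEL}` substitution in plain C++. `PromptLayers`, `Layer`, `substituteModel`, the ascending priority order, and the blank-line separator are assumptions made for this example; they are not the actual `SystemPromptBuilder` API.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for one SystemPromptBuilder layer.
struct Layer {
    std::string name;
    int priority;
    std::string text;
};

class PromptLayers {
public:
    // Setting a layer that already exists replaces it (assumed behaviour).
    void setLayer(const std::string &name, int priority, const std::string &text) {
        assert(!name.empty());
        auto it = std::find_if(m_layers.begin(), m_layers.end(),
                               [&](const Layer &l) { return l.name == name; });
        if (it != m_layers.end())
            m_layers.erase(it);
        m_layers.push_back({name, priority, text});
    }

    // Joins layers in ascending priority; ties keep insertion order.
    std::string compose() const {
        std::vector<Layer> sorted = m_layers;
        std::stable_sort(sorted.begin(), sorted.end(),
                         [](const Layer &a, const Layer &b) { return a.priority < b.priority; });
        std::string out;
        for (const Layer &l : sorted) {
            if (!out.empty())
                out += "\n\n";
            out += l.text;
        }
        return out;
    }

private:
    std::vector<Layer> m_layers;
};

// Replaces every "${MODEL}" token in the endpoint with the model name.
std::string substituteModel(std::string endpoint, const std::string &model) {
    const std::string token = "${MODEL}";
    for (std::size_t pos = endpoint.find(token); pos != std::string::npos;
         pos = endpoint.find(token, pos + model.size())) {
        endpoint.replace(pos, token.size(), model);
    }
    return endpoint;
}
```

With an `agent.system` layer at priority 0 and a `chat.context` layer at priority 10, `compose()` yields the agent text first, whatever order the layers were set in.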
 ---
-## 3. Provider layer — the keystone (implemented during migration)
+## 3. Provider layer
-The Stack B provider layer previously existed only as an abstract base +
+One configuration-driven `GenericProvider` covers every API; it varies only by
-empty factory (`registerType` was never called, no concrete providers). This
+the LLMQore client factory and metadata. Request *shape* belongs to the agent's
-blocked every agent from obtaining a working provider. It is now implemented
+`JsonPromptTemplate` (the `[body]` table), never to the provider.
 via a single configuration-driven `GenericProvider`.
 ```
 ProviderFactory  (sources/providers, namespace functions)
   registerType(name, fn) / create(name, parent) / knownNames()
-        ▲
+        ▲  registerBuiltinProviders()   — client_api → provider table
        │ registerBuiltinProviders()   — client_api → provider table
        │
 GenericProvider : Providers::Provider
   • owns an LLMQore::BaseClient (created by a ClientFactory)
-   • prepareRequest — inherited from Provider base:
+   • prepareRequest → PromptTemplate::buildFullRequest; injects tools when
-        delegates to PromptTemplate::buildFullRequest
+     enable_tools; applies ClaudeCacheControl when prompt caching is on
   • client() / providerID() / capabilities() / getInstalledModels()
 ```
 ### client_api → provider table
-| client_api                     | LLMQore client          | ProviderID       | capabilities            |
+| client_api                   | LLMQore client        | ProviderID       | capabilities                      |
-|--------------------------------|-------------------------|------------------|-------------------------|
+|------------------------------|-----------------------|------------------|-----------------------------------|
-| Claude                         | ClaudeClient            | Claude           | Tools·Thinking·Image·ModelListing |
+| Claude                       | ClaudeClient          | Claude           | Tools·Thinking·Image·ModelListing |
-| Google AI                      | GoogleAIClient          | GoogleAI         | Tools·Thinking·Image·ModelListing |
+| Google AI                    | GoogleAIClient        | GoogleAI         | Tools·Thinking·Image·ModelListing |
-| llama.cpp                      | LlamaCppClient          | LlamaCpp         | Tools·Thinking·Image·ModelListing |
+| llama.cpp                    | LlamaCppClient        | LlamaCpp         | Tools·Thinking·Image·ModelListing |
-| Mistral AI                     | MistralClient           | MistralAI        | Tools·Thinking·Image·ModelListing |
+| Mistral AI                   | MistralClient         | MistralAI        | Tools·Thinking·Image·ModelListing |
-| Codestral                      | MistralClient           | MistralAI        | Tools·Image             |
+| Codestral                    | MistralClient         | MistralAI        | Tools·Image                       |
-| Ollama (Native)                | OllamaClient            | Ollama           | Tools·Thinking·Image·ModelListing |
+| Ollama (Native)              | OllamaClient          | Ollama           | Tools·Thinking·Image·ModelListing |
-| Ollama (OpenAI-compatible)     | OpenAIClient            | OpenAICompatible | Tools·Thinking·Image·ModelListing |
+| Ollama (OpenAI-compatible)   | OpenAIClient          | OpenAICompatible | Tools·Thinking·Image·ModelListing |
-| OpenAI (Chat Completions)      | OpenAIClient            | OpenAI           | Tools·Thinking·Image·ModelListing |
+| OpenAI (Chat Completions)    | OpenAIClient          | OpenAI           | Tools·Thinking·Image·ModelListing |
-| OpenAI (Responses API)         | OpenAIResponsesClient   | OpenAIResponses  | Tools·Thinking·Image·ModelListing |
+| OpenAI (Responses API)       | OpenAIResponsesClient | OpenAIResponses  | Tools·Thinking·Image·ModelListing |
-| OpenAI Compatible              | OpenAIClient            | OpenAICompatible | Tools·Image·Thinking    |
+| OpenAI Compatible            | OpenAIClient          | OpenAICompatible | Tools·Image·Thinking              |
-| OpenRouter                     | OpenAIClient            | OpenRouter       | Tools·Image·Thinking·ModelListing |
+| OpenRouter                   | OpenAIClient          | OpenRouter       | Tools·Image·Thinking·ModelListing |
-| LM Studio (Chat Completions)   | OpenAIClient            | LMStudio         | Tools·Thinking·Image·ModelListing |
+| LM Studio (Chat Completions) | OpenAIClient          | LMStudio         | Tools·Thinking·Image·ModelListing |
-| LM Studio (Responses API)      | OpenAIResponsesClient   | OpenAIResponses  | Tools·Thinking·Image·ModelListing |
+| LM Studio (Responses API)    | OpenAIResponsesClient | OpenAIResponses  | Tools·Thinking·Image·ModelListing |
 Request *shape* comes from the agent's prompt template (Jinja `messageFormat`),
 so a single provider class covers every API by varying only the client factory
 and metadata.
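The name-keyed factory idea behind the table can be sketched roughly as follows. `ProviderSketch` and `ProviderRegistry` are hypothetical simplifications (the real `ProviderFactory` is a set of namespace functions and `create` also takes a parent); only the two table rows shown are taken from the document.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical metadata record mirroring three of the table's columns.
struct ProviderSketch {
    std::string clientKind;  // "LLMQore client" column
    std::string providerId;  // "ProviderID" column
    bool modelListing;       // whether ModelListing is among the capabilities
};

using ProviderFn = std::function<std::unique_ptr<ProviderSketch>()>;

class ProviderRegistry {
public:
    void registerType(const std::string &name, ProviderFn fn) {
        assert(fn);
        m_types[name] = std::move(fn);
    }

    // Returns nullptr for an unknown client_api name.
    std::unique_ptr<ProviderSketch> create(const std::string &name) const {
        auto it = m_types.find(name);
        if (it == m_types.end())
            return nullptr;
        return it->second();
    }

    std::vector<std::string> knownNames() const {
        std::vector<std::string> names;
        for (const auto &entry : m_types)
            names.push_back(entry.first);
        return names;
    }

private:
    std::map<std::string, ProviderFn> m_types;
};

// Two rows of the table: same client kind, different metadata.
void registerBuiltins(ProviderRegistry &registry) {
    registry.registerType("OpenRouter", [] {
        return std::make_unique<ProviderSketch>(
            ProviderSketch{"OpenAIClient", "OpenRouter", true});
    });
    registry.registerType("OpenAI Compatible", [] {
        return std::make_unique<ProviderSketch>(
            ProviderSketch{"OpenAIClient", "OpenAICompatible", false});
    });
}
```

Both entries reuse one client kind and differ only in metadata, which is the point of keeping a single provider class.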
 ---
-## 4. Runtime paths (all on Stack B)
+## 4. Configuration model
 ### 4a. Code completion ✅
 ```
 Qt Creator LSP (getCompletionsCycling)
  ▼
 LLMClientInterface
  pickCompletionAgent: AgentRouter.pickAgent(roster.codeCompletion, {file, project})
  session = sessionManager.createSession(agent)
  ctx = Templates::ContextData{ prefix, suffix,
                                systemPrompt = fileContext + openFiles }
  session.sendCompletion(ctx)
     ▼ stream from session.client():
  requestCompleted → sendCompletionToClient → CodeHandler → LSP
  system prompt = agent.role; FIM template renders prefix/suffix
 ```
 ### 4b. Chat ✅
 ```
 ChatRootView (QML)
  resolve agentFactory()/sessionManager() = qmlEngine(this)->rootContext()
  ChatAgentController: agent list (configNames), active agent (persisted),
                       supportsThinking/Tools
  QML agent picker (TopBar.agentSelector) — replaced provider/model/template combos
  ▼ dispatchSend
 ClientInterface
  session = sessionManager.createSession(currentChatAgent)
  registerQodeAssistTools(session.client().tools()) + registerSkillTool
  systemPrompt layer "chat.context" = project info + skills + linked files
  seedHistory(session.history() ← ChatModel: user/assistant/tool-call+result)
  session.send(userBlocks{text + images}, useTools)
     ▼ stream from session.client() → existing handlers → ChatModel:
  chunk→addMessage  thinking→addThinkingBlock
  tool→addToolExecutionStatus / updateToolResult
  finalized→usage   completed→messageReceivedCompletely → removeSession
 ChatCompressor    → createSession(agent) → seed history → layer "compression" → send(prompt)
 InputTokenCounter → estimate without provider (calibrated by server usage)
 ```
 ### 4c. Quick refactor ✅
 ```
 QodeAssistClient.requestQuickRefactor → QuickRefactorHandler (setSessionManager/setAgentFactory)
  pickRefactorAgent: AgentRouter.pickAgent(roster.quickRefactor, {file, project})
  session = createSession(agent)
  if useTools: registerQodeAssistTools(session.client().tools())
  systemPrompt layer "refactor" = buildSystemPrompt(tagged content +
                                  output requirements + indentation rules)
  session.send(blocks{instructions}, useTools)
     ▼ stream from session.client():
  requestCompleted → ResponseCleaner → RefactorResult → insert into editor
 ```
 ---
 ## 5. Configuration sources
 ```
 ~/.config/.../qodeassist/config/
  providers/*.toml   → ProviderInstance { name, client_api, url, api_key_ref }
-  agents/*.toml      → AgentConfig { providerInstance, model, endpoint, role,
+  agents/*.toml      → AgentConfig { schema_version, providerInstance, model,
-                                     messageFormat, sampling, match, enable* }
+                                     endpoint, system_prompt, [body], match,
                                     enable_tools, enable_thinking, cache_prompt,
                                     extends, abstract, hidden, tags }
  agent_models.json  → per-agent model override (applied by AgentFactory)
  agent_roles/*.json → role text, pulled into system_prompt via {{ agent_role(id) }}
  pipelines rosters  → codeCompletion / chatAssistant / chatCompression / quickRefactor
                       consumed by AgentRouter.pickAgent(roster, {filePath, projectName})
 Editor policy (NOT agent config):
  CodeCompletionSettings — triggers, modelOutputHandler, context extraction,
                           useOpenFilesContext
-                           (sampling / prompt-generation fields removed)
+```
 `[body]` **is** the request body (deep-mergeable through `extends`; Jinja-bearing
 string values render and splice as raw JSON, literals pass through, empty renders
 drop the key). `include` resolves only against sandboxed partial roots. Profiles validate
 at load: a referenced partial must resolve and the assembled body must parse as
 JSON against a synthetic context — config errors surface in the agents settings
 page, never as a silent runtime drop. Full spec:
 [`agent-templates-design.md`](agent-templates-design.md).
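The three `[body]` value rules (render and splice, literal passthrough, empty render drops the key) can be illustrated with a deliberately flat toy. Everything here is an assumption for the example: `assembleBody`, the `Render` callback standing in for the template engine, the `{{` / `{%` detection, and the restriction to string values. The real engine handles nested tables and uses inja.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>
#include <vector>

using Field = std::pair<std::string, std::string>;              // key, string value
using Render = std::function<std::string(const std::string &)>; // stand-in for the template engine

// Treats a value as a template if it contains a Jinja-style marker (assumed heuristic).
static bool isTemplate(const std::string &value) {
    return value.find("{{") != std::string::npos || value.find("{%") != std::string::npos;
}

// Minimal JSON string quoting: escapes only quotes and backslashes.
static std::string quote(const std::string &s) {
    std::string out = "\"";
    for (char c : s) {
        if (c == '"' || c == '\\')
            out += '\\';
        out += c;
    }
    return out + "\"";
}

std::string assembleBody(const std::vector<Field> &fields, const Render &render) {
    assert(render);
    std::string out = "{";
    bool first = true;
    for (const Field &field : fields) {
        std::string value;
        if (isTemplate(field.second)) {
            value = render(field.second); // rendered text is spliced as raw JSON
            if (value.empty())
                continue;                 // empty render: drop the key
        } else {
            value = quote(field.second);  // literal: passes through as a JSON string
        }
        if (!first)
            out += ",";
        out += quote(field.first) + ":" + value;
        first = false;
    }
    return out + "}";
}
```

A real loader would additionally parse the assembled text as JSON against a synthetic context, which is the load-time validation described above.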
 ---
 ## 5. Runtime paths
 `AgentRouter.pickAgent(roster, {file, project})` is the only agent picker; every
 pipeline resolves its agent through a roster.
 ### 5a. Code completion
 ```
 Qt Creator LSP (getCompletionsCycling)
  ▼
 LLMClientInterface
  agent   = AgentRouter.pickAgent(roster.codeCompletion, {file, project})
  session = sessionManager.acquire(agent)                 — pooled
  systemPrompt layer "completion.context" = fileContext + open-files context
  session.send( blocks{ CompletionContent(prefix, suffix) }, tools=off )
     ▼ on Session::finished:
  history().lastAssistantText() → CodeHandler (output-mode) → LSP items
     → sessionManager.release(session)
 ```
 The completion context travels as a `CompletionContent` block; the template
 exposes it as `ctx.prefix` / `ctx.suffix`. FIM vs instruct is purely agent
 config (the body), not feature code. Completion never touches the delta stream —
 it waits for `finished` and reads the last message.
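The acquire/release pairing used by this pipeline can be sketched as a small per-agent pool. `SessionSketch`, `SessionPool`, `idleCount`, and the idle limit of 2 are hypothetical; the only behaviour taken from the document is that `acquire` hands out a session with cleared history, cleared system-prompt layers and cleared client tools.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a pooled Session: only the state that acquire() resets.
struct SessionSketch {
    std::string agent;
    std::vector<std::string> history;
    std::vector<std::string> promptLayers;
    std::vector<std::string> clientTools;
};

class SessionPool {
public:
    explicit SessionPool(std::size_t maxIdlePerAgent = 2) : m_maxIdle(maxIdlePerAgent) {}

    // Reuses an idle session for the agent if one exists, otherwise creates one.
    std::unique_ptr<SessionSketch> acquire(const std::string &agent) {
        std::vector<std::unique_ptr<SessionSketch>> &idle = m_idle[agent];
        std::unique_ptr<SessionSketch> session;
        if (idle.empty()) {
            session = std::make_unique<SessionSketch>();
            session->agent = agent;
        } else {
            session = std::move(idle.back());
            idle.pop_back();
        }
        // Every hand-out starts from a clean slate.
        session->history.clear();
        session->promptLayers.clear();
        session->clientTools.clear();
        return session;
    }

    // Returns the session to the pool, or destroys it if the pool is full.
    void release(std::unique_ptr<SessionSketch> session) {
        assert(session);
        std::vector<std::unique_ptr<SessionSketch>> &idle = m_idle[session->agent];
        if (idle.size() < m_maxIdle)
            idle.push_back(std::move(session));
    }

    std::size_t idleCount(const std::string &agent) const {
        auto it = m_idle.find(agent);
        return it == m_idle.end() ? 0 : it->second.size();
    }

private:
    std::size_t m_maxIdle;
    std::map<std::string, std::vector<std::unique_ptr<SessionSketch>>> m_idle;
};
```

Clearing on `acquire` rather than on `release` means a session that was returned in a dirty state still cannot leak history into the next request.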
 ### 5b. Chat
 `ChatRootView` owns one persistent `ConversationHistory` for the whole chat view
 and injects it into every collaborator. **History is the single source of truth.**
 ```
 ChatRootView (QML)  — owns ConversationHistory m_history
  ChatModel.setHistory(m_history)          — ChatModel is a PROJECTION:
        subscribes to messageAdded/Updated/cleared/reset, flattens blocks→rows,
        overlays file-edit status from ChangesManager, holds a per-message usage map
  ChatAgentController                       — agent list filtered to the
        chatAssistant roster; active agent persisted
  ▼ dispatchSend
 ClientInterface
  session = sessionManager.createSession(activeAgent, m_history)
  sessionManager.toolContributors().contribute(client.tools())   — builtin+skills+MCP
  session.setContentLoader(ChatSerializer::loadContentFromStorage)
  systemPrompt layer "chat.context" = project info + skills + linked files
  session.send( blocks{ TextContent + StoredAttachmentContent + StoredImageContent } )
     ▼ consumes Session signals (NOT raw client signals):
  event(Usage)        → ChatModel.setMessageUsage + token-counter calibration
  finished(id)        → ChangesManager.applyPendingEditsForRequest + persist;
                        removeSession (the persistent history survives)
  failed(id, ErrorInfo) → surface error; removeSession
 ChatCompressor    → acquire(chatCompression-roster agent) → seed history from the
                    chat's messages → "compression" layer → send → read summary from
                    the compression session's own history → release
 InputTokenCounter → estimates over ConversationHistory (calibrated by Usage events)
 ChatSerializer    → persists ConversationHistory via MessageSerializer (v0.3);
                    imports legacy v0.1/v0.2 files
 ```
 `ChatModel`'s QML role surface (roleType / content / attachments / images /
 isRedacted / token roles) is unchanged, so the QML delegates were untouched. The
 projection's incremental updates avoid model resets on the streaming hot path.
 ### 5c. Quick refactor
 ```
 QodeAssistClient.requestQuickRefactor → QuickRefactorHandler
  agent   = AgentRouter.pickAgent(roster.quickRefactor, {file, project})
  session = sessionManager.acquire(agent)
  if useTools: sessionManager.toolContributors().contribute(client.tools())
  systemPrompt layer "refactor" = tagged selection + output + indentation rules
  session.send(blocks{instructions}, useTools)
     ▼ on Session::finished:
  history().lastAssistantText() → ResponseCleaner → RefactorResult → editor insert
     → sessionManager.release(session)
  on Session::failed(ErrorInfo) → RefactorResult{error}
 ```
 ---
-## 6. Remaining Stack A (runtime does NOT depend on it)
+## 6. Context layer
 The context services sit behind IDE-agnostic ports; Qt Creator API use lives in
 the adapters.
 ```
-🔴 Settings UI: provider/model/template selection pages
+EditorContext   — IDocumentReader (port)  ← DocumentReaderQtCreator (TextEditor API)
-                (ccProvider / caProvider / qrProvider) + ConfigurationManager
+ProjectContext  — IProjectScanner (port)  ← ProjectScannerQtCreator (ProjectExplorer
-                → use ProvidersManager
+                  + Core::DocumentModel + the IgnoreManager for .qodeassistignore)
-🔴 root providers/*  (PluginLLMCore::Provider, 14 classes)
+TokenEstimator  — TokenUtils (pure)       ← InputTokenCounter (thin UI consumer)
                → read only chat/quick-refactor sampling settings
 🔴 pluginllmcore/*   (ProvidersManager, PromptTemplateManager, ResponseCleaner,
                      PromptProviderChat/Fim, ContextData)
 🔴 qodeassist.cpp:144-146  registerProviders() / registerTemplates()  (Stack A registration)
 🔴 qodeassist.cpp:185      MCP skill-tool loop on Stack A providers  (effectively dead)
 🔴 ChatAssistantSettings / QuickRefactorSettings — sampling fields (read only by root providers)
 ResponseCleaner (pluginllmcore) is still used by QuickRefactorHandler as a text
 utility — orthogonal to the provider stack.
 ```
-### Removed during the migration
+`ContextManager` is now Qt-Creator-free: it delegates open-file enumeration and
-
+ignore filtering to an injected `IProjectScanner` (defaulting to the QtC adapter),
- Rules subsystem (`RulesLoader` + chat "active rules" UI + QuickRefactor rules block)
+and keeps only filesystem reads + formatting. `ContextManager::shouldIgnore(path)`
- `ChatConfigurationController`, `AgentRoleController` (chat config/role presets)
+replaced the previously exposed `ignoreManager()`.
 - `m_promptProvider` (`PromptProviderFim`) in the plugin
 - `RequestType::CodeCompletion` branch in all 14 root providers
 - Sampling / prompt-generation fields in `CodeCompletionSettings`
 - ChatView no longer links `PluginLLMCore`
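The port/adapter split described for `ContextManager` in section 6 might look roughly like this in miniature. The method names on `IProjectScanner` (`openFiles`, `isIgnored`), `ContextManagerSketch`, `openFilesForContext`, and `FakeScanner` are assumptions for illustration; only `shouldIgnore(path)` and the injected-scanner idea come from the document.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical shape of the IDE-agnostic port.
class IProjectScanner {
public:
    virtual ~IProjectScanner() = default;
    virtual std::vector<std::string> openFiles() const = 0;
    virtual bool isIgnored(const std::string &path) const = 0;
};

// Core-side consumer: knows only the port, never the IDE API.
class ContextManagerSketch {
public:
    explicit ContextManagerSketch(std::shared_ptr<const IProjectScanner> scanner)
        : m_scanner(std::move(scanner)) {
        assert(m_scanner);
    }

    bool shouldIgnore(const std::string &path) const { return m_scanner->isIgnored(path); }

    // Open files minus anything the ignore rules exclude.
    std::vector<std::string> openFilesForContext() const {
        std::vector<std::string> result;
        for (const std::string &path : m_scanner->openFiles()) {
            if (!shouldIgnore(path))
                result.push_back(path);
        }
        return result;
    }

private:
    std::shared_ptr<const IProjectScanner> m_scanner;
};

// Test double standing in for the Qt Creator adapter.
class FakeScanner : public IProjectScanner {
public:
    std::vector<std::string> openFiles() const override {
        return {"src/main.cpp", "build/generated.cpp"};
    }
    bool isIgnored(const std::string &path) const override {
        return path.rfind("build/", 0) == 0; // ignore everything under build/
    }
};
```

Because the consumer depends only on the interface, it can be exercised in a unit test with `FakeScanner` and no IDE present.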
 ---
-## 7. Dependency summary
+## 7. Cross-cutting
-```
+- **Request lifecycle** — a session has at most one in-flight request; `send()`
-                 ┌──────────────── Stack B (active runtime) ────────────────┐
+  while one is in flight cancels the previous request. Every request ends in exactly one of
-LLMClientInterface ─┐                                                        │
+  `finished` / `failed` / `cancelled`. Cancellation is not an error; no consumer
-ClientInterface ────┼─► SessionManager ─► Session ─► Agent ─► GenericProvider ─► LLMQore::*Client
+  string-matches a message to tell them apart.
-QuickRefactorHandler─┘        │              │         │            │
+- **Typed errors** — `ErrorInfo { category ∈ {Config, Auth, Network, Provider,
-ChatCompressor ──────────────┘              │      AgentFactory  ProviderFactory
+  Validation, Tool}, message, providerDetail }`. `ResponseRouter` categorizes wire
-                                  AgentRouter (rosters)  │            │
+  errors (best-effort) at the boundary; `Session::failed` carries the typed value.
-                                                ProviderInstanceFactory (TOML)
+- **Tools** — `SessionManager` owns a `ToolContributorRegistry`; built-in ToolKit,
-                 └──────────────────────────────────────────────────────────┘
+  the skill tool, and MCP client tools register once and are contributed to chat
-
+  and quick-refactor session clients uniformly.
-   Stack A (settings UI + ConfigurationManager + MCP loop) — isolated,
+- **Threading** — the core runs on the GUI thread; concurrency is the Qt event
-   no runtime consumers remain.
+  loop plus async network I/O. Blocking work hides behind L3 ports.
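The request-lifecycle rule above (one in-flight request, exactly one terminal outcome) can be modelled with a small state holder. `RequestLifecycle`, `Outcome`, and the integer request ids are hypothetical names for this sketch; the real `Session` reports outcomes through its `finished` / `failed` / `cancelled` signals.

```cpp
#include <cassert>
#include <functional>
#include <utility>

enum class Outcome { Finished, Failed, Cancelled };

class RequestLifecycle {
public:
    using Listener = std::function<void(int requestId, Outcome outcome)>;

    explicit RequestLifecycle(Listener listener) : m_listener(std::move(listener)) {
        assert(m_listener);
    }

    // Starts a new request; an in-flight one is cancelled first.
    int send() {
        if (m_inFlight != 0)
            complete(m_inFlight, Outcome::Cancelled);
        m_inFlight = ++m_lastId;
        return m_inFlight;
    }

    // Reports a terminal outcome. Returns false if the id is not the in-flight
    // request, so a request can never reach two terminal states.
    bool complete(int requestId, Outcome outcome) {
        if (requestId == 0 || requestId != m_inFlight)
            return false;
        m_inFlight = 0;
        m_listener(requestId, outcome);
        return true;
    }

    bool inFlight() const { return m_inFlight != 0; }

private:
    Listener m_listener;
    int m_lastId = 0;
    int m_inFlight = 0; // 0 means idle
};
```

Since the outcome is a typed value, a consumer can branch on `Outcome::Cancelled` directly instead of inspecting an error message.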
 ```
 ---
-## 8. Open follow-ups (optional)
+## 8. Tests
-1. **Chat picker filtering** — show only `chatAssistant`-roster agents (currently
+`test/` (GTest + Qt::Test) covers the two engines most affected by the rework:
-   lists all non-hidden agents; the auto-default may land on a FIM agent).
+
-   Requires wiring ChatView to `PipelinesConfig` (watch for OBJECT-library
+- `JsonPromptTemplateTest` — the `[body]` engine: jinja render + JSON splice,
-   symbol duplication).
+  literal passthrough, empty-render key drop, nested literals, and load-time
-2. **MCP tools on agent clients** — MCP skill tools are registered only on Stack A
+  rejection of bodies that render invalid JSON.
-   providers; to expose MCP tools to chat agents, register them on the session
+- `ResponseRouterTest` — a fake `BaseClient` replays a recorded provider stream;
-   client alongside `registerQodeAssistTools`.
+  asserts the assistant message is stamped with the request id, history is built
-3. **Physical Stack A teardown** — remove the provider/model/template settings UI,
+  correctly (thinking + text + tool use/result), the typed event stream is emitted,
-   `ConfigurationManager`, root `providers/*`, `pluginllmcore/*`, and the
+  and wire errors are categorized.
-   registration + MCP loop in `qodeassist.cpp`. Runtime no longer depends on them.
+
-4. **Per-message session cost** — chat/refactor create a fresh agent/provider/client
+---
-   (and read secrets) per request; a session pool could reduce latency.
+
 ## 9. Remaining follow-ups (optional)
 1. **Qt-Creator-free core build + CI** — `AgentFactory` / `ContextRenderer` still
   call `Core::ICore::userResourcePath`, so the core targets link `QtCreator::Core`.
   A `ResourcePaths` port + adapter would let the core build without Qt Creator and
   enable a CI job that fails on a layering-violating include, plus golden
   rendered-body snapshots over the bundled agents loaded through the real loader.
 2. **§9 target module layout** — the `core/ ide/ features/ hosts/` physical target
   split in `target-architecture.md` is not yet reflected in the directory layout.
 ```