diff --git a/docs/architecture.md b/docs/architecture.md index 2dcd575..d0cef7b 100644 --- a/docs/architecture.md +++ b/docs/architecture.md @@ -1,35 +1,42 @@ # QodeAssist Architecture -This document describes the runtime architecture of QodeAssist after the -migration of all LLM runtime paths onto the agent / `Session` stack -("Stack B"). Every runtime LLM path — code completion, chat (send/stream + -compression + token counting), and quick refactor — now goes through agents, -`Session`, and the `Providers::GenericProvider` layer. +This document describes the **current** runtime architecture, after the §10 +rework in `target-architecture.md` was completed. Every runtime LLM path — +code completion, chat (send/stream + compression + token counting), and quick +refactor — flows through one stack: agents, `Session`, and the +`Providers::GenericProvider` layer. There is no legacy parallel path; the old +"Stack A" (root `providers/*`, `pluginllmcore/*`, `ConfigurationManager`, the +provider/model/template settings pages) has been removed. -> Legend: ✅ = on Stack B (active runtime), 🔴 = legacy Stack A (isolated, no -> runtime consumers left). +For the design rationale, layering contract, and cross-cutting policies, see +[`target-architecture.md`](target-architecture.md). This file documents how the +code is wired today. --- ## 1. Top level: ownership and dependency injection -The plugin (`qodeassist.cpp`) owns everything via `new` + parent (no plugin-wide -singletons; each feature receives its dependencies explicitly). +The plugin (`qodeassist.cpp`) owns everything via `new` + parent — no +plugin-wide singletons; each feature receives its dependencies explicitly. ``` QodeAssistPlugin - Stack B infrastructure: - • Providers::registerBuiltinProviders() — registers 13 client_api types - • ProviderInstanceFactory — 14 instances from TOML - • ProviderSecretsStore - • AgentFactory — agents from TOML - • SessionManager(agentFactory) + • Providers::registerBuiltinProviders() — client_api → provider table + • ProviderInstanceFactory — provider instances from TOML + • ProviderSecretsStore — secrets behind a port + • AgentFactory — agents from TOML + agent_models.json + • SessionManager(agentFactory) — owns the ToolContributorRegistry + toolContributors().add(registerQodeAssistTools) + toolContributors().add(registerSkillTool) + toolContributors().add(McpClientsManager::registerToolsOn) • m_engine (QQmlEngine) rootContext: "agentFactory", "sessionManager" — DI for chat (QML) Wired into consumers: - • QodeAssistClient ← LLMClientInterface(*sessionManager, *agentFactory) - ← setSessionManager / setAgentFactory (for quick refactor) + • QodeAssistClient ← LLMClientInterface(generalSettings, completeSettings, + agentFactory, sessionManager, documentReader, + performanceLogger) + ← setSessionManager / setAgentFactory (quick refactor) ``` Chat lives in QML (`ChatRootView` is a `QML_ELEMENT`), so `AgentFactory` and @@ -39,220 +46,262 @@ context** and resolved in `ChatRootView` via --- -## 2. Stack B core (agent / Session) +## 2. Core (agent / Session) ``` AgentFactory.create(name) - configByName(name) → AgentConfig (TOML) - providerInstance, model, endpoint, role, messageFormat, - sampling, enableTools, enableThinking, match{filePatterns,...} + configByName(name) → AgentConfig (TOML, [body] table; model override from + agent_models.json applied here) buildProviderForAgent: instance = ProviderInstanceFactory.instanceByName(cfg.providerInstance) - provider = ProviderFactory::create(instance.clientApi) ◄── keystone + provider = ProviderFactory::create(instance.clientApi) provider.setUrl(instance.url) provider.setApiKey(secrets.read(instance.apiKeyRef)) ▼ Agent(config, provider) - promptTemplate = JsonPromptTemplate::fromConfig(cfg.messageFormat) (inja) + promptTemplate = JsonPromptTemplate::fromConfig(cfg) — compiles [body] (inja), + validated at load against a synthetic context + provider.setPromptCaching(cfg.cachePrompt, cfg.cacheTtl == "1h") ▼ -SessionManager.createSession(agentName) → Session(agent) - ├─ ConversationHistory — messages as ContentBlocks - ├─ SystemPromptBuilder — layers: agent.role + caller layers - └─ ResponseRouter(client) — emits ResponseEvent +SessionManager — two ways to obtain a Session: + • createSession(agentName, externalHistory?) — chat: attaches a persistent, + externally-owned history + • acquire(agentName) / release(session) — one-shot pipelines: a small + per-agent pool of internal-history + sessions; acquire hands out a + session with cleared history, + cleared system-prompt layers and + cleared client tools + ▼ +Session(agent[, externalHistory]) + ├─ ConversationHistory — messages as polymorphic ContentBlocks + ├─ SystemPromptBuilder — ordered named layers (priority-sorted) + └─ ResponseRouter(client) — adapts client signals → typed ResponseEvent Session API: - • send(blocks, toolsOverride) — chat/refactor: append user msg + dispatch - • sendCompletion(ContextData) — completion: FIM prefix/suffix - • client() — agent's LLMQore::BaseClient (direct streaming) - • systemPrompt()->setLayer(...) — dynamic context layers - • supportsImages() — provider Image capability - • history() — for seeding from ChatModel + • send(blocks, toolsOverride) — the ONLY dispatch entry point: append a user + message and dispatch. Completion/chat/refactor + differ only in block content + template. + • cancel() — tears down in-flight; emits cancelled(id) + • history() / systemPrompt() / client() / supportsImages() + • setContentLoader(loader) — resolves Stored* attachment/image blocks + • lastError() → ErrorInfo — typed synchronous start-failure detail + +Session signals (three-state, mutually exclusive per request): + • finished(id, stopReason) + • failed(id, ErrorInfo{category, message, providerDetail}) + • cancelled(id) + + event(ResponseEvent) — live delta stream for the chat UI ``` -`Session::sendCompletion` and `dispatch` compose `SystemPromptBuilder` layers -(`agent.role` + caller-provided) into the request system prompt. +`Session::dispatch` renders the agent's `system_prompt` into the `agent.system` +layer, composes all `SystemPromptBuilder` layers into the request system prompt, +and substitutes `${MODEL}` in the endpoint before sending. --- -## 3. Provider layer — the keystone (implemented during migration) +## 3. Provider layer -The Stack B provider layer previously existed only as an abstract base + -empty factory (`registerType` was never called, no concrete providers). This -blocked every agent from obtaining a working provider. It is now implemented -via a single configuration-driven `GenericProvider`. +One configuration-driven `GenericProvider` covers every API; it varies only by +the LLMQore client factory and metadata. Request *shape* belongs to the agent's +`JsonPromptTemplate` (the `[body]` table), never to the provider. ``` ProviderFactory (sources/providers, namespace functions) registerType(name, fn) / create(name, parent) / knownNames() - ▲ - │ registerBuiltinProviders() — client_api → provider table - │ + ▲ registerBuiltinProviders() — client_api → provider table GenericProvider : Providers::Provider • owns an LLMQore::BaseClient (created by a ClientFactory) - • prepareRequest — inherited from Provider base: - delegates to PromptTemplate::buildFullRequest + • prepareRequest → PromptTemplate::buildFullRequest; injects tools when + enable_tools; applies ClaudeCacheControl when prompt caching is on • client() / providerID() / capabilities() / getInstalledModels() ``` ### client_api → provider table -| client_api | LLMQore client | ProviderID | capabilities | -|--------------------------------|-------------------------|------------------|-------------------------| -| Claude | ClaudeClient | Claude | Tools·Thinking·Image·ModelListing | -| Google AI | GoogleAIClient | GoogleAI | Tools·Thinking·Image·ModelListing | -| llama.cpp | LlamaCppClient | LlamaCpp | Tools·Thinking·Image·ModelListing | -| Mistral AI | MistralClient | MistralAI | Tools·Thinking·Image·ModelListing | -| Codestral | MistralClient | MistralAI | Tools·Image | -| Ollama (Native) | OllamaClient | Ollama | Tools·Thinking·Image·ModelListing | -| Ollama (OpenAI-compatible) | OpenAIClient | OpenAICompatible | Tools·Thinking·Image·ModelListing | -| OpenAI (Chat Completions) | OpenAIClient | OpenAI | Tools·Thinking·Image·ModelListing | -| OpenAI (Responses API) | OpenAIResponsesClient | OpenAIResponses | Tools·Thinking·Image·ModelListing | -| OpenAI Compatible | OpenAIClient | OpenAICompatible | Tools·Image·Thinking | -| OpenRouter | OpenAIClient | OpenRouter | Tools·Image·Thinking·ModelListing | -| LM Studio (Chat Completions) | OpenAIClient | LMStudio | Tools·Thinking·Image·ModelListing | -| LM Studio (Responses API) | OpenAIResponsesClient | OpenAIResponses | Tools·Thinking·Image·ModelListing | - -Request *shape* comes from the agent's prompt template (jinja `messageFormat`), -so a single provider class covers every API by varying only the client factory -and metadata. +| client_api | LLMQore client | ProviderID | capabilities | +|------------------------------|-----------------------|------------------|-----------------------------------| +| Claude | ClaudeClient | Claude | Tools·Thinking·Image·ModelListing | +| Google AI | GoogleAIClient | GoogleAI | Tools·Thinking·Image·ModelListing | +| llama.cpp | LlamaCppClient | LlamaCpp | Tools·Thinking·Image·ModelListing | +| Mistral AI | MistralClient | MistralAI | Tools·Thinking·Image·ModelListing | +| Codestral | MistralClient | MistralAI | Tools·Image | +| Ollama (Native) | OllamaClient | Ollama | Tools·Thinking·Image·ModelListing | +| Ollama (OpenAI-compatible) | OpenAIClient | OpenAICompatible | Tools·Thinking·Image·ModelListing | +| OpenAI (Chat Completions) | OpenAIClient | OpenAI | Tools·Thinking·Image·ModelListing | +| OpenAI (Responses API) | OpenAIResponsesClient | OpenAIResponses | Tools·Thinking·Image·ModelListing | +| OpenAI Compatible | OpenAIClient | OpenAICompatible | Tools·Image·Thinking | +| OpenRouter | OpenAIClient | OpenRouter | Tools·Image·Thinking·ModelListing | +| LM Studio (Chat Completions) | OpenAIClient | LMStudio | Tools·Thinking·Image·ModelListing | +| LM Studio (Responses API) | OpenAIResponsesClient | OpenAIResponses | Tools·Thinking·Image·ModelListing | --- -## 4. Runtime paths (all on Stack B) - -### 4a. Code completion ✅ - -``` -Qt Creator LSP (getCompletionsCycling) - ▼ -LLMClientInterface - pickCompletionAgent: AgentRouter.pickAgent(roster.codeCompletion, {file, project}) - session = sessionManager.createSession(agent) - ctx = Templates::ContextData{ prefix, suffix, - systemPrompt = fileContext + openFiles } - session.sendCompletion(ctx) - ▼ stream from session.client(): - requestCompleted → sendCompletionToClient → CodeHandler → LSP - system prompt = agent.role; FIM template renders prefix/suffix -``` - -### 4b. Chat ✅ - -``` -ChatRootView (QML) - resolve agentFactory()/sessionManager() = qmlEngine(this)->rootContext() - ChatAgentController: agent list (configNames), active agent (persisted), - supportsThinking/Tools - QML agent picker (TopBar.agentSelector) — replaced provider/model/template combos - ▼ dispatchSend -ClientInterface - session = sessionManager.createSession(currentChatAgent) - registerQodeAssistTools(session.client().tools()) + registerSkillTool - systemPrompt layer "chat.context" = project info + skills + linked files - seedHistory(session.history() ← ChatModel: user/assistant/tool-call+result) - session.send(userBlocks{text + images}, useTools) - ▼ stream from session.client() → existing handlers → ChatModel: - chunk→addMessage thinking→addThinkingBlock - tool→addToolExecutionStatus / updateToolResult - finalized→usage completed→messageReceivedCompletely → removeSession - -ChatCompressor → createSession(agent) → seed history → layer "compression" → send(prompt) -InputTokenCounter → estimate without provider (calibrated by server usage) -``` - -### 4c. Quick refactor ✅ - -``` -QodeAssistClient.requestQuickRefactor → QuickRefactorHandler (setSessionManager/setAgentFactory) - pickRefactorAgent: AgentRouter.pickAgent(roster.quickRefactor, {file, project}) - session = createSession(agent) - if useTools: registerQodeAssistTools(session.client().tools()) - systemPrompt layer "refactor" = buildSystemPrompt(tagged content + - output requirements + indentation rules) - session.send(blocks{instructions}, useTools) - ▼ stream from session.client(): - requestCompleted → ResponseCleaner → RefactorResult → insert into editor -``` - ---- - -## 5. Configuration sources +## 4. Configuration model ``` ~/.config/.../qodeassist/config/ providers/*.toml → ProviderInstance { name, client_api, url, api_key_ref } - agents/*.toml → AgentConfig { providerInstance, model, endpoint, role, - messageFormat, sampling, match, enable* } + agents/*.toml → AgentConfig { schema_version, providerInstance, model, + endpoint, system_prompt, [body], match, + enable_tools, enable_thinking, cache_prompt, + extends, abstract, hidden, tags } + agent_models.json → per-agent model override (applied by AgentFactory) + agent_roles/*.json → role text, pulled into system_prompt via {{ agent_role(id) }} pipelines rosters → codeCompletion / chatAssistant / chatCompression / quickRefactor consumed by AgentRouter.pickAgent(roster, {filePath, projectName}) Editor policy (NOT agent config): CodeCompletionSettings — triggers, modelOutputHandler, context extraction, useOpenFilesContext - (sampling / prompt-generation fields removed) +``` + +`[body]` **is** the request body (deep-mergeable through `extends`; Jinja-bearing +string values render and splice as raw JSON, literals pass through, empty renders +drop the key). `include` resolves only sandboxed partial roots. Profiles validate +at load: a referenced partial must resolve and the assembled body must parse as +JSON against a synthetic context — config errors surface in the agents settings +page, never as a silent runtime drop. Full spec: +[`agent-templates-design.md`](agent-templates-design.md). + +--- + +## 5. Runtime paths + +`AgentRouter.pickAgent(roster, {file, project})` is the only agent picker; every +pipeline resolves its agent through a roster. + +### 5a. Code completion + +``` +Qt Creator LSP (getCompletionsCycling) + ▼ +LLMClientInterface + agent = AgentRouter.pickAgent(roster.codeCompletion, {file, project}) + session = sessionManager.acquire(agent) — pooled + systemPrompt layer "completion.context" = fileContext + open-files context + session.send( blocks{ CompletionContent(prefix, suffix) }, tools=off ) + ▼ on Session::finished: + history().lastAssistantText() → CodeHandler (output-mode) → LSP items + → sessionManager.release(session) +``` + +The completion context travels as a `CompletionContent` block; the template +exposes it as `ctx.prefix` / `ctx.suffix`. FIM vs instruct is purely agent +config (the body), not feature code. Completion never touches the delta stream — +it waits for `finished` and reads the last message. + +### 5b. Chat + +`ChatRootView` owns one persistent `ConversationHistory` for the whole chat view +and injects it into every collaborator. **History is the single source of truth.** + +``` +ChatRootView (QML) — owns ConversationHistory m_history + ChatModel.setHistory(m_history) — ChatModel is a PROJECTION: + subscribes to messageAdded/Updated/cleared/reset, flattens blocks→rows, + overlays file-edit status from ChangesManager, holds a per-message usage map + ChatAgentController — agent list filtered to the + chatAssistant roster; active agent persisted + ▼ dispatchSend +ClientInterface + session = sessionManager.createSession(activeAgent, m_history) + sessionManager.toolContributors().contribute(client.tools()) — builtin+skills+MCP + session.setContentLoader(ChatSerializer::loadContentFromStorage) + systemPrompt layer "chat.context" = project info + skills + linked files + session.send( blocks{ TextContent + StoredAttachmentContent + StoredImageContent } ) + ▼ consumes Session signals (NOT raw client signals): + event(Usage) → ChatModel.setMessageUsage + token-counter calibration + finished(id) → ChangesManager.applyPendingEditsForRequest + persist; + removeSession (the persistent history survives) + failed(id, ErrorInfo) → surface error; removeSession + +ChatCompressor → acquire(chatCompression-roster agent) → seed history from the + chat's messages → "compression" layer → send → read summary from + the compression session's own history → release +InputTokenCounter → estimates over ConversationHistory (calibrated by Usage events) +ChatSerializer → persists ConversationHistory via MessageSerializer (v0.3); + imports legacy v0.1/v0.2 files +``` + +`ChatModel`'s QML role surface (roleType / content / attachments / images / +isRedacted / token roles) is unchanged, so the QML delegates were untouched. The +projection's incremental updates avoid model resets on the streaming hot path. + +### 5c. Quick refactor + +``` +QodeAssistClient.requestQuickRefactor → QuickRefactorHandler + agent = AgentRouter.pickAgent(roster.quickRefactor, {file, project}) + session = sessionManager.acquire(agent) + if useTools: sessionManager.toolContributors().contribute(client.tools()) + systemPrompt layer "refactor" = tagged selection + output + indentation rules + session.send(blocks{instructions}, useTools) + ▼ on Session::finished: + history().lastAssistantText() → ResponseCleaner → RefactorResult → editor insert + → sessionManager.release(session) + on Session::failed(ErrorInfo) → RefactorResult{error} ``` --- -## 6. Remaining Stack A (runtime does NOT depend on it) +## 6. Context layer + +The context services sit behind IDE-agnostic ports; Qt Creator API use lives in +the adapters. ``` -🔴 Settings UI: provider/model/template selection pages - (ccProvider / caProvider / qrProvider) + ConfigurationManager - → use ProvidersManager -🔴 root providers/* (PluginLLMCore::Provider, 14 classes) - → read only chat/quick-refactor sampling settings -🔴 pluginllmcore/* (ProvidersManager, PromptTemplateManager, ResponseCleaner, - PromptProviderChat/Fim, ContextData) -🔴 qodeassist.cpp:144-146 registerProviders() / registerTemplates() (Stack A registration) -🔴 qodeassist.cpp:185 MCP skill-tool loop on Stack A providers (effectively dead) -🔴 ChatAssistantSettings / QuickRefactorSettings — sampling fields (read only by root providers) - -ResponseCleaner (pluginllmcore) is still used by QuickRefactorHandler as a text -utility — orthogonal to the provider stack. +EditorContext — IDocumentReader (port) ← DocumentReaderQtCreator (TextEditor API) +ProjectContext — IProjectScanner (port) ← ProjectScannerQtCreator (ProjectExplorer + + Core::DocumentModel + the IgnoreManager for .qodeassistignore) +TokenEstimator — TokenUtils (pure) ← InputTokenCounter (thin UI consumer) ``` -### Removed during the migration - -- Rules subsystem (`RulesLoader` + chat "active rules" UI + QuickRefactor rules block) -- `ChatConfigurationController`, `AgentRoleController` (chat config/role presets) -- `m_promptProvider` (`PromptProviderFim`) in the plugin -- `RequestType::CodeCompletion` branch in all 14 root providers -- Sampling / prompt-generation fields in `CodeCompletionSettings` -- ChatView no longer links `PluginLLMCore` +`ContextManager` is now Qt-Creator-free: it delegates open-file enumeration and +ignore filtering to an injected `IProjectScanner` (defaulting to the QtC adapter), +and keeps only filesystem reads + formatting. `ContextManager::shouldIgnore(path)` +replaced the previously exposed `ignoreManager()`. --- -## 7. Dependency summary +## 7. Cross-cutting -``` - ┌──────────────── Stack B (active runtime) ────────────────┐ -LLMClientInterface ─┐ │ -ClientInterface ────┼─► SessionManager ─► Session ─► Agent ─► GenericProvider ─► LLMQore::*Client -QuickRefactorHandler─┘ │ │ │ │ -ChatCompressor ──────────────┘ │ AgentFactory ProviderFactory - AgentRouter (rosters) │ │ - ProviderInstanceFactory (TOML) - └──────────────────────────────────────────────────────────┘ - - Stack A (settings UI + ConfigurationManager + MCP loop) — isolated, - no runtime consumers remain. -``` +- **Request lifecycle** — a session has at most one in-flight request; `send()` + while in flight cancels the previous. Every request ends in exactly one of + `finished` / `failed` / `cancelled`. Cancellation is not an error; no consumer + string-matches a message to tell them apart. +- **Typed errors** — `ErrorInfo { category ∈ {Config, Auth, Network, Provider, + Validation, Tool}, message, providerDetail }`. `ResponseRouter` categorizes wire + errors (best-effort) at the boundary; `Session::failed` carries the typed value. +- **Tools** — `SessionManager` owns a `ToolContributorRegistry`; built-in ToolKit, + the skill tool, and MCP client tools register once and are contributed to chat + and quick-refactor session clients uniformly. +- **Threading** — the core runs on the GUI thread; concurrency is the Qt event + loop plus async network I/O. Blocking work hides behind L3 ports. --- -## 8. Open follow-ups (optional) +## 8. Tests -1. **Chat picker filtering** — show only `chatAssistant`-roster agents (currently - lists all non-hidden agents; the auto-default may land on a FIM agent). - Requires wiring ChatView to `PipelinesConfig` (watch for OBJECT-library - symbol duplication). -2. **MCP tools on agent clients** — MCP skill tools are registered only on Stack A - providers; to expose MCP tools to chat agents, register them on the session - client alongside `registerQodeAssistTools`. -3. **Physical Stack A teardown** — remove the provider/model/template settings UI, - `ConfigurationManager`, root `providers/*`, `pluginllmcore/*`, and the - registration + MCP loop in `qodeassist.cpp`. Runtime no longer depends on them. -4. **Per-message session cost** — chat/refactor create a fresh agent/provider/client - (and read secrets) per request; a session pool could reduce latency. +`test/` (GTest + Qt::Test) covers the two engines most affected by the rework: + +- `JsonPromptTemplateTest` — the `[body]` engine: jinja render + JSON splice, + literal passthrough, empty-render key drop, nested literals, and load-time + rejection of bodies that render invalid JSON. +- `ResponseRouterTest` — a fake `BaseClient` replays a recorded provider stream; + asserts the assistant message is stamped with the request id, history is built + correctly (thinking + text + tool use/result), the typed event stream is emitted, + and wire errors are categorized. + +--- + +## 9. Remaining follow-ups (optional) + +1. **Qt-Creator-free core build + CI** — `AgentFactory` / `ContextRenderer` still + call `Core::ICore::userResourcePath`, so the core targets link `QtCreator::Core`. + A `ResourcePaths` port + adapter would let the core build without Qt Creator and + enable a CI job that fails on a layering-violating include, plus golden + rendered-body snapshots over the bundled agents loaded through the real loader. +2. **§9 target module layout** — the `core/ ide/ features/ hosts/` physical target + split in `target-architecture.md` is not yet reflected in the directory layout. ```