Files
QodeAssist/docs/target-architecture.md
2026-06-28 17:38:08 +02:00

28 KiB
Raw Blame History

QodeAssist — Target Architecture (v1.0)

Status: design baseline, derived from the fixed use-case inventory below. Scope: the complete plugin, designed "from scratch" — what the architecture should be if nothing legacy constrained it. The current code (see architecture.md) already converges on this; §10 lists the remaining deltas.


1. Use-case inventory (requirements baseline)

Every architectural decision below is justified by one of these. Features not on this list (Rules system, legacy provider/model/template pickers, Stack A) are intentionally out of scope.

# Use case What the user gets
U1 Code completion Inline FIM/instruct suggestions via LSP; auto + manual trigger, multiline, smart-context suppression, accept full / word-by-word
U2 Chat assistant 4 placements (sidebar, bottom pane, editor tab, floating window); streaming text + thinking blocks + tool blocks + file-edit blocks (apply/undo); attachments, linked files, @-mentions, open-files sync; token counter; persisted history; one-click summarization; runtime agent picker
U3 Quick refactor Selection + instruction by hotkey; custom-instructions library; separate agent; optional tools; streamed result inserted into the editor
U4 Tools read/create/edit file, search, find, list, build, diagnostics, terminal, todo, load_skill; per-tool enable
U5 Skills discovery from .qodeassist/skills, .claude/skills, ~/.claude/skills; auto-injection, explicit / picker, always-on
U6 MCP server mode (expose plugin tools, HTTP/SSE + stdio bridge) and client hub (consume external tools in chat/refactor)
U7 Providers 13 client_api types over one GenericProvider; secrets store; local-server autostart; model listing
U8 Agents TOML profiles: abstract wire-base + thin concrete via extends, [body] table 1:1 with the wire request (message serialization inlined per base), match rules (completion routing), cache_breakpoints, per-agent model override, per-pipeline agent selection
U9 Personas persona = the agent's system_prompt; shared text lives in plain files pulled in via read_file — bundled defaults under :/roles/…, or any file the user points at under ${PROJECT_DIR} / ${CONFIG_DIR} (your QodeAssist user directory); read_file reads the literal path given (no override/fallback resolution); switching persona = switching agent (no separate Roles subsystem)
U10 Configuration UI settings pages for everything above; per-project settings; updater + status widget

2. Design principles

  1. One stack. Every LLM byte — completion, chat, compression, refactor — flows through the same Session pipeline. No parallel legacy path.
  2. Hexagonal core. The runtime (agents, sessions, providers, templates, prompt rendering) has zero Qt Creator dependencies. The IDE host composes that core; IDE-specific facts enter only through ports (document reading, project scanning, secrets, tool hosting).
  3. Configuration is declarative, code is mechanism. What is sent (request [body], system prompt, endpoint, model) lives in TOML/JSON/Jinja and is user-overridable; how it is sent (streaming, retries, tool loop, event routing) lives in C++ and is identical for all providers.
  4. Agent-driven behavior. The agent's TOML declares what a conversation uses (enable_tools, enable_thinking); features and UI adapt to the agent config instead of switching on provider names or provider-declared capability flags.
  5. Single source of truth for conversation state. ConversationHistory owns the messages; ChatModel and persistence are projections of it, never independent copies.
  6. Per-feature composition roots, no singletons. Each feature constructs and owns its dependencies (new + parent); shared services are passed explicitly (constructor/setter, QML context properties for the chat).
  7. Streaming-first event model. One typed ResponseEvent stream is the only contract between the core and every consumer. Deltas exist for live UI (chat); one-shot pipelines (completion, refactor) ignore them, wait for finished, and read the final assistant message from history.
  8. Fail at load, not mid-conversation. Agent profiles are validated when loaded (partials resolve, assembled body parses as JSON against a synthetic context), so a config error never surfaces as a silent runtime drop.

3. Layered model

flowchart TB
    subgraph HOSTS["Hosts — composition roots"]
        PLUGIN["Qt Creator plugin<br/>qodeassist.cpp"]
    end

    subgraph L5["L5 · Presentation"]
        LSP["LSP bridge<br/>inline suggestions"]
        QMLUI["ChatView QML<br/>4 placements"]
        RW["Refactor widgets"]
        SUI["Settings pages"]
    end

    subgraph L4["L4 · Features"]
        FCOMP["CompletionFeature"]
        FCHAT["ChatFeature"]
        FREF["RefactorFeature"]
    end

    subgraph L3["L3 · Capabilities"]
        CTX["ContextEngine<br/>ports + QtC adapters"]
        TOOLS["ToolKit"]
        SKILLS["SkillsEngine"]
        MCPH["McpHub<br/>client + server"]
    end

    subgraph L2["L2 · Core runtime — IDE-independent"]
        SM["SessionManager"]
        SESS["Session"]
        AGF["AgentFactory + AgentRouter"]
        AG["Agent"]
        PROV["GenericProvider"]
        TPL["JsonPromptTemplate"]
    end

    subgraph L1["L1 · Declarative config"]
        PCONF["providers/*.toml"]
        ACONF["agents/*.toml + partials/*.jinja"]
        ROST["rosters / pipelines"]
        PERS["personas/*.md"]
        SKCONF["skills/*.md"]
        SEC["SecretsStore"]
    end

    subgraph L0["L0 · Wire — LLMQore"]
        CLIENTS["*Client — SSE streaming"]
        TOOLFW["Tool framework"]
        MCPT["MCP transports"]
    end

    PLUGIN --> L4
    PLUGIN --> SUI
    LSP --> FCOMP
    QMLUI --> FCHAT
    RW --> FREF
    FCOMP --> SM
    FCHAT --> SM
    FREF --> SM
    FCOMP --> CTX
    FCHAT --> CTX
    FREF --> CTX
    FCHAT --> SKILLS
    FCHAT --> TOOLS
    FREF --> TOOLS
    TOOLS --> TOOLFW
    MCPH --> MCPT
    SM --> SESS
    SESS --> AG
    AGF --> AG
    AG --> PROV
    AG --> TPL
    AGF --> ACONF
    AGF --> PCONF
    AGF --> SEC
    AGF --> ROST
    TPL --> PERS
    PROV --> CLIENTS
    SKILLS --> SKCONF

Layer contracts

Layer Contains May depend on Must NOT depend on
L0 Wire LLMQore clients (one per wire protocol: Claude, OpenAI Chat, OpenAI Responses, Google, Ollama, Mistral, llama.cpp), tool framework, MCP transports Qt Network anything above
L1 Config ProviderInstance, AgentProfile (+ loader/validator), rosters, personas, skills, secrets port toml++, inja Qt Creator, L2+
L2 Core Agent, AgentFactory, AgentRouter, Provider/GenericProvider, JsonPromptTemplate, Session, SessionManager, ConversationHistory, SystemPromptBuilder, ResponseRouter, ToolContributorRegistry L0, L1 Qt Creator, QML, features
L3 Capabilities ContextEngine (ports + QtC adapters), ToolKit (built-in tools), SkillsEngine, McpHub L0L2, QtC APIs only in adapters features, UI
L4 Features CompletionFeature, ChatFeature (send/stream, compression, token counting, file edits), RefactorFeature L2, L3 each other
L5 Presentation LSP bridge, ChatView QML, refactor widgets, settings pages its feature core internals
Hosts plugin shell everything (composition only)

The hard rule that makes testability free: L0L2 build into targets with no Qt Creator linkage. Tests link L0L2 directly; the plugin adds L3 adapters, L4, L5.


4. Core domain model

Rendered copy: core-class-diagram.svg (regenerate when the diagram below changes).

classDiagram
    direction TB
    class SessionManager {
        +acquire(agentName) Session
        +release(session)
        +toolContributors() ToolContributorRegistry
    }
    class Session {
        +send(blocks)
        +cancel()
        +history() ConversationHistory
        +systemPrompt() SystemPromptBuilder
        +event(ResponseEvent)
        +finished(id, stopReason)
        +failed(id, ErrorInfo)
        +cancelled(id)
    }
    class ConversationHistory {
        +messages() vector~Message~
        +lastAssistantText() string
        +append(Message)
        +reset(vector~Message~)
    }
    class Message {
        +role Role
        +blocks vector~ContentBlock~
    }
    class SystemPromptBuilder {
        +setLayer(id, text, priority)
        +removeLayer(id)
        +compose() string
    }
    class ResponseRouter {
        +attach(BaseClient)
        +event(ResponseEvent)
    }
    class Agent {
        +config() AgentConfig
        +provider() Provider
        +promptTemplate() PromptTemplate
    }
    class AgentFactory {
        +create(name) Agent
        +configByName(name) AgentConfig
        +effectiveModel(name) string
    }
    class AgentRouter {
        +pickAgent(roster, fileCtx) string
    }
    class Provider {
        <<interface>>
        +prepareRequest(request, ctx)
        +sendRequest(json) RequestID
        +cancelRequest(RequestID)
    }
    class GenericProvider {
        -client BaseClient
    }
    class PromptTemplate {
        <<interface>>
        +buildFullRequest(request, ctx)
    }
    class JsonPromptTemplate {
        -bodySpec QJsonObject
        -env InjaEnvironment
    }
    class ToolContributorRegistry {
        +registerContributor(fn)
        +applyTo(ToolsManager)
    }

    SessionManager o-- Session : pools
    SessionManager --> AgentFactory : builds via
    SessionManager --> ToolContributorRegistry
    Session *-- ConversationHistory
    Session *-- SystemPromptBuilder
    Session *-- ResponseRouter
    Session --> Agent
    ConversationHistory o-- Message
    Agent *-- Provider
    Agent *-- PromptTemplate
    AgentFactory ..> Agent : creates
    AgentFactory --> AgentRouter
    GenericProvider --|> Provider
    JsonPromptTemplate --|> PromptTemplate

Responsibilities, one line each:

  • Agent — immutable bundle of what to call: resolved config + provider + compiled prompt template. No request state.
  • Session — one conversation's runtime: owns history, system-prompt layers, pinned context providers, response routing, the in-flight request, and the content of each dispatched request (tool continuations replay it inside LLMQore; see context-architecture.md §4.3). send(blocks) is the only entry point: every pipeline appends a user message and dispatches; there are no per-pipeline send variants. What differs between completion, chat, and refactor is the agent's template and the consumption mode (deltas vs final message), never the Session API.
  • SessionManager — creates/pools sessions per agent; the single place features go to get one. Pooling (not per-message construction) covers the "fresh agent + provider + secrets read per request" latency cost. It reuses only the expensive parts (agent, provider, compiled template, secrets read): acquire hands out a session with cleared history and system-prompt layers, so one-shot pipelines never see a previous exchange.
  • AgentRouter — the agent picker for auto-routed pipelines. Only code completion routes by context: pickAgent(roster.codeCompletion, {file, project}) walks the ordered roster and returns the first agent whose match rules fit. Chat is user-driven (the picker filters to the chatAssistant allow-list; the user chooses); compression and quick refactor each use a single configured agent. No feature-local routing logic beyond these.
  • GenericProvider — one class for all 13 client APIs; varies only by LLMQore client factory + metadata. Request shape belongs to the template, never to the provider.
  • JsonPromptTemplate — compiles the agent's [body] table; renders Jinja-bearing string values, splices raw JSON, drops empty keys; validated at load time.
  • SystemPromptBuilder — ordered named layers (agent.system, chat.context, refactor, compression); features mutate only their own layer.
  • ResponseRouter / ResponseEvent — adapts LLMQore client signals into one typed stream: TextDelta, ThinkingDelta, ToolCallStart/End, ToolResult, Usage, Error, MessageStop.
  • ToolContributorRegistry — contributors (built-in ToolKit, SkillTool, McpHub) register once; SessionManager applies them to every new session's ToolsManager. This is how MCP tools reach chat and refactor (U6) without feature code knowing about MCP.

5. Runtime flows

5.1 Chat (U2) — the richest path

sequenceDiagram
    autonumber
    actor U as User
    participant V as ChatView QML
    participant F as ChatFeature
    participant SM as SessionManager
    participant S as Session
    participant T as JsonPromptTemplate
    participant P as GenericProvider
    participant C as LLMQore Client
    participant R as ResponseRouter

    U->>V: message + attachments
    V->>F: sendMessage(text, files, images)
    F->>SM: acquire(activeAgent)
    SM-->>F: Session (pooled)
    F->>S: systemPrompt().setLayer("chat.context", project + skills + linked files)
    F->>S: send(userBlocks)
    S->>T: buildFullRequest(history, system, ctx)
    T-->>S: request JSON (body is 1:1 with the API)
    S->>P: sendRequest(json)
    P->>C: HTTP POST, SSE stream
    loop streaming
        C-->>R: chunk / thinking / tool_use / usage
        R-->>S: ResponseEvent
        S-->>F: event(ResponseEvent)
        F-->>V: ChatModel projection update
    end
    opt tool call requested
        S->>S: execute tool via ToolsManager
        S->>P: continue with tool_result
    end
    C-->>R: finalized
    R-->>S: MessageStop + Usage
    S-->>F: finished()
    F->>SM: release(session)

State ownership in chat: Session.history() is the truth. ChatModel is a QML projection built from history events (messageAdded, messageUpdated); ChatSerializer/ChatHistoryStore persist history, and restoring a chat seeds a new session's history — never the other way around. File-edit blocks, apply/undo, and the token counter are ChatFeature concerns layered on the event stream.

5.2 Completion (U1)

LSP getCompletionsCycling
  → CompletionFeature
      agent   = AgentRouter.pickAgent(roster.codeCompletion, {file, project})
      session = SessionManager.acquire(agent)
      ctx     = ContextEngine: prefix/suffix + open-files context (policy from
                CodeCompletionSettings — editor policy, not agent config)
      session.send(blocks{completion context})
  on finished → history().lastAssistantText()
      → CodeHandler (output-mode post-processing) → LSP items

No special Session method: the completion context travels as the content of an ordinary user message (a structured block carrying prefix/suffix + file context), and the template context exposes it as ctx.prefix / ctx.suffix. FIM vs instruct is agent config (template + body), not feature code: a FIM agent's body renders prefix/suffix into FIM fields; an instruct agent's body renders the same exchange as a chat-shaped request. The feature is identical for both — and since completion has no incremental UI, it never touches the delta stream: it waits for finished and reads the last message.

5.3 Quick refactor (U3)

Hotkey → RefactorFeature
  agent   = pipelines.quickRefactor (single configured agent)
  session = SessionManager.acquire(agent)
  session.systemPrompt().setLayer("refactor", tagged selection + output rules)
  session.send(blocks{instruction})
  on finished → history().lastAssistantText()
      → ResponseCleaner → RefactorResult → editor insert (accept/reject)

Same consumption mode as completion: the feature listens to Session::finished/failed only (events at most drive a progress spinner and cancel) and reads the result from history — it never connects to raw client signals. Tool calls during refactor run inside the session's tool loop; history's last assistant message is whatever the model produced after the final tool round.

5.4 Compression (U2)

Compression is ChatFeature reusing the same path with the single pipelines.chatCompression agent and a "compression" system layer; the summary starts a new history.


6. Configuration model

erDiagram
    AGENT_PROFILE ||--o| AGENT_PROFILE : extends
    AGENT_PROFILE }o--|| PROVIDER_INSTANCE : provider_instance
    AGENT_PROFILE }o--o{ PARTIAL : includes
    AGENT_PROFILE }o--o{ PERSONA : read_file
    ROSTER }o--o{ AGENT_PROFILE : ranks
    MODEL_OVERRIDE |o--|| AGENT_PROFILE : overrides_model
    PROVIDER_INSTANCE }o--|| CLIENT_API : client_api
    PROVIDER_INSTANCE }o--o| SECRET : api_key_ref
    PROVIDER_INSTANCE ||--o| LAUNCH_CONFIG : autostarts

    AGENT_PROFILE {
        string name
        bool abstract
        string system_prompt "jinja; inline text or read_file()"
        json body "request body, 1:1 with API"
        string endpoint "may contain MODEL placeholder"
        string model "default; override wins"
        bool enable_tools "capability hint"
        bool enable_thinking "capability hint"
        json match "file, path, project patterns"
    }
    PROVIDER_INSTANCE {
        string name
        string client_api
        string url
        string api_key_ref
    }
    PERSONA {
        string path "plain markdown file"
    }
    ROSTER {
        string pipeline "completion, chat, compression, refactor"
        list agents "ordered candidates"
    }

Rules of the config layer (full spec: agent-templates-design.md):

  • [body] is the request body — field-by-field, deep-mergeable through extends; Jinja-bearing strings render and splice as raw JSON, literals pass through. No separate sampling/thinking merge machinery.
  • Message serialization is inlined in each abstract wire base; there are no bundled partials. {% include %} still resolves sandboxed roots (bundled :/agents/, then the user agent's dir) for user-supplied partials; a missing partial is a load-time error.
  • Two-level hierarchy: an abstract wire base per provider (provider + endpoint + serialization only — no model/persona/tags/sampling) and a thin concrete agent carrying all policy.
  • Per-agent model override lives in agent_models.json and is applied by AgentFactory; ${MODEL} in endpoint covers URL-model providers.
  • Personas are not a subsystem: the profile's system_prompt is the persona. Shared text lives in plain markdown under the sandboxed roots and is pulled in with {{ read_file(...) }}; a persona-switch is an agent-switch — the only system-prompt edit point is the profile.
  • Secrets never appear in TOML; api_key_ref resolves through the SecretsStore port (QtC keychain in the plugin).

7. Capabilities layer

ContextEngine replaces the monolithic ContextManager with three focused services behind IDE-agnostic ports:

Service Port (L2-visible) QtC adapter
EditorContext — current doc, selection, prefix/suffix IDocumentReader TextEditor API
ProjectContext — root, file listing, ignore filtering (.qodeassistignore), open files, changes IProjectScanner ProjectExplorer API
TokenEstimator — input estimates, calibrated by server usage — (pure)

ToolKit registers the built-in tools (U4) with the ToolContributorRegistry; each tool declares a permission class (read / write / execute) so per-tool enablement (settings) and confirmation policy (terminal commands) live in one place.

SkillsEngine (U5): discovery + watching of the three skill roots; exposes catalogText() (names + descriptions for the system prompt), alwaysOnBodies(), and the load_skill tool; the / picker injects a skill's body into a single message.

McpHub (U6): client side connects configured servers and contributes their tools through the same registry (tools reach every session uniformly); server side exposes ToolKit over HTTP/SSE + stdio bridge.


8. Cross-cutting policies

Architecture is the rules as much as the boxes. These policies bind every layer and are part of the contract:

8.1 Threading

The core runs on the GUI thread; concurrency is the Qt event loop plus async network I/O — no shared-state threading anywhere in L1L4. Work that can block (project scans, token estimation over large trees) hides behind L3 ports; an adapter may use worker threads internally but delivers results as queued signals. Core types are therefore deliberately not thread-safe.

8.2 Request lifecycle

A session has at most one in-flight request; send() while in flight cancels the previous request first. Every request terminates in exactly one of three states — finished(stopReason), failed(error), cancelled() — and cancellation is not an error: no consumer may string-match a message to tell them apart.

8.3 Errors

Runtime errors are typed, not strings: ErrorInfo { category, message, providerDetail } with categories Config | Auth | Network | Provider | Validation | Tool. The category drives UI affordances (Auth → open provider settings, Network → offer retry); free text is for logs only. Load-time errors (principle 8) surface in the agents settings page, never as a failed send.

8.4 Timeouts and retries

Transfer timeouts are per-pipeline policy (completion short, chat/refactor from settings), applied by the feature — never baked into agent profiles. A streaming request is never silently retried after the first byte; automatic retry with capped backoff is allowed only for connection-phase failures. Anything beyond that is an explicit user action.

8.5 Observability

One RequestID correlates feature → session → provider → client → events → logs. Each layer logs under its own category (qodeassist.session, qodeassist.provider, qodeassist.tools, …); request bodies are logged only at debug level, and secrets are redacted unconditionally. Usage events are the single source feeding the token counter, TokenEstimator calibration, and the performance log.

8.6 Config compatibility

Agent profiles carry a schema_version; the loader migrates old user configs forward or rejects them with an actionable message — silent reinterpretation is forbidden. Bundled profiles are read-only resources that user profiles shadow by name. Persisted chat history is versioned the same way.

8.7 Security

Secrets exist only behind the SecretsStore port; they never reach TOML, logs, or persisted chats. Tool permission classes (read / write / execute) centralize the confirmation policy. The MCP server is opt-in and binds loopback by default; skill and partial roots are sandboxed — nothing resolves outside its declared directory.

8.8 Testing

The test pyramid follows the layers:

Layer Strategy
L1 loader/validator unit tests; golden-file snapshots of every bundled profile's rendered body against a synthetic context — the same check as load-time validation, run in CI
L2 Session / ResponseRouter replay tests over recorded SSE fixtures per provider; fake BaseClient, no network
L3 contract tests against the ports; QtC adapters covered only by plugin integration

Layering is enforced mechanically, not by review: each layer is its own CMake target, and the core targets do not link Qt Creator — a violating include fails the build.


9. Module / target layout

core/                       # no Qt Creator linkage — tests link this
  config/                   # L1: ProviderInstance, AgentProfile, loaders,
                            #     validators, rosters, personas, secrets port
  providers/                # L2: Provider, GenericProvider, ProviderFactory,
                            #     ClaudeCacheControl
  prompt/                   # L2: JsonPromptTemplate, ContextRenderer, partials
  agents/                   # L2: Agent, AgentFactory, AgentRouter
  session/                  # L2: Session, SessionManager, ConversationHistory,
                            #     SystemPromptBuilder, ResponseRouter, events
  skills/                   # L3 (IDE-free part): SkillsEngine, loaders
ide/                        # Qt Creator adapters only
  context/                  # EditorContext, ProjectContext adapters, ignore
  tools/                    # built-in ToolKit (build, issues, editor edits…)
  mcp/                      # McpHub managers
features/
  completion/               # LSP bridge + CompletionFeature + CodeHandler
  chat/                     # ChatFeature: ClientInterface, ChatModel(projection),
                            #   Compressor, TokenCounter, FileEditController,
                            #   serializer/store
  refactor/                 # RefactorFeature + custom instructions
ui/
  ChatView qml/, widgets/, settings pages
hosts/
  plugin/                   # qodeassist.cpp — composition root, actions, panes
tests/
  config/                   # loader cases + golden rendered-body snapshots
  session/                  # SSE replay fixtures per provider, fake client
external/
  llmqore/ inja/ tomlplusplus/

Dependency direction is strictly downward in the table of §3; features/* never include each other; ui/* talks only to its feature; hosts/* are the only places allowed to know about everything.


10. Deltas from the current working tree

What "from scratch" changes relative to today's code — the migration checklist to call the architecture done:

  1. Stack A physical teardown — delete root providers/*, pluginllmcore/*, ConfigurationManager, legacy provider/model/template settings pages, and the Stack A registration + MCP loop in qodeassist.cpp. Runtime already has no consumers.
  2. Single history owner — make ChatModel a projection of Session::history() (subscribe to history signals) instead of a parallel message store with seed-on-send; ChatCompressor reads history, not the model.
  3. Single send path — delete Session::sendCompletion(ContextData); the completion context becomes user-message content sent through the one send() (the completion handler already reads its result from history's last message). Move QuickRefactorHandler off raw BaseClient signals (requestCompleted/requestFinalized/requestFailed) onto Session::finished/failed + history().lastAssistantText().
  4. Three-state request lifecycle — add cancelled to Session; today cancel() emits failed(id, "Cancelled by user") and consumers must string-match to tell cancellation from failure (§8.2).
  5. Typed errors — replace lastError strings and the failed(QString) payload with ErrorInfo categories (§8.3).
  6. Agent selection by pipeline shape — completion is the only context-routed pipeline (AgentRouter.pickAgent(roster.codeCompletion, {file, project})); chat picker filters to the chatAssistant allow-list; quick refactor and compression each read a single configured agent (no routing).
  7. MCP tools on session clients — register MCP-contributed tools through ToolContributorRegistry so chat/refactor sessions get them (today they are registered only on dead Stack A providers).
  8. Session poolingSessionManager.acquire/release with a small pool per agent, replacing per-message agent + provider + secrets construction.
  9. ContextManager split — extract EditorContext / ProjectContext / TokenEstimator behind ports; move QtC API use into ide/context.
  10. [body] model completion — finish agent-templates-design.md (body-table rendering, sandboxed include, load-time validation, model override + ${MODEL}, schema_version gate), delete sampling/thinking merge machinery.
  11. Message type unification — one Message/ContentBlock shape from history to QML (roles, text, thinking, tool use/result, images); delete the parallel ChatModel::Message struct.
  12. Test scaffolding — golden rendered-body snapshots + SSE replay fixtures (§8.8); CI builds the core targets without Qt Creator so a layering violation fails the build.
  13. Stale docs cleanupproject-rules.md describes the removed Rules system; mark or delete.