QodeAssist/docs/architecture.md
2026-06-09 08:46:45 +02:00


# QodeAssist Architecture
This document describes the runtime architecture of QodeAssist after the
migration of all LLM runtime paths onto the agent / `Session` stack
("Stack B"). Every runtime LLM path — code completion, chat (send/stream +
compression + token counting), and quick refactor — now goes through agents,
`Session`, and the `Providers::GenericProvider` layer.
> Legend: ✅ = on Stack B (active runtime), 🔴 = legacy Stack A (isolated, no
> runtime consumers left).
---
## 1. Top level: ownership and dependency injection
The plugin (`qodeassist.cpp`) owns everything via `new` + parent (no plugin-wide
singletons; each feature receives its dependencies explicitly).
```
QodeAssistPlugin
  Stack B infrastructure:
    • Providers::registerBuiltinProviders() — registers 13 client_api types
    • ProviderInstanceFactory — 14 instances from TOML
    • ProviderSecretsStore
    • AgentFactory — agents from TOML
    • SessionManager(agentFactory)
    • m_engine (QQmlEngine)
        rootContext: "agentFactory", "sessionManager" — DI for chat (QML)
  Wired into consumers:
    • QodeAssistClient ← LLMClientInterface(*sessionManager, *agentFactory)
                       ← setSessionManager / setAgentFactory (for quick refactor)
```
Chat lives in QML (`ChatRootView` is a `QML_ELEMENT`), so `AgentFactory` and
`SessionManager` are exposed as **context properties on the engine's root
context** and resolved in `ChatRootView` via
`qmlEngine(this)->rootContext()->contextProperty(...)`.

---
## 2. Stack B core (agent / Session)
```
AgentFactory.create(name)
  configByName(name) → AgentConfig (TOML)
    providerInstance, model, endpoint, role, messageFormat,
    sampling, enableTools, enableThinking, match{filePatterns,...}
  buildProviderForAgent:
    instance = ProviderInstanceFactory.instanceByName(cfg.providerInstance)
    provider = ProviderFactory::create(instance.clientApi)   ◄── keystone
    provider.setUrl(instance.url)
    provider.setApiKey(secrets.read(instance.apiKeyRef))
  Agent(config, provider)
    promptTemplate = JsonPromptTemplate::fromConfig(cfg.messageFormat)   (inja)

SessionManager.createSession(agentName) → Session(agent)
  ├─ ConversationHistory    — messages as ContentBlocks
  ├─ SystemPromptBuilder    — layers: agent.role + caller layers
  └─ ResponseRouter(client) — emits ResponseEvent

Session API:
  • send(blocks, toolsOverride)     — chat/refactor: append user msg + dispatch
  • sendCompletion(ContextData)     — completion: FIM prefix/suffix
  • client()                        — agent's LLMQore::BaseClient (direct streaming)
  • systemPrompt()->setLayer(...)   — dynamic context layers
  • supportsImages()                — provider Image capability
  • history()                       — for seeding from ChatModel
```
`Session::sendCompletion` and `dispatch` compose `SystemPromptBuilder` layers
(`agent.role` + caller-provided) into the request system prompt.
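
The layering idea can be sketched in standard C++ without Qt. This is only an illustration under stated assumptions: `LayeredPrompt`, `setLayer`, and `compose` are hypothetical names, and the rules shown here (layers render in first-insertion order, empty layers are skipped, a blank line separates layers) are guesses, not the documented behaviour of `SystemPromptBuilder`. A caller would set an `agent.role` layer once, then add or overwrite layers such as `chat.context` before each request.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for SystemPromptBuilder: named layers, rendered in
// the order they were first added. The real class may order or join layers
// differently.
class LayeredPrompt {
public:
    // Adds a layer, or replaces the text of an existing layer in place.
    void setLayer(const std::string &name, std::string text) {
        for (auto &layer : m_layers) {
            if (layer.first == name) {
                layer.second = std::move(text);
                return;
            }
        }
        m_layers.emplace_back(name, std::move(text));
    }

    // Joins all non-empty layers with a blank line between them.
    std::string compose() const {
        std::string out;
        for (const auto &layer : m_layers) {
            if (layer.second.empty())
                continue;
            if (!out.empty())
                out += "\n\n";
            out += layer.second;
        }
        return out;
    }

private:
    std::vector<std::pair<std::string, std::string>> m_layers;
};
```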

---
## 3. Provider layer — the keystone (implemented during migration)
The Stack B provider layer previously existed only as an abstract base +
empty factory (`registerType` was never called, no concrete providers). This
blocked every agent from obtaining a working provider. It is now implemented
via a single configuration-driven `GenericProvider`.
```
ProviderFactory (sources/providers, namespace functions)
  registerType(name, fn) / create(name, parent) / knownNames()
  │ registerBuiltinProviders() — client_api → provider table

GenericProvider : Providers::Provider
  • owns an LLMQore::BaseClient (created by a ClientFactory)
  • prepareRequest — inherited from Provider base:
      delegates to PromptTemplate::buildFullRequest
  • client() / providerID() / capabilities() / getInstalledModels()
```
### client_api → provider table
| client_api                   | LLMQore client        | ProviderID       | capabilities                      |
|------------------------------|-----------------------|------------------|-----------------------------------|
| Claude                       | ClaudeClient          | Claude           | Tools·Thinking·Image·ModelListing |
| Google AI                    | GoogleAIClient        | GoogleAI         | Tools·Thinking·Image·ModelListing |
| llama.cpp                    | LlamaCppClient        | LlamaCpp         | Tools·Thinking·Image·ModelListing |
| Mistral AI                   | MistralClient         | MistralAI        | Tools·Thinking·Image·ModelListing |
| Codestral                    | MistralClient         | MistralAI        | Tools·Image                       |
| Ollama (Native)              | OllamaClient          | Ollama           | Tools·Thinking·Image·ModelListing |
| Ollama (OpenAI-compatible)   | OpenAIClient          | OpenAICompatible | Tools·Thinking·Image·ModelListing |
| OpenAI (Chat Completions)    | OpenAIClient          | OpenAI           | Tools·Thinking·Image·ModelListing |
| OpenAI (Responses API)       | OpenAIResponsesClient | OpenAIResponses  | Tools·Thinking·Image·ModelListing |
| OpenAI Compatible            | OpenAIClient          | OpenAICompatible | Tools·Thinking·Image              |
| OpenRouter                   | OpenAIClient          | OpenRouter       | Tools·Thinking·Image·ModelListing |
| LM Studio (Chat Completions) | OpenAIClient          | LMStudio         | Tools·Thinking·Image·ModelListing |
| LM Studio (Responses API)    | OpenAIResponsesClient | OpenAIResponses  | Tools·Thinking·Image·ModelListing |

Request *shape* comes from the agent's prompt template (the `messageFormat`
template, rendered with inja in Jinja-style syntax), so a single provider class
covers every API by varying only the client factory and metadata.
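
The registration pattern described in this section can be sketched in standard C++ as follows. Everything here is a simplified, hypothetical stand-in: `Client`, `ChatCompletionsClient`, `ResponsesClient`, `GenericProviderSketch`, and `ProviderRegistry` are illustrative names, not the real `LLMQore` or `Providers` types, and the real `create` also takes a parent object. Only two rows of the table above are registered. The point of the sketch is that one provider type serves every `client_api`, because each registry entry carries just a client factory plus metadata.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins; the real types are LLMQore::BaseClient and
// Providers::GenericProvider.
struct Client {
    virtual ~Client() = default;
    virtual std::string wireFormat() const = 0;
};
struct ChatCompletionsClient : Client {
    std::string wireFormat() const override { return "chat-completions"; }
};
struct ResponsesClient : Client {
    std::string wireFormat() const override { return "responses"; }
};

using ClientFactory = std::function<std::unique_ptr<Client>()>;

// One provider class: instances differ only by the injected client and the
// metadata stored next to it.
struct GenericProviderSketch {
    std::string providerId;
    std::unique_ptr<Client> client;
};

class ProviderRegistry {
public:
    void registerType(const std::string &clientApi, std::string providerId,
                      ClientFactory makeClient) {
        m_entries[clientApi] = Entry{std::move(providerId), std::move(makeClient)};
    }

    // Returns nullptr for an unknown client_api.
    std::unique_ptr<GenericProviderSketch> create(const std::string &clientApi) const {
        const auto it = m_entries.find(clientApi);
        if (it == m_entries.end())
            return nullptr;
        auto provider = std::make_unique<GenericProviderSketch>();
        provider->providerId = it->second.providerId;
        provider->client = it->second.makeClient();
        return provider;
    }

    // Names come back sorted, because std::map iterates in key order.
    std::vector<std::string> knownNames() const {
        std::vector<std::string> names;
        for (const auto &entry : m_entries)
            names.push_back(entry.first);
        return names;
    }

private:
    struct Entry {
        std::string providerId;
        ClientFactory makeClient;
    };
    std::map<std::string, Entry> m_entries;
};

// Registers two rows of the client_api table as an illustration.
inline ProviderRegistry makeDemoRegistry() {
    ProviderRegistry registry;
    registry.registerType("OpenAI (Chat Completions)", "OpenAI",
                          [] { return std::make_unique<ChatCompletionsClient>(); });
    registry.registerType("OpenAI (Responses API)", "OpenAIResponses",
                          [] { return std::make_unique<ResponsesClient>(); });
    return registry;
}
```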

---
## 4. Runtime paths (all on Stack B)
### 4a. Code completion ✅
```
Qt Creator LSP (getCompletionsCycling)
LLMClientInterface
  pickCompletionAgent: AgentRouter.pickAgent(roster.codeCompletion, {file, project})
  session = sessionManager.createSession(agent)
  ctx = Templates::ContextData{ prefix, suffix,
                                systemPrompt = fileContext + openFiles }
  session.sendCompletion(ctx)
  ▼ stream from session.client():
  requestCompleted → sendCompletionToClient → CodeHandler → LSP

system prompt = agent.role; FIM template renders prefix/suffix
```
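
The agent-selection step (`AgentRouter.pickAgent`) is not specified in detail here, so the following is only a sketch of one plausible rule: walk the roster in order and return the first agent whose file pattern matches, treating an agent with no patterns as a catch-all. `AgentEntry`, `globMatch`, and this `pickAgent` signature are hypothetical, and the sketch ignores the project name that the real call also receives.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// Hypothetical, simplified agent entry: only the name and file patterns.
struct AgentEntry {
    std::string name;
    std::vector<std::string> filePatterns; // e.g. "*.cpp"; empty = match any file
};

// Minimal glob matcher supporting '*' (any run of characters, possibly
// empty) and '?' (exactly one character).
inline bool globMatch(const std::string &pattern, const std::string &text) {
    std::size_t p = 0, t = 0;
    std::size_t starP = std::string::npos, starT = 0;
    while (t < text.size()) {
        if (p < pattern.size() && pattern[p] == '*') {
            starP = p++;  // remember the star and first try matching nothing
            starT = t;
        } else if (p < pattern.size() && (pattern[p] == '?' || pattern[p] == text[t])) {
            ++p;
            ++t;
        } else if (starP != std::string::npos) {
            p = starP + 1; // backtrack: let the last star absorb one more char
            t = ++starT;
        } else {
            return false;
        }
    }
    while (p < pattern.size() && pattern[p] == '*')
        ++p;
    return p == pattern.size();
}

// Returns the first roster entry that matches the file name, if any.
inline std::optional<std::string> pickAgent(const std::vector<AgentEntry> &roster,
                                            const std::string &fileName) {
    for (const auto &agent : roster) {
        if (agent.filePatterns.empty())
            return agent.name;
        for (const auto &pattern : agent.filePatterns) {
            if (globMatch(pattern, fileName))
                return agent.name;
        }
    }
    return std::nullopt;
}
```

With a roster of `cpp-fim` (`*.cpp`, `*.h`), `py-fim` (`*.py`), and a pattern-less `fallback`, this sketch picks `cpp-fim` for `main.cpp` and `fallback` for `README.md`. Roster order decides ties, so the catch-all belongs last.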
### 4b. Chat ✅
```
ChatRootView (QML)
  resolve agentFactory()/sessionManager() = qmlEngine(this)->rootContext()
  ChatAgentController: agent list (configNames), active agent (persisted),
                       supportsThinking/Tools
  QML agent picker (TopBar.agentSelector) — replaced provider/model/template combos
  ▼ dispatchSend
ClientInterface
  session = sessionManager.createSession(currentChatAgent)
  registerQodeAssistTools(session.client().tools()) + registerSkillTool
  systemPrompt layer "chat.context" = project info + skills + linked files
  seedHistory(session.history() ← ChatModel: user/assistant/tool-call+result)
  session.send(userBlocks{text + images}, useTools)
  ▼ stream from session.client() → existing handlers → ChatModel:
      chunk     → addMessage
      thinking  → addThinkingBlock
      tool      → addToolExecutionStatus / updateToolResult
      finalized → usage
      completed → messageReceivedCompletely → removeSession

ChatCompressor → createSession(agent) → seed history → layer "compression" → send(prompt)
InputTokenCounter → estimate without provider (calibrated by server usage)
```
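
The last line of the diagram says that `InputTokenCounter` estimates locally and is calibrated by server usage. One way such calibration could work is sketched below. It is an assumption throughout: `CalibratedTokenEstimator` is a hypothetical class, and both the starting ratio of 4 characters per token (a common rule of thumb) and the exponential moving average are illustrative choices, not the plugin's actual method.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <string>

// Hypothetical local token estimator. It starts from a rough
// characters-per-token ratio and nudges it toward what the server reports.
class CalibratedTokenEstimator {
public:
    explicit CalibratedTokenEstimator(double charsPerToken = 4.0, double smoothing = 0.5)
        : m_charsPerToken(charsPerToken), m_smoothing(smoothing) {}

    // Local estimate, rounded up; no request is sent.
    std::size_t estimate(const std::string &text) const {
        return static_cast<std::size_t>(std::ceil(text.size() / m_charsPerToken));
    }

    // Feeds back the prompt size and the input-token count from server usage.
    void calibrate(std::size_t promptChars, std::size_t reportedInputTokens) {
        if (promptChars == 0 || reportedInputTokens == 0)
            return; // nothing to learn from an empty sample
        const double observed = static_cast<double>(promptChars) / reportedInputTokens;
        // Exponential moving average keeps one noisy response from dominating.
        m_charsPerToken = (1.0 - m_smoothing) * m_charsPerToken + m_smoothing * observed;
    }

    double charsPerToken() const { return m_charsPerToken; }

private:
    double m_charsPerToken;
    double m_smoothing;
};
```

In this sketch, each completed response would call `calibrate` with the size of the prompt that was sent and the input-token count from the usage report, so later estimates track the active model's tokenizer more closely.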
### 4c. Quick refactor ✅
```
QodeAssistClient.requestQuickRefactor → QuickRefactorHandler (setSessionManager/setAgentFactory)
  pickRefactorAgent: AgentRouter.pickAgent(roster.quickRefactor, {file, project})
  session = createSession(agent)
  if useTools: registerQodeAssistTools(session.client().tools())
  systemPrompt layer "refactor" = buildSystemPrompt(tagged content +
                                  output requirements + indentation rules)
  session.send(blocks{instructions}, useTools)
  ▼ stream from session.client():
  requestCompleted → ResponseCleaner → RefactorResult → insert into editor
```
---
## 5. Configuration sources
```
~/.config/.../qodeassist/config/
  providers/*.toml  → ProviderInstance { name, client_api, url, api_key_ref }
  agents/*.toml     → AgentConfig { providerInstance, model, endpoint, role,
                                    messageFormat, sampling, match, enable* }
  pipelines rosters → codeCompletion / chatAssistant / chatCompression / quickRefactor
                      consumed by AgentRouter.pickAgent(roster, {filePath, projectName})

Editor policy (NOT agent config):
  CodeCompletionSettings — triggers, modelOutputHandler, context extraction,
                           useOpenFilesContext
                           (sampling / prompt-generation fields removed)
```
---
## 6. Remaining Stack A (runtime does NOT depend on it)
```
🔴 Settings UI: provider/model/template selection pages
   (ccProvider / caProvider / qrProvider) + ConfigurationManager
   → use ProvidersManager
🔴 root providers/* (PluginLLMCore::Provider, 14 classes)
   → read only chat/quick-refactor sampling settings
🔴 pluginllmcore/* (ProvidersManager, PromptTemplateManager, ResponseCleaner,
   PromptProviderChat/Fim, ContextData)
🔴 qodeassist.cpp:144-146 registerProviders() / registerTemplates() (Stack A registration)
🔴 qodeassist.cpp:185 MCP skill-tool loop on Stack A providers (effectively dead)
🔴 ChatAssistantSettings / QuickRefactorSettings — sampling fields (read only by root providers)

ResponseCleaner (pluginllmcore) is still used by QuickRefactorHandler as a text
utility — orthogonal to the provider stack.
```
### Removed during the migration
- Rules subsystem (`RulesLoader` + chat "active rules" UI + QuickRefactor rules block)
- `ChatConfigurationController`, `AgentRoleController` (chat config/role presets)
- `m_promptProvider` (`PromptProviderFim`) in the plugin
- `RequestType::CodeCompletion` branch in all 14 root providers
- Sampling / prompt-generation fields in `CodeCompletionSettings`
- ChatView no longer links `PluginLLMCore`
---
## 7. Dependency summary
```
┌─────────────────────────────────── Stack B (active runtime) ────────────────────────────────────┐
LLMClientInterface ───┐
ClientInterface ──────┼─► SessionManager ─► Session ─► Agent ─► GenericProvider ─► LLMQore::*Client
QuickRefactorHandler ─┘         │                        │             │
ChatCompressor ─────────────────┘                  AgentFactory ProviderFactory
AgentRouter (rosters)                                    │             │
                                                 ProviderInstanceFactory (TOML)
└─────────────────────────────────────────────────────────────────────────────────────────────────┘

Stack A (settings UI + ConfigurationManager + MCP loop) — isolated,
no runtime consumers remain.
```
---
## 8. Open follow-ups (optional)
1. **Chat picker filtering** — show only `chatAssistant`-roster agents (currently
lists all non-hidden agents; the auto-default may land on a FIM agent).
Requires wiring ChatView to `PipelinesConfig` (watch for OBJECT-library
symbol duplication).
2. **MCP tools on agent clients** — MCP skill tools are registered only on Stack A
providers; to expose MCP tools to chat agents, register them on the session
client alongside `registerQodeAssistTools`.
3. **Physical Stack A teardown** — remove the provider/model/template settings UI,
`ConfigurationManager`, root `providers/*`, `pluginllmcore/*`, and the
registration + MCP loop in `qodeassist.cpp`. Runtime no longer depends on them.
4. **Per-message session cost** — chat/refactor create a fresh agent/provider/client
(and read secrets) per request; a session pool could reduce latency.
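
The session pool suggested in item 4 could be sketched as follows. This is a hypothetical design, not existing QodeAssist code: `PooledSession` stands in for the agent/provider/client bundle, `SessionPool` and its `Builder` callback are illustrative names, and the `builds()` counter exists only to make reuse observable. A real pool would also have to clear conversation history and system-prompt layers before reuse, and to invalidate entries when the TOML configuration or secrets change.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for an agent/provider/client bundle that is costly
// to build (config lookup, secret read, client construction).
struct PooledSession {
    std::string agentName;
};

class SessionPool {
public:
    using Builder = std::function<std::unique_ptr<PooledSession>(const std::string &)>;

    explicit SessionPool(Builder builder) : m_builder(std::move(builder)) {}

    // Reuses an idle session for this agent if one exists, else builds one.
    std::unique_ptr<PooledSession> acquire(const std::string &agentName) {
        auto &idle = m_idle[agentName];
        if (!idle.empty()) {
            auto session = std::move(idle.back());
            idle.pop_back();
            return session;
        }
        ++m_builds;
        return m_builder(agentName);
    }

    // Hands a finished session back so the next request can reuse it.
    void release(std::unique_ptr<PooledSession> session) {
        if (!session)
            return;
        const std::string name = session->agentName;
        m_idle[name].push_back(std::move(session));
    }

    // Number of sessions built from scratch so far.
    int builds() const { return m_builds; }

private:
    Builder m_builder;
    std::map<std::string, std::vector<std::unique_ptr<PooledSession>>> m_idle;
    int m_builds = 0;
};
```

Acquiring a `chat` session, releasing it, and acquiring `chat` again leaves `builds()` at 1 in this sketch, while a first request for `refactor` raises it to 2.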