QodeAssist Architecture
This document describes the runtime architecture of QodeAssist after the
migration of all LLM runtime paths onto the agent / Session stack
("Stack B"). Every runtime LLM path — code completion, chat (send/stream +
compression + token counting), and quick refactor — now goes through agents,
Session, and the Providers::GenericProvider layer.
Legend: ✅ = on Stack B (active runtime), 🔴 = legacy Stack A (isolated, no runtime consumers left).
1. Top level: ownership and dependency injection
The plugin (qodeassist.cpp) owns everything via new + parent (no plugin-wide
singletons; each feature receives its dependencies explicitly).
QodeAssistPlugin
Stack B infrastructure:
• Providers::registerBuiltinProviders() — registers 13 client_api types
• ProviderInstanceFactory — 14 instances from TOML
• ProviderSecretsStore
• AgentFactory — agents from TOML
• SessionManager(agentFactory)
• m_engine (QQmlEngine)
rootContext: "agentFactory", "sessionManager" — DI for chat (QML)
Wired into consumers:
• QodeAssistClient ← LLMClientInterface(*sessionManager, *agentFactory)
← setSessionManager / setAgentFactory (for quick refactor)
Chat lives in QML (ChatRootView is a QML_ELEMENT), so AgentFactory and
SessionManager are exposed as context properties on the engine's root
context and resolved in ChatRootView via
qmlEngine(this)->rootContext()->contextProperty(...).
2. Stack B core (agent / Session)
AgentFactory.create(name)
configByName(name) → AgentConfig (TOML)
providerInstance, model, endpoint, role, messageFormat,
sampling, enableTools, enableThinking, match{filePatterns,...}
buildProviderForAgent:
instance = ProviderInstanceFactory.instanceByName(cfg.providerInstance)
provider = ProviderFactory::create(instance.clientApi) ◄── keystone
provider.setUrl(instance.url)
provider.setApiKey(secrets.read(instance.apiKeyRef))
▼
Agent(config, provider)
promptTemplate = JsonPromptTemplate::fromConfig(cfg.messageFormat) (inja)
▼
SessionManager.createSession(agentName) → Session(agent)
├─ ConversationHistory — messages as ContentBlocks
├─ SystemPromptBuilder — layers: agent.role + caller layers
└─ ResponseRouter(client) — emits ResponseEvent
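The buildProviderForAgent steps in the diagram above can be sketched without Qt as a plain lookup chain. This is a minimal illustration, not the plugin's code: ProviderInstanceSketch, ConfiguredProvider, and the two std::map arguments are hypothetical stand-ins for ProviderInstanceFactory, the provider object, and ProviderSecretsStore. Treating a missing secret as an empty key is also an assumption.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>

// Hypothetical mirror of a provider instance loaded from providers/*.toml.
struct ProviderInstanceSketch {
    std::string clientApi;
    std::string url;
    std::string apiKeyRef;
};

// Hypothetical result: what the provider ends up configured with.
struct ConfiguredProvider {
    std::string clientApi;
    std::string url;
    std::string apiKey;
};

// Resolve instance by name, then copy URL and look up the API key by reference.
std::optional<ConfiguredProvider> buildProviderForAgent(
    const std::string &providerInstance,
    const std::map<std::string, ProviderInstanceSketch> &instances,
    const std::map<std::string, std::string> &secrets)
{
    const auto inst = instances.find(providerInstance);
    if (inst == instances.end())
        return std::nullopt; // unknown instance name: no provider can be built

    ConfiguredProvider provider;
    provider.clientApi = inst->second.clientApi; // selects the provider type
    provider.url = inst->second.url;

    // Assumption: a missing secret leaves the key empty (e.g. a local server).
    const auto key = secrets.find(inst->second.apiKeyRef);
    if (key != secrets.end())
        provider.apiKey = key->second;
    return provider;
}
```

The point of the indirection is that the agent config only names an instance; the URL and the secret reference live with the instance, so several agents can share one endpoint.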
Session API:
• send(blocks, toolsOverride) — chat/refactor: append user msg + dispatch
• sendCompletion(ContextData) — completion: FIM prefix/suffix
• client() — agent's LLMQore::BaseClient (direct streaming)
• systemPrompt()->setLayer(...) — dynamic context layers
• supportsImages() — provider Image capability
• history() — for seeding from ChatModel
Session::sendCompletion and dispatch compose SystemPromptBuilder layers
(agent.role + caller-provided) into the request system prompt.
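As a rough illustration of how named layers could compose into one system prompt, here is a minimal sketch. LayeredPrompt is a hypothetical stand-in for SystemPromptBuilder. The insertion-order join, the blank-line separator, and the skipping of empty layers are assumptions, not documented behavior.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for SystemPromptBuilder: named layers kept in
// insertion order and joined into a single system prompt.
class LayeredPrompt {
public:
    // Add a new layer, or overwrite the text of an existing layer in place.
    void setLayer(const std::string &name, const std::string &text) {
        for (auto &layer : m_layers) {
            if (layer.first == name) {
                layer.second = text;
                return;
            }
        }
        m_layers.emplace_back(name, text);
    }

    // Join all non-empty layers, separated by a blank line (assumed format).
    std::string compose() const {
        std::string out;
        for (const auto &layer : m_layers) {
            if (layer.second.empty())
                continue;
            if (!out.empty())
                out += "\n\n";
            out += layer.second;
        }
        return out;
    }

private:
    std::vector<std::pair<std::string, std::string>> m_layers;
};
```

In this sketch a caller would set "agent.role" once and then update a layer such as "chat.context" before each request; overwriting by name keeps the layer's position stable.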
3. Provider layer — the keystone (implemented during migration)
The Stack B provider layer previously existed only as an abstract base +
empty factory (registerType was never called, no concrete providers). This
blocked every agent from obtaining a working provider. It is now implemented
via a single configuration-driven GenericProvider.
ProviderFactory (sources/providers, namespace functions)
registerType(name, fn) / create(name, parent) / knownNames()
▲
│ registerBuiltinProviders() — client_api → provider table
│
GenericProvider : Providers::Provider
• owns an LLMQore::BaseClient (created by a ClientFactory)
• prepareRequest — inherited from Provider base:
delegates to PromptTemplate::buildFullRequest
• client() / providerID() / capabilities() / getInstalledModels()
client_api → provider table
| client_api | LLMQore client | ProviderID | capabilities |
|---|---|---|---|
| Claude | ClaudeClient | Claude | Tools·Thinking·Image·ModelListing |
| Google AI | GoogleAIClient | GoogleAI | Tools·Thinking·Image·ModelListing |
| llama.cpp | LlamaCppClient | LlamaCpp | Tools·Thinking·Image·ModelListing |
| Mistral AI | MistralClient | MistralAI | Tools·Thinking·Image·ModelListing |
| Codestral | MistralClient | MistralAI | Tools·Image |
| Ollama (Native) | OllamaClient | Ollama | Tools·Thinking·Image·ModelListing |
| Ollama (OpenAI-compatible) | OpenAIClient | OpenAICompatible | Tools·Thinking·Image·ModelListing |
| OpenAI (Chat Completions) | OpenAIClient | OpenAI | Tools·Thinking·Image·ModelListing |
| OpenAI (Responses API) | OpenAIResponsesClient | OpenAIResponses | Tools·Thinking·Image·ModelListing |
| OpenAI Compatible | OpenAIClient | OpenAICompatible | Tools·Image·Thinking |
| OpenRouter | OpenAIClient | OpenRouter | Tools·Image·Thinking·ModelListing |
| LM Studio (Chat Completions) | OpenAIClient | LMStudio | Tools·Thinking·Image·ModelListing |
| LM Studio (Responses API) | OpenAIResponsesClient | OpenAIResponses | Tools·Thinking·Image·ModelListing |
Request shape comes from the agent's prompt template (the inja-rendered
messageFormat), so a single provider class covers every API by varying only
the client factory and metadata.
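The registerType / create / knownNames trio can be pictured as a name-to-creator map. The sketch below is std-only and deliberately simplified: ProviderSketch and ProviderRegistry are hypothetical names, the real functions are namespace-level and take a parent argument, and only two rows of the table are registered.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical provider: only the metadata that varies per client_api.
struct ProviderSketch {
    std::string clientType; // e.g. "OpenAIClient"
    std::string providerId; // e.g. "OpenAICompatible"
    bool modelListing = false;
};

using ProviderCreator = std::function<std::unique_ptr<ProviderSketch>()>;

class ProviderRegistry {
public:
    // Returns false if the name was already registered (first one wins).
    bool registerType(const std::string &name, ProviderCreator fn) {
        return m_creators.emplace(name, std::move(fn)).second;
    }

    // Returns nullptr for an unknown client_api name.
    std::unique_ptr<ProviderSketch> create(const std::string &name) const {
        const auto it = m_creators.find(name);
        if (it == m_creators.end())
            return nullptr;
        return it->second();
    }

    // Names come back sorted because std::map orders its keys.
    std::vector<std::string> knownNames() const {
        std::vector<std::string> names;
        for (const auto &entry : m_creators)
            names.push_back(entry.first);
        return names;
    }

private:
    std::map<std::string, ProviderCreator> m_creators;
};

// Registers two rows from the client_api table as an example.
void registerExampleProviders(ProviderRegistry &registry) {
    registry.registerType("Ollama (Native)", [] {
        return std::make_unique<ProviderSketch>(
            ProviderSketch{"OllamaClient", "Ollama", true});
    });
    registry.registerType("OpenAI Compatible", [] {
        return std::make_unique<ProviderSketch>(
            ProviderSketch{"OpenAIClient", "OpenAICompatible", false});
    });
}
```

Because each row differs only in the creator and its metadata, adding a client_api in this scheme is one more registerType call rather than a new provider class.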
4. Runtime paths (all on Stack B)
4a. Code completion ✅
Qt Creator LSP (getCompletionsCycling)
▼
LLMClientInterface
pickCompletionAgent: AgentRouter.pickAgent(roster.codeCompletion, {file, project})
session = sessionManager.createSession(agent)
ctx = Templates::ContextData{ prefix, suffix,
systemPrompt = fileContext + openFiles }
session.sendCompletion(ctx)
▼ stream from session.client():
requestCompleted → sendCompletionToClient → CodeHandler → LSP
system prompt = agent.role; FIM template renders prefix/suffix
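To make "FIM template renders prefix/suffix" concrete, the sketch below fills a fill-in-the-middle template from a prefix and a suffix. It is an assumption-heavy illustration: the plugin renders through JsonPromptTemplate and inja, whereas this uses a naive single-pass substitution, and CompletionContext, renderFim, and the {{prefix}} / {{suffix}} placeholder spelling are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical subset of the completion context: text before and after the cursor.
struct CompletionContext {
    std::string prefix;
    std::string suffix;
};

// Single-pass substitution of {{prefix}} and {{suffix}}. Substituted text is
// never rescanned, so placeholder-like strings inside the code stay literal.
std::string renderFim(const std::string &tmpl, const CompletionContext &ctx) {
    const std::string prefixKey = "{{prefix}}";
    const std::string suffixKey = "{{suffix}}";
    std::string out;
    std::size_t i = 0;
    while (i < tmpl.size()) {
        if (tmpl.compare(i, prefixKey.size(), prefixKey) == 0) {
            out += ctx.prefix;
            i += prefixKey.size();
        } else if (tmpl.compare(i, suffixKey.size(), suffixKey) == 0) {
            out += ctx.suffix;
            i += suffixKey.size();
        } else {
            out += tmpl[i++];
        }
    }
    return out;
}
```

A template such as "<|fim_prefix|>{{prefix}}<|fim_suffix|>{{suffix}}<|fim_middle|>" would be a typical input; the sentinel tokens shown are illustrative and differ between model families.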
4b. Chat ✅
ChatRootView (QML)
resolve agentFactory()/sessionManager() = qmlEngine(this)->rootContext()
ChatAgentController: agent list (configNames), active agent (persisted),
supportsThinking/Tools
QML agent picker (TopBar.agentSelector) — replaced provider/model/template combos
▼ dispatchSend
ClientInterface
session = sessionManager.createSession(currentChatAgent)
registerQodeAssistTools(session.client().tools()) + registerSkillTool
systemPrompt layer "chat.context" = project info + skills + linked files
seedHistory(session.history() ← ChatModel: user/assistant/tool-call+result)
session.send(userBlocks{text + images}, useTools)
▼ stream from session.client() → existing handlers → ChatModel:
chunk→addMessage thinking→addThinkingBlock
tool→addToolExecutionStatus / updateToolResult
finalized→usage completed→messageReceivedCompletely → removeSession
ChatCompressor → createSession(agent) → seed history → layer "compression" → send(prompt)
InputTokenCounter → estimate without provider (calibrated by server usage)
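One way to read "estimate without provider (calibrated by server usage)" is a character-based estimate whose ratio is nudged toward what the server reports. The sketch below is only that reading: TokenEstimator, the starting ratio of 0.25 tokens per character, and the 0.5 blend factor are assumptions, not values taken from InputTokenCounter.

```cpp
#include <cassert>
#include <cmath>
#include <string>

// Hypothetical local token estimator, corrected by server-reported usage.
class TokenEstimator {
public:
    // Estimate tokens from the character count and the current ratio.
    int estimate(const std::string &text) const {
        return static_cast<int>(std::ceil(text.size() * m_tokensPerChar));
    }

    // Blend the ratio observed in a finished request into the running ratio.
    void calibrate(const std::string &sentText, int serverPromptTokens) {
        if (sentText.empty() || serverPromptTokens <= 0)
            return; // nothing usable to learn from
        const double observed =
            static_cast<double>(serverPromptTokens) / sentText.size();
        m_tokensPerChar = (1.0 - kBlend) * m_tokensPerChar + kBlend * observed;
    }

private:
    static constexpr double kBlend = 0.5; // assumed smoothing factor
    double m_tokensPerChar = 0.25;        // assumed start: about 4 chars per token
};
```

The estimate needs no network round trip, which is why it can run while the user is typing; the usage figures from completed requests only adjust later estimates.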
4c. Quick refactor ✅
QodeAssistClient.requestQuickRefactor → QuickRefactorHandler (setSessionManager/setAgentFactory)
pickRefactorAgent: AgentRouter.pickAgent(roster.quickRefactor, {file, project})
session = createSession(agent)
if useTools: registerQodeAssistTools(session.client().tools())
systemPrompt layer "refactor" = buildSystemPrompt(tagged content +
output requirements + indentation rules)
session.send(blocks{instructions}, useTools)
▼ stream from session.client():
requestCompleted → ResponseCleaner → RefactorResult → insert into editor
5. Configuration sources
~/.config/.../qodeassist/config/
providers/*.toml → ProviderInstance { name, client_api, url, api_key_ref }
agents/*.toml → AgentConfig { providerInstance, model, endpoint, role,
messageFormat, sampling, match, enable* }
pipelines rosters → codeCompletion / chatAssistant / chatCompression / quickRefactor
consumed by AgentRouter.pickAgent(roster, {filePath, projectName})
Editor policy (NOT agent config):
CodeCompletionSettings — triggers, modelOutputHandler, context extraction,
useOpenFilesContext
(sampling / prompt-generation fields removed)
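Roster-based selection can be sketched as "first agent in the roster whose file patterns match". This document does not spell out AgentRouter's matching rules, so the details here are assumptions: AgentSketch is a hypothetical subset of AgentConfig, patterns are simple globs with * and ?, an empty pattern list acts as a catch-all, and projectName is ignored.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Minimal glob: '*' matches any run of characters, '?' matches exactly one.
bool globMatch(const std::string &pattern, const std::string &text) {
    std::size_t p = 0, t = 0, star = std::string::npos, mark = 0;
    while (t < text.size()) {
        if (p < pattern.size() && pattern[p] == '*') {
            star = p++; // remember the star and try matching zero characters
            mark = t;
        } else if (p < pattern.size()
                   && (pattern[p] == '?' || pattern[p] == text[t])) {
            ++p;
            ++t;
        } else if (star != std::string::npos) {
            p = star + 1; // backtrack: let the last star absorb one more char
            t = ++mark;
        } else {
            return false;
        }
    }
    while (p < pattern.size() && pattern[p] == '*')
        ++p;
    return p == pattern.size();
}

// Hypothetical subset of AgentConfig relevant to routing.
struct AgentSketch {
    std::string name;
    std::vector<std::string> filePatterns;
};

// Returns the first matching agent name in roster order, or "" if none match.
std::string pickAgent(const std::vector<AgentSketch> &roster,
                      const std::string &filePath) {
    for (const auto &agent : roster) {
        if (agent.filePatterns.empty())
            return agent.name; // assumed: no patterns means "matches everything"
        for (const auto &pattern : agent.filePatterns) {
            if (globMatch(pattern, filePath))
                return agent.name;
        }
    }
    return {};
}
```

With first-match ordering, a roster would list specialized agents first and a pattern-less fallback last.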
6. Remaining Stack A (runtime does NOT depend on it)
🔴 Settings UI: provider/model/template selection pages
(ccProvider / caProvider / qrProvider) + ConfigurationManager
→ use ProvidersManager
🔴 root providers/* (PluginLLMCore::Provider, 14 classes)
→ read only chat/quick-refactor sampling settings
🔴 pluginllmcore/* (ProvidersManager, PromptTemplateManager, ResponseCleaner,
PromptProviderChat/Fim, ContextData)
🔴 qodeassist.cpp:144-146 registerProviders() / registerTemplates() (Stack A registration)
🔴 qodeassist.cpp:185 MCP skill-tool loop on Stack A providers (effectively dead)
🔴 ChatAssistantSettings / QuickRefactorSettings — sampling fields (read only by root providers)
ResponseCleaner (pluginllmcore) is still used by QuickRefactorHandler as a text
utility — orthogonal to the provider stack.
Removed during the migration
- Rules subsystem (RulesLoader + chat "active rules" UI + QuickRefactor rules block)
- ChatConfigurationController, AgentRoleController (chat config/role presets)
- m_promptProvider (PromptProviderFim) in the plugin
- RequestType::CodeCompletion branch in all 14 root providers
- Sampling / prompt-generation fields in CodeCompletionSettings
- ChatView no longer links PluginLLMCore
7. Dependency summary
┌──────────────── Stack B (active runtime) ────────────────┐
LLMClientInterface ─┐ │
ClientInterface ────┼─► SessionManager ─► Session ─► Agent ─► GenericProvider ─► LLMQore::*Client
QuickRefactorHandler─┘ │ │ │ │
ChatCompressor ──────────────┘ │ AgentFactory ProviderFactory
AgentRouter (rosters) │ │
ProviderInstanceFactory (TOML)
└──────────────────────────────────────────────────────────┘
Stack A (settings UI + ConfigurationManager + MCP loop) — isolated,
no runtime consumers remain.
8. Open follow-ups (optional)
- Chat picker filtering: show only chatAssistant-roster agents (currently lists all non-hidden agents; the auto-default may land on a FIM agent). Requires wiring ChatView to PipelinesConfig (watch for OBJECT-library symbol duplication).
- MCP tools on agent clients: MCP skill tools are registered only on Stack A providers; to expose MCP tools to chat agents, register them on the session client alongside registerQodeAssistTools.
- Physical Stack A teardown: remove the provider/model/template settings UI, ConfigurationManager, root providers/*, pluginllmcore/*, and the registration + MCP loop in qodeassist.cpp. Runtime no longer depends on them.
- Per-message session cost: chat/refactor create a fresh agent/provider/client (and read secrets) per request; a session pool could reduce latency.