QodeAssist — Context Architecture (v1.0)
Status: design proposal, extends target-architecture.md (§7 ContextEngine,
delta #9) and agent-templates-design.md (the ctx.* template contract).
Scope: everything between "facts exist in the IDE / on disk / in the
conversation" and "bytes leave in the request body" — what context each
pipeline needs, who acquires it, where it lands in the prompt. One assembly
runs per send(); tool continuations stay inside LLMQore (§4.3).
1. Taxonomy — the five kinds of context
Every piece of context the model ever sees falls into one of five categories. The categories differ in acquisition mode, volatility, and therefore placement — conflating them is the root cause of today's problems (§3).
| # | Category | What it answers | Examples | Volatility |
|---|---|---|---|---|
| C1 | Identity | who is the assistant | agent `system_prompt` (persona inline or via `read_file()`), always-on skills, skills catalog | per agent change |
| C2 | Environment | where is it working | project name + source root, build dir, language/file info, recent changes | per project / slow |
| C3 | Task | what is asked now | chat message, attachments, images, invoked-skill body, completion prefix/suffix, refactor selection + instruction | every turn |
| C4 | Conversation | what happened so far | history (text, thinking, tool use/results), compression summary | grows every turn |
| C5 | Pulled | what the model asked for | tool results (read file, search, build, diagnostics), MCP tool results | inside the turn |
Two acquisition modes cut across the categories:
- Push — we inject proactively (C1–C3, C4). Push is a per-pipeline policy: completion must push everything (no latency budget for tools); chat should push little and let the model pull.
- Pull — the model requests through tools (C5). Pull needs no assembly policy at all, but its results become C4 and therefore must flow through the same budget and serialization rules as everything else.
One more orthogonal property drives placement: stability. Provider prompt
caches (Claude cache_control) reward byte-stable prefixes. Stable content
belongs early (system), volatile content belongs late (near the last user
message). This single rule decides almost every placement question below.
2. Context inventory per pipeline
What each use case (numbering from target-architecture.md §1) actually
needs, against the taxonomy:
| Context item | Cat | U1 completion | U2 chat | U3 refactor | compression | Source port |
|---|---|---|---|---|---|---|
| agent `system_prompt` (persona) | C1 | ✓ | ✓ (persona switch = agent switch) | ✓ | ✓ | AgentProfile + ContextRenderer |
| skills catalog + always-on | C1 | — | ✓ | — | — | SkillsEngine |
| project root / build dir | C2 | — | ✓ | — | — | IProjectScanner |
| language + file info | C2 | ✓ | — | ✓ | — | IDocumentReader |
| recent project changes | C2 | optional (setting) | — | optional | — | ChangesManager |
| prefix / suffix (FIM) | C3 | ✓ | — | — | — | IDocumentReader |
| selection + position markers | C3 | — | — | ✓ | — | IDocumentReader |
| user message text | C3 | — | ✓ | ✓ (instruction) | ✓ (directive) | UI |
| attachments / images | C3 | — | ✓ | — | — | chat storage (loader) |
| invoked skill body (`/cmd`) | C3 | — | ✓ | — | — | SkillsEngine |
| linked files (pinned) | C3/C2 | — | ✓ | — | — | IProjectScanner + fs |
| open-files sync | C3/C2 | — | ✓ | — | — | IProjectScanner |
| history | C4 | — (fresh session) | ✓ | — (fresh) | ✓ (read-only input) | ConversationHistory |
| tool results | C5 | — | ✓ | ✓ (optional) | — | ToolsManager / McpHub |
3. Problems in the current code this design removes
1. ~~Two assembly paths.~~ RECLASSIFIED 2026-06-12 as by-design, not a problem: the first request renders from `ConversationHistory`; tool continuations are LLMQore's replay of that payload plus appended tool results. The replay carries the full filtered history of its base payload, so the feared filter divergence does not materialize in practice (§4.3).
2. No budget. History is never trimmed, estimated, or compacted; every send ships everything, forever.
3. Volatile content in system. Linked-file contents live in the `chat.context` system layer; any file edit between turns invalidates the provider prompt cache for the whole request.
4. Invoked skills evaporate. A `/skill` body is injected into the system layer for one send only; on the next turn the model has lost the skill's instructions, although the conversation continues to rely on them.
5. Silent loss. A failed attachment load drops the block with no trace: neither the model nor the user learns the image is gone.
6. Repeated materialization. Every send re-reads and re-base64s every stored image/attachment of the whole history from disk.
7. Placement decided ad hoc. Each feature hand-formats markdown and picks a system layer by habit (`completion.context`, `refactor`, `chat.context`); there is no shared rule for what goes where, and the project-info block is formatted three different ways.
4. Architecture — Acquire → Assemble → Shape
Three stages with hard ownership boundaries:
```mermaid
flowchart LR
    subgraph L3["Acquire — ContextEngine (L3, ports + QtC adapters)"]
        EC["EditorContext<br/>prefix/suffix, selection,<br/>language, copyright strip"]
        PC["ProjectContext<br/>root, ignore filter,<br/>open files, changes"]
        TE["TokenEstimator<br/>calibrated by Usage"]
    end
    subgraph L4["Features (L4) — decide WHAT"]
        F["chat / completion / refactor<br/>set layers, pin providers,<br/>build user blocks"]
    end
    subgraph L2["Assemble — Session (L2) — decide WHERE & HOW MUCH"]
        SPB["SystemPromptBuilder<br/>stable layers only"]
        PIN["Pinned providers<br/>re-materialized every dispatch"]
        CA["ContextAssembler<br/>history + layers + pinned<br/>+ loader + budget → ctx"]
    end
    subgraph L1["Shape — JsonPromptTemplate (L1/L2)"]
        TPL["[body] jinja over ctx.*"]
    end
    EC --> F
    PC --> F
    F --> SPB
    F --> PIN
    F --> CA
    SPB --> CA
    PIN --> CA
    TE --> CA
    CA --> TPL
```
- Acquire (L3): `ContextEngine` services behind IDE-agnostic ports read facts from the IDE/fs. No prompt text, no placement decisions. One shared `EnvBlockFormatter` renders the project/file info block so it is identical in every pipeline.
- Features (L4) decide what context a turn needs: they set their system layer, pin refreshable providers, and compose user blocks. They never decide request shape and never concatenate history.
- Assemble (L2): `ContextAssembler` (successor of `Session::toLegacyContext`) is the only producer of the template context, once per `send()` dispatch; tool continuations replay that payload inside LLMQore (§4.3). It owns placement policy, budget enforcement, materialization, and the manifest.
- Shape (L1): the agent's `[body]` table renders `ctx.*` into the wire request. Templates own shape per provider, never content.
4.1 The three injection mechanisms
| Mechanism | For | Lifetime | Refresh | Persisted |
|---|---|---|---|---|
| System layers (`SystemPromptBuilder`) | stable C1/C2: `agent.system`, `env.project`, `skills.catalog`, `refactor`, `compression` | conversation | on send | no |
| Pinned providers (new) | refreshable C3/C2: linked files, open-files sync | until unpinned | every `send()` | as reference only |
| User blocks (`send(blocks)`) | one-shot C3: message, attachments, images, invoked-skill body, completion content | that turn | never (history is immutable) | yes |
Pinned providers are the new piece:
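```cpp
// The provider callback is re-run at every send() and returns the freshly
// materialized blocks. materializeBlocks() is a placeholder for that read.
session->pinContext(id, [] { return materializeBlocks(); });
session->unpinContext(id);
```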
The assembler calls every pinned provider at every send() and splices
the result as text blocks
prepended to the turn's typed user message — the last user-role wire
message that does not carry tool results (falling back to the tool-result
carrier, after its leading tool_result blocks, and to a synthetic user
message when the history has no user message at all). Prepending into an
existing message rather than inserting a separate one keeps strict
user/assistant alternation, which some provider APIs enforce.
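The anchor rule above can be sketched in plain C++. This is a minimal illustration, not the `ContextAssembler` code: `Role`, `BlockKind`, `Block`, `Message`, and `splicePinned` are hypothetical names, standard-library types stand in for the Qt ones, and appending the synthetic user message at the end of the payload is an assumption.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical wire-level types; the real Session/LLMQore types may differ.
enum class Role { User, Assistant };
enum class BlockKind { Text, ToolUse, ToolResult };

struct Block {
    BlockKind kind;
    std::string text;
};

struct Message {
    Role role;
    std::vector<Block> blocks;
};

static bool carriesToolResults(const Message &m)
{
    for (const Block &b : m.blocks)
        if (b.kind == BlockKind::ToolResult)
            return true;
    return false;
}

// Splices the pinned text blocks into the anchor message and returns its index.
std::size_t splicePinned(std::vector<Message> &wire, const std::vector<Block> &pinned)
{
    // 1. Preferred anchor: the last user message that carries no tool results.
    for (std::size_t i = wire.size(); i-- > 0;) {
        Message &m = wire[i];
        if (m.role == Role::User && !carriesToolResults(m)) {
            m.blocks.insert(m.blocks.begin(), pinned.begin(), pinned.end());
            return i;
        }
    }
    // 2. Fallback: the last tool-result carrier, after its leading tool_result blocks.
    for (std::size_t i = wire.size(); i-- > 0;) {
        Message &m = wire[i];
        if (m.role == Role::User) {
            std::size_t pos = 0;
            while (pos < m.blocks.size() && m.blocks[pos].kind == BlockKind::ToolResult)
                ++pos;
            m.blocks.insert(m.blocks.begin() + static_cast<std::ptrdiff_t>(pos),
                            pinned.begin(), pinned.end());
            return i;
        }
    }
    // 3. No user message at all: add a synthetic one (placement at the end is assumed).
    wire.push_back(Message{Role::User, pinned});
    return wire.size() - 1;
}
```

Because the blocks are inserted into an existing message in the first two cases, the number and order of wire messages is unchanged, which is what preserves role alternation.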
The fixed anchor and the per-turn refresh split the cache cost fairly:
within a turn's tool loop the pinned blocks are byte-identical (continuations
replay the payload — pure appends, cache hits); the next send() re-reads
the files, and a change invalidates the cache only from the turn's anchor,
not from the system prefix. The materialized block's label states its capture
time ("content as of this turn") because a tool may mutate the file mid-loop;
the model sees such changes through the tool results themselves. Pinned
content is never stored in history and never persisted — never duplicated
turn-over-turn.
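A minimal sketch of the registry side, assuming pinned providers are plain callbacks keyed by id. `PinnedRegistry`, `PinnedBlocks`, and the exact label text are hypothetical; only the `pinContext` / `unpinContext` names and the "content as of this turn" wording come from this design.

```cpp
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

using PinnedBlocks = std::vector<std::string>;
using PinnedProvider = std::function<PinnedBlocks()>;

// Hypothetical registry behind Session::pinContext / unpinContext.
class PinnedRegistry
{
public:
    void pinContext(const std::string &id, PinnedProvider provider)
    {
        m_providers[id] = std::move(provider);
    }

    void unpinContext(const std::string &id) { m_providers.erase(id); }

    // Called once per send(): every provider is re-run, nothing is cached or stored.
    PinnedBlocks materialize() const
    {
        PinnedBlocks out;
        for (const auto &entry : m_providers) {
            for (const std::string &block : entry.second())
                out.push_back("[" + entry.first + ", content as of this turn]\n" + block);
        }
        return out;
    }

private:
    // std::map iterates in id order, so the output is deterministic for equal inputs.
    std::map<std::string, PinnedProvider> m_providers;
};
```

The deterministic iteration order matters for the cache argument: two sends with unchanged files produce byte-identical pinned blocks.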
Invoked-skill bodies move the opposite way: out of the system layer into the user blocks of that turn (a dedicated block type), so they persist in history and survive the rest of the conversation (fixes problem 4).
4.2 Placement policy (single table, owned by the assembler)
| Content | Position in request | Why |
|---|---|---|
| `agent.system` (rendered TOML `system_prompt`) | system, first | static per agent → max cache reuse |
| `env.project`, `skills.catalog` | system, after agent | changes rarely |
| pipeline layers (`refactor`, `compression`, `completion.context`) | system, last | fresh session each time, ordering irrelevant |
| history | messages | as is |
| pinned materializations | text blocks prepended to the turn's typed user message, live content | fixed anchor keeps the prefix cache-stable; content refreshes because tools mutate files at any moment |
| task blocks | last user message | the turn itself |
ClaudeCacheControl breakpoints stay as they are (system / history tail);
this ordering is what makes them effective.
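The system rows of the table reduce to a stable sort by rank. The sketch below assumes layers are name/text pairs and that any layer not named in the first two rows is a pipeline layer; `Layer`, `placementRank`, and `buildSystemPrompt` are hypothetical names, not the `SystemPromptBuilder` API.

```cpp
#include <algorithm>
#include <string>
#include <vector>

struct Layer {
    std::string name;
    std::string text;
};

// Rank per the placement table: agent first, slow-changing env/skills next,
// pipeline layers (refactor, compression, completion.context) last.
int placementRank(const std::string &name)
{
    if (name == "agent.system")
        return 0;
    if (name == "env.project" || name == "skills.catalog")
        return 1;
    return 2;
}

std::string buildSystemPrompt(std::vector<Layer> layers)
{
    // stable_sort keeps registration order inside a rank, so output stays byte-stable.
    std::stable_sort(layers.begin(), layers.end(), [](const Layer &a, const Layer &b) {
        return placementRank(a.name) < placementRank(b.name);
    });
    std::string out;
    for (const Layer &layer : layers) {
        if (layer.text.empty())
            continue; // empty layers contribute nothing, not even a separator
        if (!out.empty())
            out += "\n\n";
        out += layer.text;
    }
    return out;
}
```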
4.3 Tool continuations stay in LLMQore (replay)
The tool loop deliberately stays in LLMQore — the library is a complete,
standalone agentic client, and the loop (execute tools, count rounds,
schedule the next request, stream) is mechanism, which per
target-architecture.md design principle 3 belongs in C++ identically for
all providers. Continuation content is the library's default replay: the
base payload plus the assistant message and appended tool results.
An inversion hook (setContinuationPayloadBuilder, an optional per-request
callback letting Session re-assemble each continuation through
ContextAssembler) was implemented and reverted 2026-06-12: the problem
it solved was judged contrived. The replay already carries the full filtered
history of its base payload, mid-loop file changes reach the model through
the tool results themselves, and continuation growth within one turn is
bounded by maxToolContinuations — budget enforcement at send() time
covers the realistic cases. Consequences accepted with the revert: the
manifest logs one entry per send() (not per wire request), and pinned
content is byte-stable for the duration of a turn's tool loop (§4.1).
2026-06-13 the loop's shape inside LLMQore was refactored without changing
this decision (see tool-loop-runner-plan.md): the loop policy now lives in
ToolLoopRunner (per-request round state, limit, continuation decision) and
BaseClient slimmed to transport + tool dispatch with public primitives
continueRequest / buildReplayContinuation / abortRequest. Continuation
content is still the replay. QodeAssist sets the round limit via
client->toolLoop()->setMaxRounds(...); the old setMaxToolContinuations
stays as a forwarder for compatibility.
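To make the replay property concrete, here is a small sketch of "base payload plus assistant message plus tool results" together with a round limit. It does not reproduce the LLMQore signatures: `WireMessage`, `buildReplay`, and `RoundLimiter` are hypothetical stand-ins for the behaviour described above.

```cpp
#include <string>
#include <vector>

struct WireMessage {
    std::string role;
    std::string content;
};
using Payload = std::vector<WireMessage>;

// Replay: copy the base payload untouched, then append the new exchange.
// The base is therefore always a strict prefix of the continuation.
Payload buildReplay(const Payload &base,
                    const WireMessage &assistant,
                    const std::vector<WireMessage> &toolResults)
{
    Payload next = base;
    next.push_back(assistant);
    next.insert(next.end(), toolResults.begin(), toolResults.end());
    return next;
}

// Hypothetical per-request round state: bounds continuation growth within one turn.
class RoundLimiter
{
public:
    explicit RoundLimiter(int maxRounds) : m_maxRounds(maxRounds) {}

    // Returns true and counts the round if another continuation may be scheduled.
    bool tryContinue()
    {
        if (m_rounds >= m_maxRounds)
            return false;
        ++m_rounds;
        return true;
    }

private:
    int m_maxRounds;
    int m_rounds = 0;
};
```

The prefix property is why continuations are pure appends from the provider cache's point of view.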
4.4 Budget
ContextAssembler consults a BudgetPolicy before producing the context:
```text
input_estimate = TokenEstimator(system + history + pinned + task)
limit          = agent context_window − body.max_tokens   (output reserve)
```
context_window comes from provider/model metadata with an optional agent
TOML override. When the estimate exceeds the limit the policy returns a trim
plan executed in deterministic order:
1. elide bodies of tool results older than the last N rounds (`[tool result elided — N tokens]` placeholder, pairing preserved);
2. elide materializations of old stored images/attachments (placeholder block, reference kept in history);
3. below a hard floor: refuse with `ErrorCategory::Validation` and surface "compress the conversation" (`ChatCompressor`) in the UI.
v1.0 ships in stages: estimate + manifest + UI warning first (no silent trimming), then stage 1–2 elision, then auto-compression hooks. The architecture fixes the seam; the policy can stay minimal.
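The limit formula and the first elision stage might look like the sketch below. Everything here is illustrative: the fixed 4-characters-per-token estimate, the oldest-first order, and stopping as soon as the estimate fits are assumptions, and `ToolResult`, `inputLimit`, and `elideOldToolResults` are hypothetical names rather than the `BudgetPolicy` interface.

```cpp
#include <string>
#include <vector>

struct ToolResult {
    int round;        // tool round that produced the result (1 = oldest)
    std::string body;
    bool elided = false;
};

// limit = context_window - max_tokens (output reserve), clamped at zero.
long inputLimit(long contextWindow, long maxOutputTokens)
{
    const long limit = contextWindow - maxOutputTokens;
    return limit > 0 ? limit : 0;
}

// Crude stand-in estimator: about 4 bytes per token (assumption).
long estimateTokens(const std::string &text)
{
    return static_cast<long>((text.size() + 3) / 4);
}

// Stage 1: elide bodies of tool results older than the last keepRounds rounds.
// Expects results ordered oldest first; returns the updated input estimate.
long elideOldToolResults(std::vector<ToolResult> &results, int currentRound,
                         int keepRounds, long estimate, long limit)
{
    for (ToolResult &r : results) {
        if (estimate <= limit)
            break; // already fits, leave the rest untouched
        if (r.elided || r.round > currentRound - keepRounds)
            continue; // recent rounds keep their bodies
        const long before = estimateTokens(r.body);
        const std::string placeholder
            = "[tool result elided — " + std::to_string(before) + " tokens]";
        const long after = estimateTokens(placeholder);
        if (after >= before)
            continue; // eliding would not save anything
        r.body = placeholder; // the result block itself stays, so pairing is preserved
        r.elided = true;
        estimate -= before - after;
    }
    return estimate;
}
```

Only the body is replaced; the entry stays in the list, which is how the `tool_use` / `tool_result` pairing survives the trim.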
TokenEstimator is calibrated per provider/model from Usage events
(§8.5 of the target architecture) — chars-per-token ratio updated after every
response; the chat token counter and the budget share this one estimator.
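One way to keep a chars-per-token ratio calibrated is an exponential moving average over reported usage. This is a sketch under assumptions: the smoothing factor, the starting ratio of 4.0, and the `calibrate` / `estimate` method names are illustrative, and one instance per provider/model is assumed.

```cpp
#include <cstddef>

class TokenEstimator
{
public:
    // Called after each response with the prompt size that was sent and the
    // input token count the provider reported for it.
    void calibrate(std::size_t promptChars, std::size_t reportedInputTokens)
    {
        if (promptChars == 0 || reportedInputTokens == 0)
            return; // nothing to learn from an empty sample
        const double observed = static_cast<double>(promptChars)
                                / static_cast<double>(reportedInputTokens);
        // Moving average so that one unusual response does not swing the ratio.
        m_charsPerToken = (1.0 - kAlpha) * m_charsPerToken + kAlpha * observed;
    }

    // Estimated token count for a text of the given length, rounded to nearest.
    std::size_t estimate(std::size_t chars) const
    {
        return static_cast<std::size_t>(static_cast<double>(chars) / m_charsPerToken + 0.5);
    }

    double charsPerToken() const { return m_charsPerToken; }

private:
    static constexpr double kAlpha = 0.25; // assumed smoothing factor
    double m_charsPerToken = 4.0;          // starting guess before any usage data
};
```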
4.5 Materialization and caching
Stored content (attachments, images) stays reference-only in history;
materialization happens in the assembler through the ContentLoader. Two
fixes over today:
- the loader result is cached per `(storedPath, mtime, size)`: no re-reading the whole conversation's binaries on every send, and byte-identical turns keep the provider prompt cache warm;
- a failed load produces an explicit placeholder block (`[attachment unavailable: name.png]`) instead of silently vanishing: the model can say so, the manifest records it (fixes problem 5).
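Both fixes fit in one small class. The sketch below is an assumption-laden illustration, not `StoredContentCache`: `FileStamp`, `Reader`, and `ContentCache` are hypothetical names, the reader is injected so the example does not touch the file system, and not caching failures (so a later send can retry) is a design guess.

```cpp
#include <cstdint>
#include <functional>
#include <map>
#include <optional>
#include <string>
#include <tuple>
#include <utility>

// Cache key: a changed mtime or size yields a new key, so stale data is never served.
struct FileStamp {
    std::string path;
    std::int64_t mtime;
    std::uint64_t size;

    bool operator<(const FileStamp &o) const
    {
        return std::tie(path, mtime, size) < std::tie(o.path, o.mtime, o.size);
    }
};

// Returns the encoded payload, or nullopt when the load fails.
using Reader = std::function<std::optional<std::string>(const std::string &path)>;

class ContentCache
{
public:
    explicit ContentCache(Reader reader) : m_reader(std::move(reader)) {}

    std::string load(const FileStamp &stamp, const std::string &displayName)
    {
        const auto it = m_cache.find(stamp);
        if (it != m_cache.end())
            return it->second; // hit: no disk read, no re-encoding
        std::optional<std::string> data = m_reader(stamp.path);
        if (!data)
            return "[attachment unavailable: " + displayName + "]"; // explicit, never silent
        m_cache.emplace(stamp, *data);
        return *data;
    }

    void clear() { m_cache.clear(); } // e.g. on chat switch

private:
    Reader m_reader;
    std::map<FileStamp, std::string> m_cache;
};
```

Entries for superseded stamps stay in the map until `clear()` is called; a production cache would likely bound that.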
4.6 Observability: the context manifest
Every assemble() emits one debug-category log entry and a struct on the
event stream:
```text
manifest {
  layers:   { agent.system: ~1.9k tok, env.project: ~70, skills.catalog: ~640 }
  history:  26 messages, ~14.2k tok (3 tool rounds)
  pinned:   { linked:src/main.cpp: ~2.1k }
  task:     ~310 tok, 1 image (cached)
  elided:   [ tool_result a4f1 (~8k) ]
  estimate: ~19.3k / limit 32k
}
```
Nothing is dropped silently — every filter (unsigned thinking, orphaned tool pairs, failed loads, budget elisions) leaves a manifest record. The token counter UI reads the same struct.
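A possible shape for the struct that both the log entry and the token counter read. The field names mirror the sample entry, but the struct itself is hypothetical; in this sketch elided items are records only and do not count toward the estimate.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical manifest struct; token counts are estimates, not provider-reported values.
struct ContextManifest {
    std::map<std::string, long> layers; // layer name -> estimated tokens
    long historyTokens = 0;
    std::map<std::string, long> pinned; // pin id -> estimated tokens
    long taskTokens = 0;
    std::vector<std::string> elided;    // one record per dropped or elided item
    long limit = 0;

    // Sum of everything that is actually sent.
    long estimate() const
    {
        long total = historyTokens + taskTokens;
        for (const auto &entry : layers)
            total += entry.second;
        for (const auto &entry : pinned)
            total += entry.second;
        return total;
    }

    bool overBudget() const { return estimate() > limit; }
};
```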
5. Wire contract — ctx.* stays, gains one producer
Templates::ContextData (→ ctx.system_prompt, ctx.history,
ctx.prefix/suffix, ctx.files_metadata) remains the contract between the
core and [body] templates — it is not legacy, it is the template-facing
view of the assembled context. The change is that exactly one function
produces it (ContextAssembler::assemble), for every request, and
toLegacyContext/buildLegacyContext are renamed into it. Existing
serialization rules carry over unchanged: system messages never enter
history, unsigned thinking is dropped, orphaned tool_use/tool_result pairs
are filtered, CompletionContent becomes prefix/suffix.
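Two of the serialization rules named above (unsigned thinking is dropped, orphaned tool pairs are filtered) can be sketched as a single pass. This is a simplified illustration: it works on a flat block list and ignores message boundaries, and `Kind`, `HistoryBlock`, and `filterForWire` are hypothetical names.

```cpp
#include <set>
#include <string>
#include <vector>

enum class Kind { Text, Thinking, ToolUse, ToolResult };

struct HistoryBlock {
    Kind kind;
    std::string toolId;    // set for ToolUse / ToolResult
    std::string signature; // non-empty only for signed Thinking blocks
    std::string text;
};

std::vector<HistoryBlock> filterForWire(const std::vector<HistoryBlock> &in)
{
    // First pass: collect the ids seen on each side of a tool exchange.
    std::set<std::string> uses;
    std::set<std::string> results;
    for (const HistoryBlock &b : in) {
        if (b.kind == Kind::ToolUse)
            uses.insert(b.toolId);
        if (b.kind == Kind::ToolResult)
            results.insert(b.toolId);
    }
    // Second pass: keep only blocks that are safe to serialize.
    std::vector<HistoryBlock> out;
    for (const HistoryBlock &b : in) {
        if (b.kind == Kind::Thinking && b.signature.empty())
            continue; // unsigned thinking is dropped
        if (b.kind == Kind::ToolUse && results.count(b.toolId) == 0)
            continue; // tool_use without a matching tool_result
        if (b.kind == Kind::ToolResult && uses.count(b.toolId) == 0)
            continue; // tool_result without a matching tool_use
        out.push_back(b);
    }
    return out;
}
```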
6. Migration plan
Ordered so every step lands independently and shrinks risk:
1. Extract `ContextAssembler` from `buildLegacyContext` (pure, unit-tested against fixture histories) + manifest logging + failed-load placeholder blocks. No behavior change otherwise. DONE 2026-06-12 (`sources/Session/ContextAssembler.{hpp,cpp}`, `test/ContextAssemblerTest.cpp`; manifest logged under the `qodeassist.context` category).
2. ContentLoader cache keyed by `(path, mtime, size)`. DONE 2026-06-12 (`StoredContentCache` in `ChatSerializer`, owned per-chat by `ClientInterface`, cleared on chat switch).
3. Pinned providers: linked files and open-files sync move out of the `chat.context` system layer; invoked-skill bodies move into the turn's user blocks. `chat.context` shrinks to project info + skills catalog. DONE 2026-06-12 (`Session::pinContext/unpinContext`, pinned splice in `ContextAssembler::assemble`; `SkillInvocationContent` block persisted via `MessageSerializer`, invisible in the chat UI by design; open-files sync is covered because `ChatRootView` merges open editors into the linked list).
4. Shared `EnvBlockFormatter` in ContextEngine; chat/refactor/completion stop hand-formatting project/file info. DONE 2026-06-12 (`context/EnvBlockFormatter.{hpp,cpp}`: pure `formatProject/formatFile` plus the `currentProject()` QtC gatherer; chat project block, refactor file header, and completion's `getLanguageAndFileInfo` all route through it).
5. ~~Continuation payload callback~~: REVERTED 2026-06-12 (implemented, then judged a solution to a contrived problem; see §4.3). Continuations are LLMQore's default replay; `ContextAssembler` runs once per `send()`.
6. TokenEstimator + BudgetPolicy seam: estimate + warning first, then elision stages.
7. ContextEngine port split (delta #9 of the target architecture): `EditorContext` / `ProjectContext` / `TokenEstimator` behind ports, QtC API only in `ide/context` adapters.
7. Open questions
1. ~~Pinned placement~~: RESOLVED 2026-06-12. Text blocks are prepended to the last user-role wire message (synthetic user message only when there is none). A separate synthetic message would break strict role alternation on some provider APIs; cache behaviour of the two shapes is identical.
2. ~~Tool-loop relocation cost~~: RESOLVED 2026-06-12. Relocation rejected (LLMQore is deliberately a standalone agentic client). The follow-up `setContinuationPayloadBuilder` inversion hook was also implemented and reverted the same day; replay is the accepted behaviour (§4.3).
3. Budget v1 scope: warn-only vs. enabling tool-result elision immediately. Elision changes what the model sees; needs live validation.
4. Completion and open files: should completion gain pinned open-files context (cheap with this design), or stay prefix/suffix-only for latency?