# QodeAssist — Context Architecture (v1.0)

Status: design proposal, extends `target-architecture.md` (§7 ContextEngine,
delta #9) and `agent-templates-design.md` (the `ctx.*` template contract).

Scope: everything between "facts exist in the IDE / on disk / in the
conversation" and "bytes leave in the request body" — what context each
pipeline needs, who acquires it, where it lands in the prompt. One assembly
runs per `send()`; tool continuations stay inside LLMQore (§4.3).

---

## 1. Taxonomy — the five kinds of context

Every piece of context the model ever sees falls into one of five categories.
The categories differ in *acquisition mode*, *volatility*, and therefore
*placement* — conflating them is the root cause of today's problems (§3).

| # | Category | What it answers | Examples | Volatility |
|---|----------|-----------------|----------|------------|
| C1 | **Identity** | who is the assistant | agent `system_prompt` (persona inline or via `read_file()`), always-on skills, skills catalog | per agent change |
| C2 | **Environment** | where is it working | project name + source root, build dir, language/file info, recent changes | per project / slow |
| C3 | **Task** | what is asked *now* | chat message, attachments, images, invoked-skill body, completion prefix/suffix, refactor selection + instruction | every turn |
| C4 | **Conversation** | what happened so far | history (text, thinking, tool use/results), compression summary | grows every turn |
| C5 | **Pulled** | what the model asked for | tool results (read file, search, build, diagnostics), MCP tool results | inside the turn |

Two acquisition modes cut across the categories:

- **Push** — we inject proactively (C1–C3, C4). Push is a *per-pipeline
  policy*: completion must push everything (no latency budget for tools);
  chat should push little and let the model pull.
- **Pull** — the model requests through tools (C5). Pull needs no assembly
  policy at all, but its *results* become C4 and therefore must flow through
  the same budget and serialization rules as everything else.

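
The taxonomy and the two acquisition modes can be written down as a few pure functions. This is a minimal sketch, not code from the repository: the categories come from the table above, while every type and function name here (`Category`, `acquisitionOf`, `categoryAfterRecording`, `modelMayPull`) is hypothetical.

```cpp
// Hypothetical sketch: the categories come from the table above,
// the type and function names are illustrative only.
enum class Category { Identity, Environment, Task, Conversation, Pulled }; // C1..C5
enum class Acquisition { Push, Pull };
enum class Pipeline { Completion, Chat };

// C1-C4 are injected by us; only C5 is requested by the model through tools.
constexpr Acquisition acquisitionOf(Category c) {
    return c == Category::Pulled ? Acquisition::Pull : Acquisition::Push;
}

// Pulled results are recorded in the conversation, so from the next
// assembly on they are budgeted and serialized as C4.
constexpr Category categoryAfterRecording(Category c) {
    return c == Category::Pulled ? Category::Conversation : c;
}

// Push policy per pipeline: completion has no latency budget for tools,
// chat pushes little and lets the model pull.
constexpr bool modelMayPull(Pipeline p) {
    return p == Pipeline::Chat;
}

static_assert(acquisitionOf(Category::Task) == Acquisition::Push, "C3 is pushed");
static_assert(categoryAfterRecording(Category::Pulled) == Category::Conversation,
              "C5 becomes C4");
```
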

One more orthogonal property drives placement: **stability**. Provider prompt
caches (Claude `cache_control`) reward byte-stable prefixes. Stable content
belongs early (system), volatile content belongs late (near the last user
message). This single rule decides almost every placement question below.

---

## 2. Context inventory per pipeline

What each use case (numbering from `target-architecture.md` §1) actually
needs, against the taxonomy:

| Context item | Cat | U1 completion | U2 chat | U3 refactor | compression | Source port |
|---|---|---|---|---|---|---|
| agent `system_prompt` (persona) | C1 | ✓ | ✓ (persona switch = agent switch) | ✓ | ✓ | AgentProfile + ContextRenderer |
| skills catalog + always-on | C1 | — | ✓ | — | — | SkillsEngine |
| project root / build dir | C2 | — | ✓ | — | — | `IProjectScanner` |
| language + file info | C2 | ✓ | — | ✓ | — | `IDocumentReader` |
| recent project changes | C2 | optional (setting) | — | optional | — | ChangesManager |
| prefix / suffix (FIM) | C3 | ✓ | — | — | — | `IDocumentReader` |
| selection + position markers | C3 | — | — | ✓ | — | `IDocumentReader` |
| user message text | C3 | — | ✓ | ✓ (instruction) | ✓ (directive) | UI |
| attachments / images | C3 | — | ✓ | — | — | chat storage (loader) |
| invoked skill body (`/cmd`) | C3 | — | ✓ | — | — | SkillsEngine |
| linked files (pinned) | C3/C2 | — | ✓ | — | — | `IProjectScanner` + fs |
| open-files sync | C3/C2 | — | ✓ | — | — | `IProjectScanner` |
| history | C4 | — (fresh session) | ✓ | — (fresh) | ✓ (read-only input) | ConversationHistory |
| tool results | C5 | — | ✓ | ✓ (optional) | — | ToolsManager / McpHub |

---

## 3. Problems in the current code this design removes

1. ~~**Two assembly paths.**~~ — RECLASSIFIED 2026-06-12 as by-design, not a
   problem: the first request renders from `ConversationHistory`; tool
   continuations are LLMQore's replay of that payload plus appended tool
   results. The replay carries the full filtered history of its base payload,
   so the feared filter divergence does not materialize in practice (§4.3).
2. **No budget.** History is never trimmed, estimated, or compacted; every
   send ships everything, forever.
3. **Volatile content in system.** Linked-file contents live in the
   `chat.context` system layer; any file edit between turns invalidates the
   provider prompt cache for the whole request.
4. **Invoked skills evaporate.** A `/skill` body is injected into the system
   layer for one send only — the next turn the model has lost the skill's
   instructions, although the conversation continues to rely on them.
5. **Silent loss.** A failed attachment load drops the block with no trace —
   neither the model nor the user learns the image is gone.
6. **Repeated materialization.** Every send re-reads and re-base64s every
   stored image/attachment of the whole history from disk.
7. **Placement decided ad hoc.** Each feature hand-formats markdown and picks
   a system layer by habit (`completion.context`, `refactor`, `chat.context`);
   there is no shared rule for what goes where, and the project-info block is
   formatted three different ways.

---

## 4. Architecture — Acquire → Assemble → Shape

Three stages with hard ownership boundaries:

```mermaid
flowchart LR
    subgraph L3["Acquire — ContextEngine (L3, ports + QtC adapters)"]
        EC["EditorContext<br/>prefix/suffix, selection,<br/>language, copyright strip"]
        PC["ProjectContext<br/>root, ignore filter,<br/>open files, changes"]
        TE["TokenEstimator<br/>calibrated by Usage"]
    end
    subgraph L4["Features (L4) — decide WHAT"]
        F["chat / completion / refactor<br/>set layers, pin providers,<br/>build user blocks"]
    end
    subgraph L2["Assemble — Session (L2) — decide WHERE & HOW MUCH"]
        SPB["SystemPromptBuilder<br/>stable layers only"]
        PIN["Pinned providers<br/>re-materialized every dispatch"]
        CA["ContextAssembler<br/>history + layers + pinned<br/>+ loader + budget → ctx"]
    end
    subgraph L1["Shape — JsonPromptTemplate (L1/L2)"]
        TPL["[body] jinja over ctx.*"]
    end
    EC --> F
    PC --> F
    F --> SPB
    F --> PIN
    F --> CA
    SPB --> CA
    PIN --> CA
    TE --> CA
    CA --> TPL
```

- **Acquire (L3)** — `ContextEngine` services behind IDE-agnostic ports read
  facts from the IDE/fs. No prompt text, no placement decisions. One shared
  `EnvBlockFormatter` renders the project/file info block so it is identical
  in every pipeline.
- **Features (L4)** decide *what* context a turn needs: they set their system
  layer, pin refreshable providers, and compose user blocks. They never
  decide request shape and never concatenate history.
- **Assemble (L2)** — `ContextAssembler` (successor of
  `Session::toLegacyContext`) is the **only** producer of the template
  context, once per `send()` dispatch; tool continuations replay that payload
  inside LLMQore (§4.3). It owns placement policy, budget enforcement,
  materialization, and the manifest.
- **Shape (L1)** — the agent's `[body]` table renders `ctx.*` into the wire
  request. Templates own *shape per provider*, never content.

### 4.1 The three injection mechanisms

| Mechanism | For | Lifetime | Refresh | Persisted |
|---|---|---|---|---|
| **System layers** (`SystemPromptBuilder`) | stable C1/C2: `agent.system`, `env.project`, `skills.catalog`, `refactor`, `compression` | conversation | on send | no |
| **Pinned providers** (new) | refreshable C3/C2: linked files, open-files sync | until unpinned | **every `send()`** | as reference only |
| **User blocks** (`send(blocks)`) | one-shot C3: message, attachments, images, invoked-skill body, completion content | that turn | never (history is immutable) | yes |

Pinned providers are the new piece:

```
session->pinContext(id, [](){ return materialized blocks; });
session->unpinContext(id);
```

The assembler calls every pinned provider at **every `send()`** and splices
the result as text blocks **prepended to the turn's typed user message** — the
last user-role wire message that does not carry tool results (falling back to
the tool-result carrier, after its leading `tool_result` blocks, and to a
synthetic user message when the history has no user message at all).
Prepending into an existing message rather than inserting a separate one keeps
strict user/assistant alternation, which some provider APIs enforce.

The fixed anchor and the per-turn refresh split the cache cost fairly:
within a turn's tool loop the pinned blocks are byte-identical (continuations
replay the payload — pure appends, cache hits); the next `send()` re-reads
the files, and a change invalidates the cache only from the turn's anchor,
not from the system prefix. The materialized block's label states its capture
time ("content as of this turn") because a tool may mutate the file mid-loop;
the model sees such changes through the tool results themselves. Pinned
content is never stored in history and never persisted — never duplicated
turn-over-turn.

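
The registration and splice rules for pinned providers can be sketched as follows. This is an illustration under stated assumptions, not the actual `Session` / `ContextAssembler` code: `Block`, `Message`, `PinRegistry`, and `splicePinned` are hypothetical stand-ins using only the standard library, block types are plain strings, and the synthetic user message is appended at the end of the wire messages (the text above does not say where it goes). Providers are kept in a `std::map` so that iteration order, and therefore the emitted bytes, stay deterministic.

```cpp
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins for the real block and message types.
struct Block { std::string type; std::string text; };   // "text", "tool_result", ...
struct Message { std::string role; std::vector<Block> blocks; };

using PinnedProvider = std::function<std::vector<Block>()>;

class PinRegistry {
public:
    void pinContext(const std::string& id, PinnedProvider provider) {
        m_providers[id] = std::move(provider);
    }
    void unpinContext(const std::string& id) { m_providers.erase(id); }

    // Called once per send(): every provider is re-materialized.
    std::vector<Block> materialize() const {
        std::vector<Block> out;
        for (const auto& entry : m_providers) {
            const std::vector<Block> blocks = entry.second();
            out.insert(out.end(), blocks.begin(), blocks.end());
        }
        return out;
    }

private:
    std::map<std::string, PinnedProvider> m_providers;  // sorted by id
};

static bool carriesToolResult(const Message& m) {
    for (const auto& b : m.blocks)
        if (b.type == "tool_result") return true;
    return false;
}

// Works on a copy of the wire messages; stored history is never modified.
std::vector<Message> splicePinned(std::vector<Message> wire,
                                  const std::vector<Block>& pinned) {
    if (pinned.empty()) return wire;
    // 1. Preferred anchor: last user message that carries no tool results.
    for (auto it = wire.rbegin(); it != wire.rend(); ++it) {
        if (it->role == "user" && !carriesToolResult(*it)) {
            it->blocks.insert(it->blocks.begin(), pinned.begin(), pinned.end());
            return wire;
        }
    }
    // 2. Fallback: last tool-result carrier, after its leading tool_result blocks.
    for (auto it = wire.rbegin(); it != wire.rend(); ++it) {
        if (it->role == "user") {
            auto pos = it->blocks.begin();
            while (pos != it->blocks.end() && pos->type == "tool_result") ++pos;
            it->blocks.insert(pos, pinned.begin(), pinned.end());
            return wire;
        }
    }
    // 3. No user message at all: synthetic user message (position is an assumption).
    wire.push_back(Message{"user", pinned});
    return wire;
}
```
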

Invoked-skill bodies move the opposite way: out of the system layer into the
**user blocks of that turn** (a dedicated block type), so they persist in
history and survive the rest of the conversation (fixes problem 4).

### 4.2 Placement policy (single table, owned by the assembler)

| Content | Position in request | Why |
|---|---|---|
| `agent.system` (rendered TOML `system_prompt`) | system, first | static per agent → max cache reuse |
| `env.project`, `skills.catalog` | system, after agent | changes rarely |
| pipeline layers (`refactor`, `compression`, `completion.context`) | system, last | fresh session each time, ordering irrelevant |
| history | messages | as is |
| pinned materializations | text blocks prepended to the turn's typed user message, live content | fixed anchor keeps the prefix cache-stable; content refreshes because tools mutate files at any moment |
| task blocks | last user message | the turn itself |

`ClaudeCacheControl` breakpoints stay as they are (system / history tail);
this ordering is what makes them effective.

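
The system-side rows of this table amount to a stable sort over layer names. The sketch below is a hypothetical illustration, not `SystemPromptBuilder` itself: `Layer`, `placementRank`, and `renderSystem` are invented names, the relative order of `env.project` and `skills.catalog` is an assumption (the table lists them together), and layers are joined with a blank line purely for the example.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical layer record: a name from the table above plus rendered text.
struct Layer { std::string name; std::string text; };

// Lower rank means earlier in the system prompt, i.e. more cache-stable.
int placementRank(const std::string& name) {
    if (name == "agent.system") return 0;    // static per agent
    if (name == "env.project") return 1;     // changes rarely
    if (name == "skills.catalog") return 2;  // changes rarely
    return 3;                                // pipeline layers go last
}

// Orders the layers by rank and joins the non-empty ones.
std::string renderSystem(std::vector<Layer> layers) {
    // stable_sort keeps the caller's order among layers of equal rank.
    std::stable_sort(layers.begin(), layers.end(),
                     [](const Layer& a, const Layer& b) {
                         return placementRank(a.name) < placementRank(b.name);
                     });
    std::string out;
    for (const auto& layer : layers) {
        if (layer.text.empty()) continue;
        if (!out.empty()) out += "\n\n";
        out += layer.text;
    }
    return out;
}
```

Because the most stable layer always comes first, an edit to a pipeline layer leaves the bytes of `agent.system` and the environment layers untouched, which is the property the cache breakpoints depend on.
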
### 4.3 Tool continuations stay in LLMQore (replay)

The tool loop deliberately stays in LLMQore — the library is a complete,
standalone agentic client, and the loop (execute tools, count rounds,
schedule the next request, stream) is *mechanism*, which per
`target-architecture.md` design principle 3 belongs in C++ identically for
all providers. Continuation *content* is the library's default replay: the
base payload plus the assistant message and appended tool results.

An inversion hook (`setContinuationPayloadBuilder`, an optional per-request
callback letting `Session` re-assemble each continuation through
`ContextAssembler`) was implemented and **reverted 2026-06-12**: the problem
it solved was judged contrived. The replay already carries the full filtered
history of its base payload, mid-loop file changes reach the model through
the tool results themselves, and continuation growth within one turn is
bounded by `maxToolContinuations` — budget enforcement at `send()` time
covers the realistic cases. Consequences accepted with the revert: the
manifest logs one entry per `send()` (not per wire request), and pinned
content is byte-stable for the duration of a turn's tool loop (§4.1).

On 2026-06-13 the loop's *shape* inside LLMQore was refactored without
changing this decision (see `tool-loop-runner-plan.md`): the loop policy now
lives in `ToolLoopRunner` (per-request round state, limit, continuation
decision), and `BaseClient` is slimmed to transport + tool dispatch with the
public primitives `continueRequest` / `buildReplayContinuation` /
`abortRequest`. Continuation content is still the replay. QodeAssist sets the
round limit via `client->toolLoop()->setMaxRounds(...)`; the old
`setMaxToolContinuations` stays as a forwarder for compatibility.

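
The replay semantics described above (base payload untouched, assistant message and tool results appended, rounds capped) can be illustrated with a small standard-library sketch. None of this is LLMQore code: `Message`, `Payload`, `buildReplay`, `isPrefixOf`, and `RoundLimiter` are hypothetical names, and messages are reduced to a role and a string.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical, simplified wire message.
struct Message { std::string role; std::string content; };
using Payload = std::vector<Message>;

// Replay: the base payload is copied unchanged, then the assistant message
// and the tool results of this round are appended.
Payload buildReplay(const Payload& base, const Message& assistant,
                    const std::vector<Message>& toolResults) {
    Payload next = base;
    next.push_back(assistant);
    next.insert(next.end(), toolResults.begin(), toolResults.end());
    return next;
}

// True when `base` is an exact prefix of `next`, which is the property
// that lets a provider prompt cache hit on every continuation.
bool isPrefixOf(const Payload& base, const Payload& next) {
    if (base.size() > next.size()) return false;
    for (std::size_t i = 0; i < base.size(); ++i) {
        if (base[i].role != next[i].role || base[i].content != next[i].content)
            return false;
    }
    return true;
}

// Per-request round counter: growth within one turn is bounded by the limit.
class RoundLimiter {
public:
    explicit RoundLimiter(int maxRounds) : m_max(maxRounds) {}
    bool tryStartRound() {
        if (m_used >= m_max) return false;
        ++m_used;
        return true;
    }
    int used() const { return m_used; }

private:
    int m_max;
    int m_used = 0;
};
```

Since each continuation only appends, whatever was spliced into the base payload at `send()` time, pinned blocks included, stays byte-identical for the whole loop.
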
### 4.4 Budget

`ContextAssembler` consults a `BudgetPolicy` before producing the context:

```
input_estimate = TokenEstimator(system + history + pinned + task)
limit          = agent context_window − body.max_tokens (output reserve)
```

`context_window` comes from provider/model metadata with an optional agent
TOML override. When the estimate exceeds the limit the policy returns a trim
plan executed in deterministic order:

1. elide bodies of tool results older than the last N rounds
   (`[tool result elided — N tokens]` placeholder, pairing preserved);
2. elide materializations of old stored images/attachments (placeholder
   block, reference kept in history);
3. below a hard floor — refuse with `ErrorCategory::Validation` and surface
   "compress the conversation" (ChatCompressor) in the UI.

v1.0 ships in stages: **estimate + manifest + UI warning** first (no silent
trimming), then stage 1–2 elision, then auto-compression hooks. The
architecture fixes the *seam*; the policy can stay minimal.

`TokenEstimator` is calibrated per provider/model from `Usage` events
(§8.5 of the target architecture) — chars-per-token ratio updated after every
response; the chat token counter and the budget share this one estimator.

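
A minimal sketch of the two pieces, assuming a character-based estimator and a three-stage trim decision. The class name `TokenEstimator` comes from the text; everything else is hypothetical: the starting ratio of 4 characters per token, the smoothing factor, the `BudgetInput` fields, and `planTrim`. The sketch simplifies stage 3 to "still over the limit after both elisions" and ignores the token cost of the placeholders.

```cpp
#include <algorithm>
#include <cstddef>

// Chars-per-token estimator, recalibrated from provider-reported usage.
class TokenEstimator {
public:
    std::size_t estimate(std::size_t chars) const {
        return static_cast<std::size_t>(chars / m_charsPerToken + 0.5);
    }
    // Blend the observed ratio into the running one (exponential smoothing).
    void calibrate(std::size_t charsSent, std::size_t promptTokensReported) {
        if (charsSent == 0 || promptTokensReported == 0) return;
        const double observed = static_cast<double>(charsSent)
                              / static_cast<double>(promptTokensReported);
        m_charsPerToken = (1.0 - kAlpha) * m_charsPerToken + kAlpha * observed;
    }
    double charsPerToken() const { return m_charsPerToken; }

private:
    static constexpr double kAlpha = 0.5;  // assumed smoothing factor
    double m_charsPerToken = 4.0;          // assumed starting guess
};

enum class Verdict { Fits, ElideOldToolResults, ElideOldAttachments, Refuse };

// All values are token estimates.
struct BudgetInput {
    std::size_t system, history, pinned, task;
    std::size_t oldToolResults;  // reclaimable by stage 1
    std::size_t oldAttachments;  // reclaimable by stage 2
};

// Returns the deepest stage needed to fit under the limit.
Verdict planTrim(const BudgetInput& in, std::size_t contextWindow,
                 std::size_t maxOutputTokens) {
    if (maxOutputTokens >= contextWindow) return Verdict::Refuse;
    const std::size_t limit = contextWindow - maxOutputTokens;  // output reserve
    std::size_t estimate = in.system + in.history + in.pinned + in.task;
    if (estimate <= limit) return Verdict::Fits;
    estimate -= std::min(estimate, in.oldToolResults);
    if (estimate <= limit) return Verdict::ElideOldToolResults;
    estimate -= std::min(estimate, in.oldAttachments);
    if (estimate <= limit) return Verdict::ElideOldAttachments;
    return Verdict::Refuse;
}
```

In the warn-only first stage, anything other than `Verdict::Fits` would only drive the manifest entry and the UI warning; the elision itself would be enabled later.
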
### 4.5 Materialization and caching

Stored content (attachments, images) stays reference-only in history;
materialization happens in the assembler through the `ContentLoader`. Two
fixes over today:

- the loader result is cached per `(storedPath, mtime, size)` — no re-reading
  the whole conversation's binaries on every send, and byte-identical turns
  keep the provider prompt cache warm;
- a failed load produces an **explicit placeholder block**
  (`[attachment unavailable: name.png]`) instead of silently vanishing —
  the model can say so, the manifest records it (fixes problem 5).

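
Both fixes fit in one small class. The sketch below is an assumption-laden illustration rather than the real `StoredContentCache`: `LoaderCacheSketch`, `FileStamp`, and the injected `Stat` / `Read` callbacks are hypothetical, file access is abstracted behind those callbacks so the example stays self-contained, and failures are deliberately not cached so a later send can retry.

```cpp
#include <cstdint>
#include <functional>
#include <map>
#include <optional>
#include <string>
#include <tuple>
#include <utility>

struct FileStamp { std::int64_t mtime; std::uint64_t size; };

class LoaderCacheSketch {
public:
    // Injected file access: each returns std::nullopt on failure.
    using Stat = std::function<std::optional<FileStamp>(const std::string&)>;
    using Read = std::function<std::optional<std::string>(const std::string&)>;

    LoaderCacheSketch(Stat stat, Read read)
        : m_stat(std::move(stat)), m_read(std::move(read)) {}

    // Returns the materialized payload, or an explicit placeholder on failure.
    std::string load(const std::string& path, const std::string& displayName) {
        const std::optional<FileStamp> stamp = m_stat(path);
        if (!stamp) return placeholder(displayName);
        const Key key{path, stamp->mtime, stamp->size};
        const auto hit = m_cache.find(key);
        if (hit != m_cache.end()) return hit->second;  // unchanged file: no re-read
        const std::optional<std::string> data = m_read(path);
        if (!data) return placeholder(displayName);    // not cached, retried next send
        m_cache.emplace(key, *data);
        return *data;
    }

    // Entries with stale stamps linger until cleared, e.g. on chat switch.
    void clear() { m_cache.clear(); }

private:
    using Key = std::tuple<std::string, std::int64_t, std::uint64_t>;

    static std::string placeholder(const std::string& name) {
        return "[attachment unavailable: " + name + "]";
    }

    Stat m_stat;
    Read m_read;
    std::map<Key, std::string> m_cache;
};
```

Keying on `mtime` and `size` in addition to the path means an edited file misses the cache without any explicit invalidation call.
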
### 4.6 Observability: the context manifest

Every `assemble()` emits one debug-category log entry and a struct on the
event stream:

```
manifest {
  layers:   { agent.system: ~1.9k tok, env.project: ~70, skills.catalog: ~640 }
  history:  26 messages, ~14.2k tok (3 tool rounds)
  pinned:   { linked:src/main.cpp: ~2.1k }
  task:     ~310 tok, 1 image (cached)
  elided:   [ tool_result a4f1 (~8k) ]
  estimate: ~19.3k / limit 32k
}
```

Nothing is dropped silently — every filter (unsigned thinking, orphaned tool
pairs, failed loads, budget elisions) leaves a manifest record. The token
counter UI reads the same struct.

---

## 5. Wire contract — `ctx.*` stays, gains one producer

`Templates::ContextData` (→ `ctx.system_prompt`, `ctx.history`,
`ctx.prefix/suffix`, `ctx.files_metadata`) remains the contract between the
core and `[body]` templates — it is not legacy, it is the template-facing
view of the assembled context. The change is that exactly one function
produces it (`ContextAssembler::assemble`), for every request, and
`toLegacyContext`/`buildLegacyContext` are renamed into it. Existing
serialization rules carry over unchanged: system messages never enter
history, unsigned thinking is dropped, orphaned tool_use/tool_result pairs
are filtered, `CompletionContent` becomes `prefix`/`suffix`.

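
Two of the carried-over serialization rules, dropping unsigned thinking and filtering orphaned tool pairs, can be sketched as one pass over a flattened block list. This is a simplified illustration, not the assembler's code: `Block` and `filterForWire` are hypothetical, block types are plain strings, and the optional `dropped` list stands in for the manifest records mentioned in §4.6.

```cpp
#include <set>
#include <string>
#include <vector>

// Hypothetical flattened history block.
struct Block {
    std::string type;       // "text", "thinking", "tool_use", "tool_result"
    std::string id;         // tool call id for tool_use / tool_result, else empty
    std::string signature;  // provider signature for thinking blocks, else empty
};

// Keeps only blocks that are safe to send; records every drop in `dropped`.
std::vector<Block> filterForWire(const std::vector<Block>& in,
                                 std::vector<std::string>* dropped = nullptr) {
    // First pass: collect which tool call ids have each half of the pair.
    std::set<std::string> uses, results;
    for (const auto& b : in) {
        if (b.type == "tool_use") uses.insert(b.id);
        if (b.type == "tool_result") results.insert(b.id);
    }
    // Second pass: drop unsigned thinking and any half without its partner.
    std::vector<Block> out;
    for (const auto& b : in) {
        bool keep = true;
        if (b.type == "thinking") keep = !b.signature.empty();
        else if (b.type == "tool_use") keep = results.count(b.id) > 0;
        else if (b.type == "tool_result") keep = uses.count(b.id) > 0;
        if (keep) out.push_back(b);
        else if (dropped) dropped->push_back(b.type + ":" + b.id);
    }
    return out;
}
```
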

---

## 6. Migration plan

Ordered so every step lands independently and shrinks risk:

1. **Extract `ContextAssembler`** from `buildLegacyContext` (pure, unit-tested
   against fixture histories) + manifest logging + failed-load placeholder
   blocks. No behavior change otherwise. — DONE 2026-06-12
   (`sources/Session/ContextAssembler.{hpp,cpp}`, `test/ContextAssemblerTest.cpp`;
   manifest logged under the `qodeassist.context` category).
2. **ContentLoader cache** keyed by `(path, mtime, size)`. — DONE 2026-06-12
   (`StoredContentCache` in `ChatSerializer`, owned per-chat by
   `ClientInterface`, cleared on chat switch).
3. **Pinned providers**: linked files and open-files sync move out of the
   `chat.context` system layer; invoked-skill bodies move into the turn's
   user blocks. `chat.context` shrinks to project info + skills catalog.
   — DONE 2026-06-12 (`Session::pinContext/unpinContext`, pinned splice in
   `ContextAssembler::assemble`; `SkillInvocationContent` block persisted via
   `MessageSerializer`, invisible in the chat UI by design; open-files sync is
   covered because `ChatRootView` merges open editors into the linked list).
4. **Shared `EnvBlockFormatter`** in ContextEngine; chat/refactor/completion
   stop hand-formatting project/file info. — DONE 2026-06-12
   (`context/EnvBlockFormatter.{hpp,cpp}`: pure `formatProject`/`formatFile`
   + the `currentProject()` QtC gatherer; chat project block, refactor file
   header, and completion's `getLanguageAndFileInfo` all route through it).
5. ~~**Continuation payload callback**~~ — REVERTED 2026-06-12 (implemented,
   then judged a solution to a contrived problem; see §4.3). Continuations
   are LLMQore's default replay; `ContextAssembler` runs once per `send()`.
6. **TokenEstimator + BudgetPolicy seam** — estimate + warning first, then
   elision stages.
7. **ContextEngine port split** (delta #9 of the target architecture) —
   `EditorContext` / `ProjectContext` / `TokenEstimator` behind ports, QtC
   API only in `ide/context` adapters.

---

## 7. Open questions

1. ~~**Pinned placement**~~ — RESOLVED 2026-06-12: text blocks prepended to
   the last user-role wire message (synthetic user message only when there is
   none). A separate synthetic message would break strict role alternation on
   some provider APIs; cache behaviour of the two shapes is identical.
2. ~~**Tool-loop relocation cost**~~ — RESOLVED 2026-06-12: relocation
   rejected (LLMQore is deliberately a standalone agentic client). The
   follow-up `setContinuationPayloadBuilder` inversion hook was also
   implemented and reverted the same day — replay is the accepted behaviour
   (§4.3).
3. **Budget v1 scope** — warn-only vs. enabling tool-result elision
   immediately. Elision changes what the model sees; needs live validation.
4. **Completion and open files** — should completion gain pinned open-files
   context (cheap with this design), or stay prefix/suffix-only for latency?