Files
QodeAssist/docs/agent-templates-design.md
2026-06-09 08:48:32 +02:00

359 lines
13 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
# Agent Templates — Design Note (body model, include, extends)
Status: agreed design / ready to implement. Dev-facing (not end-user docs).
Scope: how agent TOML profiles describe the request and share structure.
## Problem this replaces
The shipped model has each agent embed a `[template].message_format` jinja string
that hand-builds the **whole** request body as text, plus `[template.sampling]` and
`[template.thinking.*]` blocks merged in by `applySampling`. Pains:
- Massive copy-paste: 9 OpenAI-compatible agents share a byte-identical ~50-line
`message_format`; 4 Claude agents share another; `role` + README `context` are
identical across 18 files.
- `[template.sampling]` / `[template.thinking.overrides]` /
`[template.thinking.request_block.*]` describe **merge machinery**, not the request
body — they don't look like the actual API call. The `overrides` vs `request_block`
split is meaningless (both are deep-merged into the request identically).
- Manual JSON-by-string-concatenation: trailing-comma bookkeeping
(`{% if not loop.is_last %},{% endif %}`) everywhere; a missing comma fails
silently at runtime (`renderBody` returns nullopt, only a `qWarning`).
- `include` is hard-disabled, so there is no way to share a sub-fragment.
## Agreed model
### 1. `[body]` is a deep-mergeable table = the request body, 1:1 with the API
Replace the `message_format` string and the `sampling`/`thinking` blocks with a
single `[body]` TOML table whose keys are the **literal request-body fields**.
Because it is a table (not a string), `extends` / `deepMerge` can override it
field-by-field — variants become a 2-line delta instead of a copied body.
Field-value rules at build time (per key in `[body]`, applied recursively):
- **string containing jinja** (`{{` or `{%`) → render through inja, splice the
output as **raw JSON** (array / object / string). Empty render → key omitted.
- **string without jinja** (e.g. `"high"`) → literal JSON string, as-is.
- **number / bool / inline-table** → as-is.
So `messages` / `contents` and `system` / `system_instruction` are just **string
fields holding jinja**; everything else (`max_tokens`, `temperature`, `stream`,
`thinking`, `output_config`, `generationConfig`, …) is a literal value that reads
exactly like the curl body.
No runtime toggles: thinking / tools / streaming are **fixed per agent**. A thinking
agent literally carries the `thinking` fields; a non-thinking variant is a separate
file. There is no `{% if thinking %}` and no `thinkingEnabled` flag threaded into
rendering. `system` uses `{% if existsIn(ctx, "system_prompt") %}` only because that
is about *presence of data*, not a mode toggle.
Outside the body:
- `model` — supplied by the **client** from its own settings; never in the profile.
Google embeds the model in the URL, so its `endpoint` uses a `${MODEL}` placeholder
the client resolves (same substitution style as `${PROJECT_DIR}` / `${HOME}`).
- `tools` — injected by the **provider** when `enable_tools` is set (tool
definitions are dynamic, from `ToolsManager`; they can't be authored in TOML).
- `stream` — always on. Literal `"stream": true` in the body for OpenAI / Claude /
Mistral; encoded in the `endpoint` URL for Google.
### 2. `include` re-enabled as whitelisted partials
The message-array rendering (the complex, comma-heavy part) lives in
`sources/agents/partials/*.jinja`, shared via `{% include %}`. The throwing include
callback is replaced by a sandboxed resolver that:
- rejects names containing `..`, a leading `/`, or a scheme/drive;
- resolves only against known roots: bundled `:/agents/partials/` then the user
`partials/` dir;
- parses/caches the partial in the same `inja::Environment`.
A missing/typo'd partial is a **load-time** error.
### 3. `extends` shares config down a hierarchy
`extends` already exists (`resolveExtends` + `deepMerge` + `abstract`/`hidden`); it
keeps doing what it does, now over the structured `[body]` too. Typical 23 levels:
```
chat_base (abstract) → system_prompt (shared by all)
├─ openai_base (abstract) → provider/endpoint/enable_tools + [body]
│ ├─ openai_chat → name
│ ├─ mistral_chat → name, provider, endpoint
│ └─ mistral_reasoning → + [body].reasoning_effort
├─ anthropic_base (abstract) → provider/endpoint/thinking + [body]
│ ├─ claude_chat → name
│ └─ claude_sonnet → + [body.output_config].effort
└─ google_base (abstract) → provider/endpoint + [body]
└─ gemini_chat → name
```
Notes:
- `[body]` is shared whole when identical (the 8 OpenAI-compatible providers); a
variant overrides only the differing field — no duplicated body.
- Arrays (`tags`) are **replaced** on override, not appended (`deepMerge` recurses
objects only). A child that wants base tags + extras restates the full list.
- Division of labour: **include** shares the message-rendering fragment across
unrelated families; **extends** shares config (system_prompt / endpoint / body)
down one inheritance chain.
- With `model` gone, per-model files collapse: agents that previously differed only
by `model` become one agent (the client picks the model). A separate file is only
needed when the body genuinely differs (effort, no-thinking, …).
### `role` + `context` merged into `system_prompt`
The old `role` (static) and `context` (jinja, reads files) are two layers of the
same system prompt (`SystemPromptBuilder` layers `agent.role` / `agent.context`).
Merge into one `system_prompt` field, always rendered through `ContextRenderer`
(static text passes through; dynamic parts use `{% %}`), e.g. README via
`file_exists` instead of the `set readme / if length` dance. `Session` collapses the
two layers into one rendered layer.
## Worked examples
OpenAI base:
```toml
extends = "Chat Base"
abstract = true
provider_instance = "OpenAI (Chat Completions)"
endpoint = "/chat/completions"
enable_tools = true
[body]
max_tokens = 8192
temperature = 0.7
stream = true
messages = """
[ {% include "partials/openai_messages.jinja" %} ]
"""
```
Mistral reasoning child (delta only):
```toml
extends = "OpenAI Base Chat"
name = "Mistral Reasoning Chat"
provider_instance = "Mistral AI"
endpoint = "/v1/chat/completions"
enable_thinking = true
[body]
reasoning_effort = "medium"
```
Claude base (literally the curl body):
```toml
extends = "Chat Base"
abstract = true
provider_instance = "Claude"
endpoint = "/v1/messages"
enable_thinking = true
enable_tools = true
[body]
max_tokens = 16000
temperature = 1
stream = true
thinking = { type = "adaptive", display = "summarized" }
output_config = { effort = "high" }
system = """{% if existsIn(ctx, "system_prompt") %}{{ tojson(ctx.system_prompt) }}{% endif %}"""
messages = """
[ {% include "partials/anthropic_messages.jinja" %} ]
"""
```
Sonnet child (delta only):
```toml
extends = "Anthropic Base Chat"
name = "Claude Sonnet"
[body.output_config]
effort = "medium"
```
Google base (`${MODEL}` in endpoint; streaming in the URL):
```toml
extends = "Chat Base"
abstract = true
provider_instance = "Google AI"
endpoint = "/models/${MODEL}:streamGenerateContent?alt=sse"
enable_thinking = true
enable_tools = true
[body]
system_instruction = """{% if existsIn(ctx, "system_prompt") %}{ "parts": [ { "text": {{ tojson(ctx.system_prompt) }} } ] }{% endif %}"""
contents = """
[ {% include "partials/google_contents.jinja" %} ]
"""
[body.generationConfig]
maxOutputTokens = 16000
temperature = 1
thinkingConfig = { includeThoughts = true, thinkingBudget = 8192 }
```
### Partials
`partials/openai_messages.jinja` dispatches per message:
```jinja
{% if existsIn(ctx, "system_prompt") %}
{ "role": "system", "content": {{ tojson(ctx.system_prompt) }} },
{% endif %}
{% for msg in ctx.history %}
{% if msg.role == "assistant" %}{% include "partials/openai_assistant.jinja" %}
{% else if length(filter_by_type(msg.content_blocks, "tool_result")) > 0 %}{% include "partials/openai_tool_results.jinja" %}
{% else %}{% include "partials/openai_user.jinja" %}
{% endif %}
{% endfor %}
```
`partials/openai_assistant.jinja`:
```jinja
{% set tcalls = filter_by_type(msg.content_blocks, "tool_use") %}
{
"role": "assistant",
"content": {{ tojson(msg.content) }}
{% if length(tcalls) > 0 %}
, "tool_calls": [
{% for b in tcalls %}
{ "id": {{ tojson(b.id) }}, "type": "function",
"function": { "name": {{ tojson(b.name) }}, "arguments": {{ tojson(tojson(b.input)) }} } },
{% endfor %}
]
{% endif %}
},
```
`partials/openai_tool_results.jinja`:
```jinja
{% for b in filter_by_type(msg.content_blocks, "tool_result") %}
{ "role": "tool", "tool_call_id": {{ tojson(b.tool_use_id) }}, "content": {{ tojson(b.content) }} },
{% endfor %}
```
`partials/openai_user.jinja`:
```jinja
{% if existsIn(msg, "images") %}
{ "role": "user", "content": {% include "partials/openai_image_content.jinja" %} },
{% else %}
{ "role": "user", "content": {{ tojson(msg.content) }} },
{% endif %}
```
`partials/openai_image_content.jinja`:
```jinja
[
{ "type": "text", "text": {{ tojson(msg.content) }} }
{% for img in msg.images %}
,
{% if img.is_url %}
{ "type": "image_url", "image_url": { "url": {{ tojson(img.data) }} } }
{% else %}
{ "type": "image_url", "image_url": { "url": "data:{{ img.media_type }};base64,{{ img.data }}" } }
{% endif %}
{% endfor %}
]
```
`partials/anthropic_messages.jinja`:
```jinja
{% for msg in ctx.history %}
{
"role": {{ tojson(msg.role) }},
"content": [
{% for b in msg.content_blocks %}
{% if b.type == "image" %}{% include "partials/anthropic_image.jinja" %}
{% else %}{{ tojson(b) }},
{% endif %}
{% endfor %}
]
},
{% endfor %}
```
`partials/anthropic_image.jinja`:
```jinja
{
"type": "image",
"source":
{% if b.is_url %}
{ "type": "url", "url": {{ tojson(b.data) }} }
{% else %}
{ "type": "base64", "media_type": {{ tojson(b.media_type) }}, "data": {{ tojson(b.data) }} }
{% endif %}
},
```
`partials/google_contents.jinja`:
```jinja
{% for msg in ctx.history %}
{
"role": {% if msg.role == "assistant" %}"model"{% else %}"user"{% endif %},
"parts": [ {% for b in msg.content_blocks %}{% include "partials/google_part.jinja" %}{% endfor %} ]
},
{% endfor %}
```
`partials/google_part.jinja`:
```jinja
{% if b.type == "text" %}
{ "text": {{ tojson(b.text) }} },
{% else if b.type == "thinking" %}
{ "text": {{ tojson(b.thinking) }}, "thought": true, "thoughtSignature": {{ tojson(b.signature) }} },
{% else if b.type == "tool_use" %}
{ "functionCall": { "name": {{ tojson(b.name) }}, "args": {{ tojson(b.input) }} } },
{% else if b.type == "tool_result" %}
{ "functionResponse": { "name": {{ tojson(b.name) }}, "response": { "result": {{ tojson(b.content) }} } } },
{% else if b.type == "image" %}
{% if b.is_url %}
{ "file_data": { "mime_type": {{ tojson(b.media_type) }}, "file_uri": {{ tojson(b.data) }} } },
{% else %}
{ "inline_data": { "mime_type": {{ tojson(b.media_type) }}, "data": {{ tojson(b.data) }} } },
{% endif %}
{% else %}
{ "text": "" },
{% endif %}
```
## C++ work
In `JsonPromptTemplate`:
- Parse `[body]` as a `QJsonObject` (not a string). Walk it recursively and build the
request: render jinja-bearing string values via inja and splice the parsed JSON;
pass literal strings / scalars / inline-tables through; drop keys whose render is
empty.
- **Delete** `m_sampling`, `m_thinking`, and `applySampling` entirely — the body is
the request; there is no separate sampling/thinking merge.
- Drop the `thinkingEnabled` parameter from `buildFullRequest` /
`Provider::prepareRequest` / `Session` — it no longer affects rendering.
- Add a **JSON-aware** trailing-comma stripper before `QJsonDocument::fromJson`
(tracks string/escape state so `,}` / `,]` inside string values are not touched).
This is what lets partials emit an unconditional `,` after every element and drop
all `loop.is_last` bookkeeping.
In `AgentConfig` / `AgentLoader`:
- Replace `messageFormat` (string) with `body` (`QJsonObject`); merge `role` +
`context` into `system_prompt`. `[template].sampling` / `[template].thinking` are
removed.
- `extends` / `deepMerge` are unchanged; they now also merge `[body]`.
- Validate at load: a referenced partial must resolve; the assembled body must parse
as JSON (render once against a synthetic context with tool_use / tool_result /
image). Catches breakage at startup, not mid-conversation.
In the client / provider layer:
- The client sets `model` from its settings (and resolves `${MODEL}` in the
endpoint); `Session` no longer seeds the payload with `cfg.model`.
- The provider keeps injecting `tools` when `enable_tools` is set.
In `Session`:
- Collapse the `agent.role` + `agent.context` system-prompt layers into one rendered
`system_prompt` layer.
## Implementation order
1. JSON-aware trailing-comma stripper + whitelisted `include` resolver (enables
readable partials).
2. `[body]`-table model in `JsonPromptTemplate` + loader; delete
sampling/thinking/`applySampling`; drop `thinkingEnabled`.
3. `system_prompt` merge in loader + `Session`.
4. `model` from client (+ `${MODEL}` endpoint substitution); convert bundled agents
to the base/partials/`extends` layout.
5. Load-time validation (partial resolves, body parses).