refactor: Finalize agent template

2026-06-14 02:09:22 -04:00 · 2026-06-03 17:28:50 +02:00
parent 98a618cf87
commit c151c5030b
57 changed files with 1737 additions and 393 deletions
--- a/docs/agent-templates-design.md
+++ b/docs/agent-templates-design.md
@@ -0,0 +1,358 @@
+# Agent Templates — Design Note (body model, include, extends)
+
+Status: agreed design / ready to implement. Dev-facing (not end-user docs).
+Scope: how agent TOML profiles describe the request and share structure.
+
+## Problem this replaces
+
+The shipped model has each agent embed a `[template].message_format` jinja string
+that hand-builds the **whole** request body as text, plus `[template.sampling]` and
+`[template.thinking.*]` blocks merged in by `applySampling`. Pains:
+
+- Massive copy-paste: 9 OpenAI-compatible agents share a byte-identical ~50-line
+  `message_format`; 4 Claude agents share another; `role` + README `context` are
+  identical across 18 files.
+- `[template.sampling]` / `[template.thinking.overrides]` /
+  `[template.thinking.request_block.*]` describe **merge machinery**, not the request
+  body — they don't look like the actual API call. The `overrides` vs `request_block`
+  split is meaningless (both are deep-merged into the request identically).
+- Manual JSON-by-string-concatenation: trailing-comma bookkeeping
+  (`{% if not loop.is_last %},{% endif %}`) everywhere; a missing comma surfaces only
+  at runtime and nearly silently (`renderBody` returns `nullopt` and logs just a
+  `qWarning`).
+- `include` is hard-disabled, so there is no way to share a sub-fragment.
+
+## Agreed model
+
+### 1. `[body]` is a deep-mergeable table = the request body, 1:1 with the API
+
+Replace the `message_format` string and the `sampling`/`thinking` blocks with a
+single `[body]` TOML table whose keys are the **literal request-body fields**.
+Because it is a table (not a string), `extends` / `deepMerge` can override it
+field-by-field — variants become a 2-line delta instead of a copied body.
+
+Field-value rules at build time (per key in `[body]`, applied recursively):
+- **string containing jinja** (`{{` or `{%`) → render through inja, splice the
+  output as **raw JSON** (array / object / string). Empty render → key omitted.
+- **string without jinja** (e.g. `"high"`) → literal JSON string, as-is.
+- **number / bool / inline-table** → as-is.
+
+So `messages` / `contents` and `system` / `system_instruction` are just **string
+fields holding jinja**; everything else (`max_tokens`, `temperature`, `stream`,
+`thinking`, `output_config`, `generationConfig`, …) is a literal value that reads
+exactly like the curl body.
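+
+As a minimal sketch of the three rules, the hypothetical fragment below annotates one
+key of each kind. The combination of fields is illustrative only and does not match
+any single provider's body:
+
+```toml
+[body]
+# number / bool / inline-table: copied into the request as-is
+max_tokens = 8192
+stream     = true
+thinking   = { type = "adaptive" }
+# string without jinja: emitted as a literal JSON string
+reasoning_effort = "high"
+# string containing jinja: rendered through inja, output spliced as raw JSON;
+# when ctx has no system_prompt the render is empty and the key is omitted
+system = """{% if existsIn(ctx, "system_prompt") %}{{ tojson(ctx.system_prompt) }}{% endif %}"""
+```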
+
+No runtime toggles: thinking / tools / streaming are **fixed per agent**. A thinking
+agent literally carries the `thinking` fields; a non-thinking variant is a separate
+file. There is no `{% if thinking %}` and no `thinkingEnabled` flag threaded into
+rendering. `system` uses `{% if existsIn(ctx, "system_prompt") %}` only because that
+is about *presence of data*, not a mode toggle.
+
+Outside the body:
+- `model` — supplied by the **client** from its own settings; never in the profile.
+  Google embeds the model in the URL, so its `endpoint` uses a `${MODEL}` placeholder
+  the client resolves (same substitution style as `${PROJECT_DIR}` / `${HOME}`).
+- `tools` — injected by the **provider** when `enable_tools` is set (tool
+  definitions are dynamic, from `ToolsManager`; they can't be authored in TOML).
+- `stream` — always on. Literal `"stream": true` in the body for OpenAI / Claude /
+  Mistral; encoded in the `endpoint` URL for Google.
+
+### 2. `include` re-enabled as whitelisted partials
+
+The message-array rendering (the complex, comma-heavy part) lives in
+`sources/agents/partials/*.jinja`, shared via `{% include %}`. The throwing include
+callback is replaced by a sandboxed resolver that:
+- rejects names containing `..`, a leading `/`, or a scheme/drive;
+- resolves only against known roots: bundled `:/agents/partials/` then the user
+  `partials/` dir;
+- parses/caches the partial in the same `inja::Environment`.
+
+A missing or misspelled partial is a **load-time** error, not a render-time one.
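+
+The rejection rules can be illustrated with a hypothetical `[body]` fragment. Only
+the first include name would pass the resolver as described above; the commented-out
+names are invented examples of each rejected shape:
+
+```toml
+[body]
+# Accepted: relative name, resolved against the whitelisted roots
+messages = """[ {% include "partials/openai_messages.jinja" %} ]"""
+
+# Rejected (kept as comments so the fragment stays loadable):
+# {% include "../other/messages.jinja" %}     contains `..`
+# {% include "/tmp/messages.jinja" %}         leading `/`
+# {% include "file:///tmp/messages.jinja" %}  scheme
+# {% include "C:/tmp/messages.jinja" %}       drive
+```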
+
+### 3. `extends` shares config down a hierarchy
+
+`extends` already exists (`resolveExtends` + `deepMerge` + `abstract`/`hidden`); it
+keeps doing what it does, now over the structured `[body]` too. Typical 2–3 levels:
+
+```
+chat_base (abstract)            → system_prompt (shared by all)
+  ├─ openai_base (abstract)     → provider/endpoint/enable_tools + [body]
+  │    ├─ openai_chat           → name
+  │    ├─ mistral_chat          → name, provider, endpoint
+  │    └─ mistral_reasoning     → + [body].reasoning_effort
+  ├─ anthropic_base (abstract)  → provider/endpoint/thinking + [body]
+  │    ├─ claude_chat           → name
+  │    └─ claude_sonnet         → + [body.output_config].effort
+  └─ google_base (abstract)     → provider/endpoint + [body]
+       └─ gemini_chat           → name
+```
+
+Notes:
+- `[body]` is shared whole when identical (the 8 OpenAI-compatible providers); a
+  variant overrides only the differing field — no duplicated body.
+- Arrays (e.g. `tags`) are **replaced** on override, not appended (`deepMerge`
+  recurses into objects only). A child that wants the base tags plus extras restates
+  the full list.
+- Division of labour: **include** shares the message-rendering fragment across
+  unrelated families; **extends** shares config (system_prompt / endpoint / body)
+  down one inheritance chain.
+- With `model` gone, per-model files collapse: agents that previously differed only
+  by `model` become one agent (the client picks the model). A separate file is only
+  needed when the body genuinely differs (effort, no-thinking, …).
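+
+To make the array note concrete, here is a sketch with two hypothetical profiles. The
+agent names are invented, and placing `tags` at the top level of the profile is an
+assumption. The base:
+
+```toml
+name     = "Example Base"
+abstract = true
+tags     = ["chat", "tools"]
+```
+
+A child that wants one extra tag restates the whole list, because `tags = ["reasoning"]`
+alone would replace the base array rather than extend it:
+
+```toml
+extends = "Example Base"
+name    = "Example Child"
+tags    = ["chat", "tools", "reasoning"]
+```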
+
+### 4. `role` + `context` merged into `system_prompt`
+
+The old `role` (static) and `context` (jinja, reads files) are two layers of the
+same system prompt (`SystemPromptBuilder` layers `agent.role` / `agent.context`).
+Merge into one `system_prompt` field, always rendered through `ContextRenderer`
+(static text passes through; dynamic parts use `{% %}`), e.g. README via
+`file_exists` instead of the `set readme / if length` dance. `Session` collapses the
+two layers into one rendered layer.
+
+## Worked examples
+
+OpenAI base:
+```toml
+extends = "Chat Base"
+abstract = true
+provider_instance = "OpenAI (Chat Completions)"
+endpoint = "/chat/completions"
+enable_tools = true
+
+[body]
+max_tokens  = 8192
+temperature = 0.7
+stream      = true
+messages    = """
+[ {% include "partials/openai_messages.jinja" %} ]
+"""
+```
+
+Mistral reasoning child (delta only):
+```toml
+extends = "OpenAI Base Chat"
+name    = "Mistral Reasoning Chat"
+provider_instance = "Mistral AI"
+endpoint = "/v1/chat/completions"
+enable_thinking = true
+
+[body]
+reasoning_effort = "medium"
+```
+
+Claude base (literally the curl body):
+```toml
+extends = "Chat Base"
+abstract = true
+provider_instance = "Claude"
+endpoint = "/v1/messages"
+enable_thinking = true
+enable_tools = true
+
+[body]
+max_tokens  = 16000
+temperature = 1
+stream      = true
+thinking      = { type = "adaptive", display = "summarized" }
+output_config = { effort = "high" }
+system   = """{% if existsIn(ctx, "system_prompt") %}{{ tojson(ctx.system_prompt) }}{% endif %}"""
+messages = """
+[ {% include "partials/anthropic_messages.jinja" %} ]
+"""
+```
+
+Sonnet child (delta only):
+```toml
+extends = "Anthropic Base Chat"
+name    = "Claude Sonnet"
+
+[body.output_config]
+effort = "medium"
+```
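+
+For reference, a sketch of the effective `[body]` the Sonnet child would end up with
+after `deepMerge`, assuming the Claude base shown above is the agent named
+"Anthropic Base Chat". Only `output_config.effort` differs from the base:
+
+```toml
+[body]
+max_tokens    = 16000
+temperature   = 1
+stream        = true
+thinking      = { type = "adaptive", display = "summarized" }
+output_config = { effort = "medium" }   # overridden by the child
+system   = """{% if existsIn(ctx, "system_prompt") %}{{ tojson(ctx.system_prompt) }}{% endif %}"""
+messages = """
+[ {% include "partials/anthropic_messages.jinja" %} ]
+"""
+```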
+
+Google base (`${MODEL}` in endpoint; streaming in the URL):
+```toml
+extends = "Chat Base"
+abstract = true
+provider_instance = "Google AI"
+endpoint = "/models/${MODEL}:streamGenerateContent?alt=sse"
+enable_thinking = true
+enable_tools = true
+
+[body]
+system_instruction = """{% if existsIn(ctx, "system_prompt") %}{ "parts": [ { "text": {{ tojson(ctx.system_prompt) }} } ] }{% endif %}"""
+contents = """
+[ {% include "partials/google_contents.jinja" %} ]
+"""
+
+[body.generationConfig]
+maxOutputTokens = 16000
+temperature     = 1
+thinkingConfig  = { includeThoughts = true, thinkingBudget = 8192 }
+```
+
+### Partials
+
+`partials/openai_messages.jinja` dispatches per message:
+```jinja
+{% if existsIn(ctx, "system_prompt") %}
+{ "role": "system", "content": {{ tojson(ctx.system_prompt) }} },
+{% endif %}
+{% for msg in ctx.history %}
+  {% if msg.role == "assistant" %}{% include "partials/openai_assistant.jinja" %}
+  {% else if length(filter_by_type(msg.content_blocks, "tool_result")) > 0 %}{% include "partials/openai_tool_results.jinja" %}
+  {% else %}{% include "partials/openai_user.jinja" %}
+  {% endif %}
+{% endfor %}
+```
+
+`partials/openai_assistant.jinja`:
+```jinja
+{% set tcalls = filter_by_type(msg.content_blocks, "tool_use") %}
+{
+  "role": "assistant",
+  "content": {{ tojson(msg.content) }}
+  {% if length(tcalls) > 0 %}
+  , "tool_calls": [
+    {% for b in tcalls %}
+    { "id": {{ tojson(b.id) }}, "type": "function",
+      "function": { "name": {{ tojson(b.name) }}, "arguments": {{ tojson(tojson(b.input)) }} } },
+    {% endfor %}
+  ]
+  {% endif %}
+},
+```
+
+`partials/openai_tool_results.jinja`:
+```jinja
+{% for b in filter_by_type(msg.content_blocks, "tool_result") %}
+{ "role": "tool", "tool_call_id": {{ tojson(b.tool_use_id) }}, "content": {{ tojson(b.content) }} },
+{% endfor %}
+```
+
+`partials/openai_user.jinja`:
+```jinja
+{% if existsIn(msg, "images") %}
+{ "role": "user", "content": {% include "partials/openai_image_content.jinja" %} },
+{% else %}
+{ "role": "user", "content": {{ tojson(msg.content) }} },
+{% endif %}
+```
+
+`partials/openai_image_content.jinja`:
+```jinja
+[
+  { "type": "text", "text": {{ tojson(msg.content) }} }
+  {% for img in msg.images %}
+  ,
+  {% if img.is_url %}
+  { "type": "image_url", "image_url": { "url": {{ tojson(img.data) }} } }
+  {% else %}
+  { "type": "image_url", "image_url": { "url": "data:{{ img.media_type }};base64,{{ img.data }}" } }
+  {% endif %}
+  {% endfor %}
+]
+```
+
+`partials/anthropic_messages.jinja`:
+```jinja
+{% for msg in ctx.history %}
+{
+  "role": {{ tojson(msg.role) }},
+  "content": [
+    {% for b in msg.content_blocks %}
+      {% if b.type == "image" %}{% include "partials/anthropic_image.jinja" %}
+      {% else %}{{ tojson(b) }},
+      {% endif %}
+    {% endfor %}
+  ]
+},
+{% endfor %}
+```
+
+`partials/anthropic_image.jinja`:
+```jinja
+{
+  "type": "image",
+  "source":
+  {% if b.is_url %}
+  { "type": "url", "url": {{ tojson(b.data) }} }
+  {% else %}
+  { "type": "base64", "media_type": {{ tojson(b.media_type) }}, "data": {{ tojson(b.data) }} }
+  {% endif %}
+},
+```
+
+`partials/google_contents.jinja`:
+```jinja
+{% for msg in ctx.history %}
+{
+  "role": {% if msg.role == "assistant" %}"model"{% else %}"user"{% endif %},
+  "parts": [ {% for b in msg.content_blocks %}{% include "partials/google_part.jinja" %}{% endfor %} ]
+},
+{% endfor %}
+```
+
+`partials/google_part.jinja`:
+```jinja
+{% if b.type == "text" %}
+{ "text": {{ tojson(b.text) }} },
+{% else if b.type == "thinking" %}
+{ "text": {{ tojson(b.thinking) }}, "thought": true, "thoughtSignature": {{ tojson(b.signature) }} },
+{% else if b.type == "tool_use" %}
+{ "functionCall": { "name": {{ tojson(b.name) }}, "args": {{ tojson(b.input) }} } },
+{% else if b.type == "tool_result" %}
+{ "functionResponse": { "name": {{ tojson(b.name) }}, "response": { "result": {{ tojson(b.content) }} } } },
+{% else if b.type == "image" %}
+  {% if b.is_url %}
+  { "file_data": { "mime_type": {{ tojson(b.media_type) }}, "file_uri": {{ tojson(b.data) }} } },
+  {% else %}
+  { "inline_data": { "mime_type": {{ tojson(b.media_type) }}, "data": {{ tojson(b.data) }} } },
+  {% endif %}
+{% else %}
+{ "text": "" },
+{% endif %}
+```
+
+## C++ work
+
+In `JsonPromptTemplate`:
+- Parse `[body]` as a `QJsonObject` (not a string). Walk it recursively and build the
+  request: render jinja-bearing string values via inja and splice the parsed JSON;
+  pass literal strings / scalars / inline-tables through; drop keys whose render is
+  empty.
+- **Delete** `m_sampling`, `m_thinking`, and `applySampling` entirely — the body is
+  the request; there is no separate sampling/thinking merge.
+- Drop the `thinkingEnabled` parameter from `buildFullRequest` /
+  `Provider::prepareRequest` / `Session` — it no longer affects rendering.
+- Add a **JSON-aware** trailing-comma stripper before `QJsonDocument::fromJson`
+  (tracks string/escape state so `,}` / `,]` inside string values are not touched).
+  This is what lets partials emit an unconditional `,` after every element and drop
+  all `loop.is_last` bookkeeping.
+
+In `AgentConfig` / `AgentLoader`:
+- Replace `messageFormat` (string) with `body` (`QJsonObject`); merge `role` +
+  `context` into `system_prompt`. `[template].sampling` / `[template].thinking` are
+  removed.
+- `extends` / `deepMerge` are unchanged; they now also merge `[body]`.
+- Validate at load: a referenced partial must resolve; the assembled body must parse
+  as JSON (render once against a synthetic context with tool_use / tool_result /
+  image). Catches breakage at startup, not mid-conversation.
+
+In the client / provider layer:
+- The client sets `model` from its settings (and resolves `${MODEL}` in the
+  endpoint); `Session` no longer seeds the payload with `cfg.model`.
+- The provider keeps injecting `tools` when `enable_tools` is set.
+
+In `Session`:
+- Collapse the `agent.role` + `agent.context` system-prompt layers into one rendered
+  `system_prompt` layer.
+
+## Implementation order
+
+1. JSON-aware trailing-comma stripper + whitelisted `include` resolver (enables
+   readable partials).
+2. `[body]`-table model in `JsonPromptTemplate` + loader; delete
+   sampling/thinking/`applySampling`; drop `thinkingEnabled`.
+3. `system_prompt` merge in loader + `Session`.
+4. `model` from client (+ `${MODEL}` endpoint substitution); convert bundled agents
+   to the base/partials/`extends` layout.
+5. Load-time validation (partial resolves, body parses).