
Configure for Ollama

  1. Install Ollama. Make sure to review the system requirements before installation.
  2. Install a language model in Ollama via the terminal. For example, you can run:

For standard computers (minimum 8GB RAM):

ollama run qwen2.5-coder:7b

For better performance (16GB+ RAM):

ollama run qwen2.5-coder:14b

For high-end systems (32GB+ RAM):

ollama run qwen2.5-coder:32b
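
After pulling a model, you can confirm that Ollama actually has it by querying the server's `/api/tags` endpoint. The sketch below assumes the default server address from the settings described in this guide; `installed_models` is a hypothetical helper name, not part of Ollama or QodeAssist:

```python
import json
from urllib.request import urlopen

OLLAMA_URL = "http://localhost:11434"  # default Ollama address

def installed_models(tags_response: dict) -> list[str]:
    """Extract model names from the body of an /api/tags response."""
    return [m["name"] for m in tags_response.get("models", [])]

def list_local_models(base_url: str = OLLAMA_URL) -> list[str]:
    """Ask a running Ollama server which models are installed locally."""
    with urlopen(f"{base_url}/api/tags") as resp:
        return installed_models(json.load(resp))
```

A quick sanity check is then something like `"qwen2.5-coder:7b" in list_local_models()`.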
  3. Open Qt Creator settings (Edit > Preferences on Linux/Windows, Qt Creator > Preferences on macOS)
  4. Navigate to the "QodeAssist" tab
  5. On the "General" page, verify:
    • Ollama is selected as your LLM provider
    • The URL is set to http://localhost:11434
    • Your installed model appears in the model selection
    • The prompt template is Ollama Auto FIM, or Ollama Auto Chat for chat assistance. You can select a template manually if auto-detection does not work correctly
    • Tool use is disabled if your model does not support tooling
  6. Click Apply if you made any changes
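
If the model list stays empty, it is worth checking that the Ollama server is actually reachable at the URL from the settings above. A minimal sketch (the `ollama_reachable` helper is illustrative, not part of QodeAssist):

```python
from urllib.request import urlopen
from urllib.error import URLError

def ollama_reachable(base_url: str = "http://localhost:11434") -> bool:
    """Return True if an Ollama server answers at base_url."""
    try:
        with urlopen(base_url, timeout=2) as resp:
            # A running Ollama server answers its root URL with HTTP 200
            return resp.status == 200
    except URLError:
        return False
```

If this returns `False`, start the server (e.g. `ollama serve`) before revisiting the QodeAssist settings.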

You're all set! QodeAssist is now ready to use in Qt Creator.

Extended Thinking Mode

Ollama supports extended thinking mode for models that are capable of deep reasoning (such as DeepSeek-R1, QwQ, and similar reasoning models). This mode allows the model to show its step-by-step reasoning process before providing the final answer.

How to Enable

For Chat Assistant:

  1. Navigate to Qt Creator > Preferences > QodeAssist > Chat Assistant
  2. In the "Extended Thinking (Claude, Ollama)" section, check "Enable extended thinking mode"
  3. Select a reasoning-capable model (e.g., deepseek-r1:8b, qwq:32b)
  4. Click Apply

For Quick Refactoring:

  1. Navigate to Qt Creator > Preferences > QodeAssist > Quick Refactor
  2. Check "Enable Thinking Mode"
  3. Configure thinking budget and max tokens as needed
  4. Click Apply

Supported Models

Thinking mode works best with models specifically designed for reasoning:

  • DeepSeek-R1 series (deepseek-r1:8b, deepseek-r1:14b, deepseek-r1:32b)
  • QwQ series (qwq:32b)
  • Other models trained for chain-of-thought reasoning

How It Works

When thinking mode is enabled:

  1. The model generates internal reasoning (visible in the chat as "Thinking" blocks)
  2. After reasoning, it provides the final answer
  3. You can collapse/expand thinking blocks to focus on the final answer
  4. Temperature is automatically set to 1.0 for optimal reasoning performance

Technical Details:

  • Thinking mode adds the enable_thinking: true parameter to requests sent to Ollama
  • This is natively supported by the Ollama API for compatible models
  • Works in both Chat Assistant and Quick Refactoring contexts
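
Putting those details together, a thinking-enabled request body can be sketched like this. The payload shape simply mirrors the `enable_thinking` parameter and the temperature behavior described above; the model name and the `build_generate_request` helper are illustrative assumptions:

```python
import json

def build_generate_request(model: str, prompt: str, thinking: bool = False) -> dict:
    """Assemble an Ollama generate-style payload, optionally with thinking."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    if thinking:
        # Parameter QodeAssist adds for compatible models, per the notes above
        payload["enable_thinking"] = True
        # Temperature is forced to 1.0 for reasoning, as described above
        payload["options"] = {"temperature": 1.0}
    return payload

body = json.dumps(build_generate_request("deepseek-r1:8b", "Explain this diff", thinking=True))
```

Without `thinking=True`, the payload carries only the model, prompt, and streaming flag, matching a regular completion request.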
Example of Ollama settings: see the "Ollama Settings" screenshot.