OllamaClient
Category: LLM Providers
Source: ollama_client.dart
Classes
OllamaClient
LLM client for Ollama local API with streaming.
Ollama uses NDJSON streaming (not SSE) and its own message format. Tool calling uses OpenAI-compatible tool schemas but returns arguments as parsed objects (not JSON strings).
num_ctx injection. Ollama silently defaults to num_ctx: 2048 for every request regardless of what the model was trained with — a notorious footgun that silently truncates agent loops. When [contextWindow] is set, we inject options.num_ctx = min(contextWindow, ollamaNumCtxCeiling) so catalogued models get the full context their metadata promises.
Constructor
dart
OllamaClient({
required this.model,
required this.systemPrompt,
String baseUrl = 'http://localhost:11434',
this.contextWindow,
http.Client Function()? requestClientFactory,
})Properties
| Property | Type | Description |
|---|---|---|
model | String | |
systemPrompt | String | |
contextWindow | int? | When non-null, injected as options.num_ctx on every request. Comes from ModelDef.contextWindow at adapter construction time. See class doc for why this matters. |
Methods
Stream<LlmChunk> stream(List<Message> messages, {List<Tool>? tools})
`static Stream<LlmChunk> parseStreamEvents(
Stream<Map<String, dynamic>> events,
)`
Parse Ollama NDJSON streaming events into [LlmChunk]s.
Constants
| Name | Type | Description |
|---|---|---|
ollamaNumCtxCeiling | int | Hard ceiling on the num_ctx override Glue will send. Keeps us from forwarding absurd context windows (some catalogue entries claim 1M+) that would blow past the user's RAM budget on mid-range GPUs. 128K is comfortably above every real agent conversation and matches what the upstream ecosystem (Continue, Cline, opencode) settled on. Exposed publicly so tests can assert it without magic-number copies. |