
OllamaClient

Category: LLM Providers

Source: ollama_client.dart

Classes

OllamaClient

LLM client for Ollama local API with streaming.

Ollama uses NDJSON streaming (not SSE) and its own message format. Tool calling uses OpenAI-compatible tool schemas but returns arguments as parsed objects (not JSON strings).
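The two format differences above can be sketched with `dart:convert` alone. This is a minimal illustration assuming each NDJSON line holds exactly one complete JSON object. `decodeNdjsonLine`, `decodeNdjson`, and `normalizeToolArguments` are hypothetical helpers, not part of `OllamaClient`'s documented API.

```dart
import 'dart:convert';

/// Decodes one NDJSON line into a JSON object, or returns null for blank lines.
/// (Hypothetical helper for illustration.)
Map<String, dynamic>? decodeNdjsonLine(String line) {
  final trimmed = line.trim();
  if (trimmed.isEmpty) return null;
  return jsonDecode(trimmed) as Map<String, dynamic>;
}

/// Turns a raw byte stream (for example an HTTP response body) into one JSON
/// object per newline-delimited line. Unlike SSE, there are no `data:` prefixes.
Stream<Map<String, dynamic>> decodeNdjson(Stream<List<int>> bytes) => bytes
    .transform(utf8.decoder)
    .transform(const LineSplitter())
    .map(decodeNdjsonLine)
    .where((event) => event != null)
    .cast<Map<String, dynamic>>();

/// Ollama returns tool-call `arguments` as an already-parsed JSON object,
/// whereas OpenAI-style APIs return a JSON-encoded string. Accepts both.
Map<String, dynamic> normalizeToolArguments(Object? raw) {
  if (raw == null) return <String, dynamic>{};
  if (raw is Map) return Map<String, dynamic>.from(raw);
  if (raw is String) {
    if (raw.trim().isEmpty) return <String, dynamic>{};
    final decoded = jsonDecode(raw);
    if (decoded is Map) return Map<String, dynamic>.from(decoded);
  }
  throw FormatException('Unsupported tool-call arguments: $raw');
}
```

Accepting both shapes in one normalizer keeps downstream tool dispatch identical across providers, while the Ollama path skips a redundant `jsonDecode`.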

**num_ctx injection.** Ollama defaults to `num_ctx: 2048` for every request, regardless of the context length the model was trained with. This is a notorious footgun: long agent loops get silently truncated. When [contextWindow] is set, the client injects `options.num_ctx = min(contextWindow, ollamaNumCtxCeiling)` so catalogued models get the full context their metadata promises.
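A minimal sketch of the injection, assuming the request goes to Ollama's `/api/chat` endpoint with a top-level `options` map. `buildChatRequestBody` is a hypothetical helper, and its `numCtxCeiling` parameter stands in for `ollamaNumCtxCeiling`, whose exact value this page describes only as "128K".

```dart
import 'dart:math' as math;

/// Builds an Ollama `/api/chat` request body (hypothetical helper).
/// `numCtxCeiling` stands in for the real `ollamaNumCtxCeiling` constant.
Map<String, dynamic> buildChatRequestBody({
  required String model,
  required List<Map<String, dynamic>> messages,
  required int numCtxCeiling,
  int? contextWindow,
}) {
  final body = <String, dynamic>{
    'model': model,
    'messages': messages,
    'stream': true,
  };
  // Without an explicit num_ctx, Ollama falls back to its small default
  // and silently truncates long conversations.
  if (contextWindow != null) {
    body['options'] = <String, dynamic>{
      'num_ctx': math.min(contextWindow, numCtxCeiling),
    };
  }
  return body;
}
```

With an assumed ceiling of 131072, a catalogue entry claiming `contextWindow: 1000000` is sent as `num_ctx: 131072`, a 32K model is sent unchanged, and a `null` context window leaves `options` out entirely so Ollama's own default applies.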

Constructor

```dart
OllamaClient({
  required this.model,
  required this.systemPrompt,
  String baseUrl = 'http://localhost:11434',
  this.contextWindow,
  http.Client Function()? requestClientFactory,
})
```

Properties

| Property | Type | Description |
|---|---|---|
| `model` | `String` | |
| `systemPrompt` | `String` | |
| `contextWindow` | `int?` | When non-null, injected as `options.num_ctx` on every request, capped at `ollamaNumCtxCeiling`. Comes from `ModelDef.contextWindow` at adapter construction time. See the class doc for why this matters. |

Methods

`Stream<LlmChunk> stream(List<Message> messages, {List<Tool>? tools})`

`static Stream<LlmChunk> parseStreamEvents(Stream<Map<String, dynamic>> events)`

Parse Ollama NDJSON streaming events into [LlmChunk]s.
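The mapping `parseStreamEvents` performs can be sketched as follows. This is not the actual implementation: `LlmChunk`'s subtypes are not documented on this page, so `ChunkSketch`, `TextChunk`, `ToolCallChunk`, `DoneChunk`, `chunksForEvent`, and `parseEventsSketch` are hypothetical stand-ins. The event fields (`message.content`, `message.tool_calls`, `done`, `prompt_eval_count`, `eval_count`, `error`) follow Ollama's `/api/chat` streaming responses.

```dart
/// Hypothetical chunk types standing in for [LlmChunk].
sealed class ChunkSketch {}

class TextChunk extends ChunkSketch {
  TextChunk(this.text);
  final String text;
}

class ToolCallChunk extends ChunkSketch {
  ToolCallChunk(this.name, this.arguments);
  final String name;
  final Map<String, dynamic> arguments;
}

class DoneChunk extends ChunkSketch {
  DoneChunk({this.promptTokens, this.completionTokens});
  final int? promptTokens;
  final int? completionTokens;
}

/// Maps one decoded Ollama event to zero or more chunks.
List<ChunkSketch> chunksForEvent(Map<String, dynamic> event) {
  // Ollama reports mid-stream failures as an `error` object on its own line.
  final error = event['error'];
  if (error != null) throw StateError('Ollama error: $error');

  final chunks = <ChunkSketch>[];
  final message = event['message'];
  if (message is Map) {
    final content = message['content'];
    if (content is String && content.isNotEmpty) chunks.add(TextChunk(content));
    final toolCalls = message['tool_calls'];
    if (toolCalls is List) {
      for (final call in toolCalls) {
        final fn = (call as Map)['function'] as Map;
        // Arguments arrive as a JSON object, not a JSON-encoded string.
        final args = (fn['arguments'] as Map?) ?? const {};
        chunks.add(ToolCallChunk(fn['name'] as String, Map<String, dynamic>.from(args)));
      }
    }
  }
  // The final event carries `done: true` plus token counts.
  if (event['done'] == true) {
    chunks.add(DoneChunk(
      promptTokens: event['prompt_eval_count'] as int?,
      completionTokens: event['eval_count'] as int?,
    ));
  }
  return chunks;
}

/// Stream wrapper mirroring the shape of `parseStreamEvents`.
Stream<ChunkSketch> parseEventsSketch(Stream<Map<String, dynamic>> events) async* {
  await for (final event in events) {
    for (final chunk in chunksForEvent(event)) {
      yield chunk;
    }
  }
}
```

Keeping the per-event mapping synchronous and pure makes it easy to unit-test without a live Ollama server, which is presumably also why `parseStreamEvents` is exposed as a static method taking already-decoded events.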

Constants

| Name | Type | Description |
|---|---|---|
| `ollamaNumCtxCeiling` | `int` | Hard ceiling on the `num_ctx` override Glue sends. Prevents forwarding absurd context windows (some catalogue entries claim 1M+ tokens) that would exceed the user's RAM budget on mid-range GPUs. 128K is comfortably above any real agent conversation and matches what the upstream ecosystem (Continue, Cline, opencode) settled on. Exposed publicly so tests can assert against it without copying the magic number. |

Released under the MIT License.