Model integrations

Reclio uses two model workers — one for chat completions, one for embeddings. They're configured independently so you can mix and match (e.g. chat on Claude for the highest-quality replies, embeddings on OpenAI for top-tier vector quality). See Embeddings for the full embedding-side breakdown; this page covers the chat side.

Five chat providers are supported via LLM_PROVIDER:

| Provider | Best for | Cost | Setup |
| --- | --- | --- | --- |
| Ollama (default) | Self-hosting, no API fees | Free, local | Docker Compose brings it up for you |
| Claude | Highest-quality replies, hosted | Anthropic pricing | Set ANTHROPIC_API_KEY |
| OpenAI | Existing OpenAI plan | OpenAI pricing | Set OPENAI_API_KEY |
| OpenRouter | One key → ~200 models (Claude, GPT, Llama, Gemini, Qwen, Mixtral…) | OpenRouter passthrough pricing | Set OPENROUTER_API_KEY |
| None | Minimal deployments | Free | Set LLM_PROVIDER=none |

Switching is a single env var — no code changes, no rebuilds. The rest of Reclio doesn't care which model answers.
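Under the hood that works because every backend sits behind one interface, chosen once from the environment. A minimal sketch of the pattern, with illustrative names rather than Reclio's actual internals:

```python
import os

class NullProvider:
    """No-op backend: callers that get None fall back to deterministic output."""
    def complete(self, prompt: str) -> str | None:
        return None

# Hypothetical registry; the real provider classes (Ollama, Claude, ...) are
# Reclio-internal and would register themselves here.
REGISTRY: dict[str, type] = {"none": NullProvider}

def resolve_chat_provider():
    """Pick the chat backend from LLM_PROVIDER (default: ollama)."""
    name = os.getenv("LLM_PROVIDER", "ollama").lower()
    return REGISTRY.get(name, NullProvider)()  # unknown values degrade to NullProvider
```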

What the chat LLM is used for

Five places, in roughly increasing importance:

  1. Because-You-Watched titles — "Ever venture out to space?" instead of bare "Because You Watched Interstellar". Cached 24h.
  2. Personality summary — the playful 1-line roast at the top of the dashboard's "What Reclio thinks about you" card. Regenerated each taste-profile rebuild.
  3. Ask Reclio chat — the floating bubble. Chat replies + intent classification (so "stop showing me horror" actually mutates your preferences).
  4. Onboarding preference derivation — turns five free-form answers into a structured profile (mood tags, excluded genres, era prefs, family-safe flag, vibe summary). One-time per user.
  5. Conversational mutations — "newer movies please" / "less action" / "never recommend Inception again" all flow through the classifier into structured preference updates (see the sketch after this list).
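Items 3 and 5 share one intent classifier. A minimal sketch of that flow, with an invented intent schema — the allowed intents and prompt below are assumptions, not Reclio's real contract:

```python
import json

# Invented intent set -- Reclio's real schema is internal.
ALLOWED_INTENTS = {"exclude_genre", "prefer_era", "block_title", "chitchat"}

PROMPT = (
    "Classify the user's message as JSON with keys 'intent' and 'value'. "
    "Allowed intents: exclude_genre, prefer_era, block_title, chitchat.\n"
    "User message: {message}"
)

def classify(llm, message: str) -> dict | None:
    """Ask the chat LLM for a structured intent; return None on anything suspicious."""
    raw = llm.complete(PROMPT.format(message=message))
    try:
        parsed = json.loads(raw)
    except (TypeError, json.JSONDecodeError):
        return None  # malformed JSON -> no preference mutation
    if not isinstance(parsed, dict) or parsed.get("intent") not in ALLOWED_INTENTS:
        return None  # whitelist unknown intents away
    return parsed
```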

Ollama (default)

Runs inside your Compose stack. On first boot it pulls llama3.2:3b (~2 GB); subsequent starts reuse the cached model.

```
LLM_PROVIDER=ollama
OLLAMA_BASE_URL=http://ollama:11434
OLLAMA_MODEL=llama3.2:3b
```

Swap OLLAMA_MODEL for anything in the Ollama library — e.g. qwen2.5:3b, phi3:mini. Reclio calls POST /api/pull on startup so the model is ready before the first /feeds hit.
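The startup pull can be reproduced against Ollama's documented /api/pull endpoint; whether Reclio's own code looks like this sketch is an assumption:

```python
import os
import requests

def ensure_model_pulled() -> None:
    """Pull OLLAMA_MODEL so the model is warm before the first /feeds request."""
    base = os.getenv("OLLAMA_BASE_URL", "http://ollama:11434")
    model = os.getenv("OLLAMA_MODEL", "llama3.2:3b")
    resp = requests.post(
        f"{base}/api/pull",
        json={"model": model, "stream": False},  # block until the pull completes
        timeout=600,  # the first pull downloads ~2 GB
    )
    resp.raise_for_status()
```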

If you also want Ollama to serve embeddings, pull the embedding model separately:

```bash
docker compose exec reclio-ollama ollama pull nomic-embed-text
```

Claude

```
LLM_PROVIDER=claude
ANTHROPIC_API_KEY=sk-ant-...
CLAUDE_MODEL=claude-haiku-4-5
```

If ANTHROPIC_API_KEY is blank, Reclio silently falls back to the NullProvider. Anthropic ships no embedding models — so when chat is on Claude, embeddings auto-fall-back to local sentence-transformers unless you set EMBEDDING_PROVIDER=openai.
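For reference, a single completion through the Anthropic SDK with the same blank-key fallback behaviour — a sketch, not Reclio's actual call site:

```python
import os
import anthropic

def claude_complete(prompt: str) -> str | None:
    """One chat completion; a missing key degrades silently, mirroring NullProvider."""
    key = os.getenv("ANTHROPIC_API_KEY", "").strip()
    if not key:
        return None  # silent fallback -- callers use their deterministic path
    client = anthropic.Anthropic(api_key=key)
    msg = client.messages.create(
        model=os.getenv("CLAUDE_MODEL", "claude-haiku-4-5"),
        max_tokens=256,
        messages=[{"role": "user", "content": prompt}],
    )
    return msg.content[0].text
```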

OpenAI

```
LLM_PROVIDER=openai
OPENAI_API_KEY=sk-...
OPENAI_MODEL=gpt-4o-mini
```

When chat is on OpenAI, embeddings automatically use OpenAI text-embedding-3-small (1536d) too — same key, same vendor, top-tier embedding quality. This is the highest-quality single-vendor configuration available.
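For reference, this is the embedding call that configuration maps to in the OpenAI Python SDK (a sketch; Reclio's internal wrapper may differ):

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def embed(texts: list[str]) -> list[list[float]]:
    """Return 1536-dimensional vectors from text-embedding-3-small."""
    resp = client.embeddings.create(model="text-embedding-3-small", input=texts)
    return [item.embedding for item in resp.data]
```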

OpenRouter ✨ (added in v1.6)

OpenRouter is a router — one API key gets you access to ~200 chat models from every major vendor (Anthropic, OpenAI, Meta Llama, Google Gemini, Mistral, Qwen, DeepSeek, etc.).

```
LLM_PROVIDER=openrouter
OPENROUTER_API_KEY=sk-or-v1-...
OPENROUTER_MODEL=anthropic/claude-3.5-haiku
```

Model names follow OpenRouter's vendor/model convention. Some notable picks:

| Model | When |
| --- | --- |
| anthropic/claude-3.5-haiku (default) | Cheap, fast, good baseline |
| anthropic/claude-3.5-sonnet | Highest quality, ~10× the cost |
| openai/gpt-4o-mini | OpenAI alternative without an OpenAI key |
| meta-llama/llama-3.3-70b-instruct | Open weights, often very fast on Groq backends |
| google/gemini-2.0-flash-exp:free | Free tier, good for testing |
| meta-llama/llama-3.2-3b-instruct:free | Smallest free option |
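OpenRouter speaks the OpenAI-compatible API at https://openrouter.ai/api/v1, so a quick way to sanity-check a key and model outside Reclio is the OpenAI SDK with a base_url override (whether Reclio itself uses this SDK is an assumption):

```python
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://openrouter.ai/api/v1",  # OpenRouter's OpenAI-compatible endpoint
    api_key=os.environ["OPENROUTER_API_KEY"],
)

reply = client.chat.completions.create(
    model=os.getenv("OPENROUTER_MODEL", "anthropic/claude-3.5-haiku"),
    messages=[{"role": "user", "content": "Pitch a sci-fi movie in one sentence."}],
)
print(reply.choices[0].message.content)
```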

OpenRouter does not proxy embeddings. When LLM_PROVIDER=openrouter, embeddings fall back to local sentence-transformers. To get OpenAI embeddings alongside OpenRouter chat, set EMBEDDING_PROVIDER=openai.

None

If you don't want any LLM calls at all:

```
LLM_PROVIDER=none
```

BYW row titles become straight f-strings ("Because You Watched Interstellar"). The Ask Reclio chat shows a "chat offline" state. Personality blurbs are skipped (the donut + bars still render). All LLM-dependent features degrade gracefully — /feeds still returns 10 valid rows.
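The BYW degradation amounts to this pattern (an illustrative helper, not Reclio's actual function):

```python
def byw_title(seed_title: str, llm_reply: str | None) -> str:
    """With LLM_PROVIDER=none there is no LLM reply, so the plain f-string wins."""
    return llm_reply or f"Because You Watched {seed_title}"
```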

Mix-and-match: chat ≠ embeddings

The two workers are configured independently:

```
# Chat on OpenRouter (model variety)
LLM_PROVIDER=openrouter
OPENROUTER_API_KEY=sk-or-v1-...
OPENROUTER_MODEL=anthropic/claude-3.5-sonnet

# Embeddings on OpenAI (top-tier 1536d quality)
EMBEDDING_PROVIDER=openai
OPENAI_API_KEY=sk-...
```

Allowed EMBEDDING_PROVIDER values: auto (follows LLM_PROVIDER, default) · openai · ollama · local (sentence-transformers) · none. See Embeddings for the full provider matrix and quality-vs-cost comparison.
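Pulling the fallback rules on this page together, auto resolution behaves roughly like the sketch below; the mapping is inferred from the documented behaviour, not lifted from Reclio's source:

```python
import os

def resolve_embedding_provider() -> str:
    """Resolve EMBEDDING_PROVIDER, following LLM_PROVIDER when set to auto."""
    choice = os.getenv("EMBEDDING_PROVIDER", "auto").lower()
    if choice != "auto":
        return choice  # explicit openai / ollama / local / none always wins
    chat = os.getenv("LLM_PROVIDER", "ollama").lower()
    return {
        "openai": "openai",      # same vendor, same key (documented above)
        "claude": "local",       # Anthropic ships no embedding models
        "openrouter": "local",   # OpenRouter doesn't proxy embeddings
    }.get(chat, "local")         # ollama/none handling is an assumption here
```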

Prompt-injection mitigation

User-controlled strings (Trakt usernames, movie titles, free-form chat input) are passed through sanitize_for_prompt() before being interpolated into the prompt (a sketch follows the list):

  • Control characters collapsed to spaces
  • Backslashes + quotes stripped
  • Length-capped per call (titles 120 chars, chat 400-500 chars)
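A minimal sketch implementing those three rules, assuming Reclio's backend is Python (the signature is illustrative):

```python
import re

def sanitize_for_prompt(text: str, max_len: int = 400) -> str:
    """Collapse control chars, strip backslashes/quotes, cap length per call site."""
    text = re.sub(r"[\x00-\x1f\x7f]+", " ", text)  # control characters -> one space
    text = text.replace("\\", "").replace('"', "").replace("'", "")
    return text[:max_len].strip()                  # titles pass 120, chat 400-500
```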

If the LLM returns anything suspicious (empty, overly long, malformed JSON for the structured calls), Reclio discards it and uses the fallback path. The classifier in particular runs strict JSON parsing plus field-level whitelisting before applying any preference mutations.
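A sketch of that last line of defence, assuming a hypothetical field whitelist (the real schema is Reclio-internal):

```python
import json

# Hypothetical whitelist -- Reclio's actual preference schema is internal.
ALLOWED_FIELDS = {"excluded_genres", "era_preference", "blocked_titles"}

def safe_apply(raw: str | None, prefs: dict) -> bool:
    """Strict-parse classifier output; apply only whitelisted fields."""
    try:
        update = json.loads(raw)
    except (TypeError, json.JSONDecodeError):
        return False  # malformed -> discard, fallback path takes over
    if not isinstance(update, dict):
        return False
    clean = {k: v for k, v in update.items() if k in ALLOWED_FIELDS}
    if not clean:
        return False  # nothing recognised -> no mutation
    prefs.update(clean)
    return True
```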