# ADR-0006: Anthropic Claude as the default LLM provider
- Status: Accepted
- Date: 2026-05-02
- Deciders: Architecture Team
- Tags: ai, llm, provider, anthropic, openai
## Context
Every LLM-integrated pattern in this catalog requires choosing a model provider. The choice affects: output quality on different task types, cost per token, latency, data retention and compliance posture, API ergonomics, and model availability under load. Changing providers after a product ships requires rewriting prompts (prompt sensitivity differs across models), re-evaluating outputs, and potentially re-embedding corpora.
The realistic candidates for a production system in 2026 are Anthropic (Claude family), OpenAI (GPT-4o family), Google (Gemini family), and AWS Bedrock (multi-vendor managed). Self-hosted open weights (Llama, Mistral) are covered by the Fine-tuned Domain Model pattern.
## Decision
We will use Anthropic Claude as the default provider, with the following model tier policy:
| Use case | Default model | Notes |
|---|---|---|
| Complex reasoning, agentic tasks | Claude Sonnet 4.6 | Best tool-use accuracy and extended thinking |
| Interactive generation (chatbot, copilot) | Claude Haiku 4.5 | Fastest and cheapest; sufficient for most chat |
| Document extraction (vision) | Claude Sonnet 4.6 | Multimodal accuracy justifies the cost |
| Batch / offline processing | Claude Haiku 4.5 via Batch API | 50% cost reduction vs synchronous API |
Teams may use OpenAI or Google models for specific features where evaluation demonstrates a material quality advantage on that task. AWS Bedrock is recommended as the access layer for multi-provider strategies or when cloud-consolidated billing is required.
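As a sketch of how this tier policy might be encoded in a shared config module (the model identifiers are assumptions; verify them against Anthropic's current model listing before use):

```python
# Hypothetical tier policy; model IDs are assumptions, not confirmed identifiers.
MODEL_TIERS: dict[str, str] = {
    "reasoning": "claude-sonnet-4-6",   # complex reasoning, agentic tasks
    "interactive": "claude-haiku-4-5",  # chatbots, copilots
    "extraction": "claude-sonnet-4-6",  # document extraction (vision)
    "batch": "claude-haiku-4-5",        # offline work via the Batch API
}

def model_for(use_case: str) -> str:
    """Resolve a use case to its default model per this ADR."""
    return MODEL_TIERS[use_case]
```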
## Alternatives considered
### Option A: Anthropic Claude (chosen)
- Pros: Strongest instruction-following on complex, multi-constraint prompts. Industry-leading tool-use accuracy (critical for agentic workflows). Extended thinking (Claude Sonnet 3.7 and later) for tasks requiring step-by-step reasoning. Prompt caching with `cache_control` blocks reduces cost by up to 90% and latency by ~30% on repeat-context patterns (chatbots, copilots). Business Associate Agreements (BAA) available for HIPAA. EU data residency via Bedrock. Context window up to 200k tokens. Native `instructor`-compatible structured output via tool use.
- Cons: No image generation capability (no equivalent to DALL-E). OpenAI's embedding models (`text-embedding-3-small`) are better established for RAG pipelines; we use them for embeddings regardless of this decision. Slightly higher cost than GPT-4o-mini for equivalent-quality tasks.
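A minimal sketch of the prompt-caching mechanism with the Anthropic Python SDK: the large, stable system prefix is marked with a `cache_control` block so repeat calls read it from cache. The model ID is an assumption; caching also applies only above a model-specific minimum prompt length.

```python
import anthropic

# Placeholder for a large, rarely changing prefix (policy docs, tool specs).
LONG_STABLE_SYSTEM_PROMPT = "..."

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

response = client.messages.create(
    model="claude-sonnet-4-6",  # assumed model ID; verify before use
    max_tokens=1024,
    system=[
        {
            "type": "text",
            "text": LONG_STABLE_SYSTEM_PROMPT,
            # Mark the prefix cacheable; subsequent calls that reuse the same
            # prefix read it from cache at a reduced input-token rate.
            "cache_control": {"type": "ephemeral"},
        }
    ],
    messages=[{"role": "user", "content": "Summarise the attached ticket."}],
)
print(response.content[0].text)
```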
### Option B: OpenAI (GPT-4o family)
- Pros: Largest installed base; most third-party tooling targets OpenAI first. Best-in-class embedding models (`text-embedding-3-small`, `text-embedding-3-large`). Strong multimodal (image + text) performance. JSON mode and structured output via `response_format`. DALL-E for image generation. The OpenAI API is the de facto standard interface that vLLM, LiteLLM, and local inference servers emulate.
- Cons: Data retention policy has historically been less transparent than Anthropic's; enterprise tier required for zero-retention guarantees. Prompt caching is automatic rather than explicitly controllable, with a smaller cached-token discount than Anthropic's `cache_control` scheme. GPT-4o instruction-following on complex, multi-constraint prompts is measurably weaker than Claude Sonnet on internal evaluations. Rate limits on the shared tier are less predictable.
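Since OpenAI embeddings stay in the stack regardless of this decision, a minimal sketch of both touchpoints with the OpenAI Python SDK: an embeddings call for RAG and a chat call using `response_format` JSON mode (the chat model choice here is illustrative).

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Embeddings for RAG: the piece we keep regardless of the chat-model choice.
emb = client.embeddings.create(
    model="text-embedding-3-small",
    input=["chunk of corpus text to index"],
)
vector = emb.data[0].embedding  # 1536 dimensions by default

# JSON mode via response_format; note the prompt itself must ask for JSON.
chat = client.chat.completions.create(
    model="gpt-4o-mini",
    response_format={"type": "json_object"},
    messages=[{"role": "user", "content": 'Reply with JSON: {"status": "ok"}'}],
)
print(chat.choices[0].message.content)
```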
### Option C: Google Gemini (1.5 / 2.0 family)
- Pros: Largest context window (1M tokens for Gemini 1.5 Pro) is useful for whole-codebase or whole-document tasks. Native Google Workspace integration. Competitive pricing. Gemini 2.0 Flash is the fastest model in its class.
- Cons: Gemini API stability has been less consistent than Anthropic or OpenAI in our experience. Tool use accuracy lags Claude Sonnet on complex agentic benchmarks. Weaker European data residency story compared to Anthropic/Bedrock. Less mature third-party ecosystem.
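A minimal sketch of the whole-document use case with the `google-generativeai` Python SDK, assuming the file fits within the 1M-token window (the file name is a placeholder):

```python
import os
import google.generativeai as genai

genai.configure(api_key=os.environ["GOOGLE_API_KEY"])
model = genai.GenerativeModel("gemini-1.5-pro")  # 1M-token context window

# Whole-document prompting: pass the entire artifact in one call instead of
# chunking and retrieving, which the large context window makes feasible.
with open("design_docs_combined.txt") as f:
    document = f.read()

response = model.generate_content(
    ["List every unresolved open question in these documents:", document]
)
print(response.text)
```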
### Option D: AWS Bedrock (multi-vendor)
- Pros: Single AWS billing and IAM integration across Claude, Titan, Mistral, and Llama models. No direct internet call to provider APIs — traffic stays within AWS VPC. Useful for organisations with AWS-consolidated compliance requirements. Guardrails feature (content filtering) at the infrastructure layer.
- Cons: Adds latency vs direct API calls (~20–50 ms). Model availability on Bedrock lags direct provider availability (new Claude models appear on Bedrock weeks after the direct API). Harder to test locally — requires AWS credentials even for development.
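A minimal sketch of the Bedrock access path using `boto3` and the provider-agnostic Converse API; authentication is plain AWS IAM with no Anthropic key, and the model ID shown is an assumption to check against Bedrock's model listing.

```python
import boto3

# IAM-authenticated client; traffic stays on AWS, no direct Anthropic API call.
bedrock = boto3.client("bedrock-runtime", region_name="eu-central-1")

response = bedrock.converse(
    # Assumed Bedrock model ID; verify with `aws bedrock list-foundation-models`.
    modelId="anthropic.claude-haiku-4-5-v1:0",
    messages=[{"role": "user", "content": [{"text": "Classify this ticket: ..."}]}],
    inferenceConfig={"maxTokens": 512},
)
print(response["output"]["message"]["content"][0]["text"])
```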
## Consequences
**Positive**
- Single provider for most patterns reduces the cognitive overhead of maintaining multiple API keys, rate limit quotas, and billing relationships
- Prompt caching materially reduces cost and latency for chatbot and copilot patterns — not available from all providers
- BAA availability makes Anthropic viable for HIPAA use cases without switching providers per use case
- Extended thinking enables complex reasoning in agentic workflows without prompt-engineering workarounds
**Negative / trade-offs**
- OpenAI embedding models (`text-embedding-3-small`) remain the default for RAG pipelines, so two provider relationships are needed whenever RAG is involved
- Prompt sensitivity differs between Claude and GPT: prompts written for one model require testing when migrated to the other; never assume portability
- Vendor dependency: if Anthropic degrades service or changes pricing significantly, migrating all prompts is a multi-week engineering effort
**Follow-up actions**
- Abstract the LLM client behind a thin interface (`LLMClient` with `complete()` and `stream()`) so provider swaps require a one-file change; a sketch follows this list
- Establish a Bedrock account as a fallback access path for Claude models (maintains provider choice while gaining AWS billing integration)
- Re-evaluate this decision annually: the LLM provider landscape changes faster than any other technology in this catalog
- Document the BAA process with Anthropic for any new project handling HIPAA-regulated data
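A sketch of the thin interface named in the first follow-up action, assuming a `Protocol`-based design: adapters wrap each vendor SDK, and the rest of the codebase depends only on `LLMClient`. The default model ID in the adapter is an assumption.

```python
from typing import Iterator, Protocol


class LLMClient(Protocol):
    """Thin provider-agnostic interface; swap providers by swapping the adapter."""

    def complete(self, prompt: str, *, max_tokens: int = 1024) -> str:
        """Return the full completion for a prompt."""
        ...

    def stream(self, prompt: str, *, max_tokens: int = 1024) -> Iterator[str]:
        """Yield completion text chunks as they arrive."""
        ...


class AnthropicClient:
    """Default adapter; the vendor SDK import stays inside the adapter."""

    def __init__(self, model: str = "claude-haiku-4-5") -> None:  # assumed ID
        import anthropic

        self._client = anthropic.Anthropic()
        self._model = model

    def complete(self, prompt: str, *, max_tokens: int = 1024) -> str:
        response = self._client.messages.create(
            model=self._model,
            max_tokens=max_tokens,
            messages=[{"role": "user", "content": prompt}],
        )
        return response.content[0].text

    def stream(self, prompt: str, *, max_tokens: int = 1024) -> Iterator[str]:
        with self._client.messages.stream(
            model=self._model,
            max_tokens=max_tokens,
            messages=[{"role": "user", "content": prompt}],
        ) as stream:
            yield from stream.text_stream
```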