AI / LLM-Integrated Systems¶
Systems whose primary value comes from a Large Language Model or other foundation model in the loop. The model is treated as a software component with its own engineering discipline.
When to look here¶
- The product feature would be impossible (or absurdly expensive) without an LLM
- Natural language is the primary input, output, or both
- The system needs to reason over unstructured documents
Patterns in this category¶
| Pattern | Status | Best for |
|---|---|---|
| RAG (Retrieval-Augmented Generation) | Adopt | Q&A over a private corpus, knowledge assistants |
| Conversational Assistant / Chatbot | Adopt | Customer support, internal help desk |
| Agentic Workflow | Trial | Multi-step tasks where the model uses tools |
| Document Processing / Extraction | Adopt | Invoices, contracts, forms; replaces traditional OCR + rules |
| Copilot / In-app Assistant | Trial | Embedded assistance within an existing product |
| Fine-tuned Domain Model | Assess | High-volume narrow tasks where prompt engineering plateaus |
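As a concrete anchor for the RAG row above, here is a minimal sketch of the pattern using the default stack below: Postgres + pgvector for retrieval, the OpenAI SDK for embedding and generation. The `chunks` table, connection string, and model names are illustrative assumptions, not a prescribed implementation.

```python
# Minimal RAG sketch: embed the question, retrieve nearest chunks from
# pgvector, and ground the answer in what was retrieved.
# Assumes a pre-populated `chunks(content text, embedding vector)` table.
import psycopg
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def answer(question: str, k: int = 5) -> str:
    # 1. Embed the question with the same model used to embed the corpus.
    emb = client.embeddings.create(
        model="text-embedding-3-small", input=question
    ).data[0].embedding

    # 2. Retrieve the k nearest chunks by cosine distance (pgvector's <=>).
    #    (pgvector's Python adapter can register a native type instead of
    #    the string cast used here.)
    with psycopg.connect("postgresql://localhost/knowledge") as conn:
        rows = conn.execute(
            "SELECT content FROM chunks ORDER BY embedding <=> %s::vector LIMIT %s",
            (str(emb), k),
        ).fetchall()
    context = "\n\n".join(r[0] for r in rows)

    # 3. Generate an answer constrained to the retrieved context.
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": "Answer only from the provided context. Say 'I don't know' if the context is insufficient."},
            {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {question}"},
        ],
    )
    return resp.choices[0].message.content
```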
Default tech stack¶
- Model providers: Anthropic (Claude), OpenAI, Google (Gemini), AWS Bedrock (multi-vendor)
- Orchestration: LangGraph, Vercel AI SDK, or hand-rolled (often the right answer per Armin Ronacher)
- Vector store: Postgres + pgvector (default), Pinecone, Weaviate, Qdrant
- Evaluation: Braintrust, LangSmith, custom eval harness with pytest (sketched after this list)
- Observability: Langfuse, Helicone, Datadog LLM Observability
- Guardrails: Provider-side moderation + custom output validation (see the validation sketch below); assume hallucination will occur
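The "custom eval harness with pytest" line is lighter-weight than it sounds: a parametrized test file over golden questions is enough to start. A minimal sketch; `answer()` is a hypothetical import wrapping the RAG sketch above, and the cases and keyword-based grading are illustrative.

```python
# test_evals.py -- run with `pytest`; each golden case becomes one test.
import pytest

from rag import answer  # hypothetical module wrapping the RAG sketch above

# Golden set: (question, keywords the answer must contain). Start small;
# grow it every time the model gets something wrong in production.
GOLDEN_CASES = [
    ("What is our refund window?", ["30 days"]),
    ("Which plan includes SSO?", ["Enterprise"]),
]

@pytest.mark.parametrize("question,expected_keywords", GOLDEN_CASES)
def test_answer_contains_expected_facts(question, expected_keywords):
    result = answer(question)
    for kw in expected_keywords:
        assert kw.lower() in result.lower(), f"missing {kw!r} in: {result}"
```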
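For the custom half of the guardrails line, schema validation on every model response catches a large class of failures cheaply. A sketch using Pydantic; the `Invoice` schema is a hypothetical stand-in for whatever your extraction step returns.

```python
# Validate structured model output before it touches downstream systems.
from pydantic import BaseModel, Field, ValidationError

class Invoice(BaseModel):
    # Hypothetical extraction schema; constraints reject obvious hallucinations.
    vendor: str
    total_cents: int = Field(ge=0)
    currency: str = Field(pattern=r"^[A-Z]{3}$")

def parse_invoice(raw_json: str) -> Invoice | None:
    try:
        return Invoice.model_validate_json(raw_json)
    except ValidationError:
        # Treat invalid output as a model failure, not a crash:
        # log it, count it, and let the caller retry or fall back.
        return None
```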
LLM-development fit¶
A strange case: LLMs are good at writing code that calls other LLMs, but the resulting system inherits all of the underlying model's unpredictability. Build evaluation harnesses before scaling. Design every workflow assuming the model will return wrong output some percentage of the time (see the retry sketch below).
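One way to encode that assumption in the architecture is to wrap every model call in a bounded retry with an explicit fallback, so a bad generation degrades the feature instead of breaking it. A sketch under those assumptions; `Invoice` and `parse_invoice` come from the hypothetical guardrails sketch above, and `call_model` is a hypothetical LLM call returning a raw JSON string.

```python
# Bounded retry: a bad generation degrades the feature, never breaks it.
from invoices import Invoice, call_model, parse_invoice  # hypothetical module

def extract_invoice(document: str, max_attempts: int = 3) -> Invoice | None:
    for _ in range(max_attempts):
        raw = call_model(document)    # raw JSON string from the LLM
        invoice = parse_invoice(raw)  # schema gate rejects bad output
        if invoice is not None:
            return invoice
    # Every attempt failed validation: fall back deterministically
    # (e.g. queue for human review) instead of shipping a bad record.
    return None
```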