Pattern: CQRS with CDC-Driven Read Models¶

Quick facts

Category: Backend & Distributed Systems
Maturity: Trial
Typical team size: 3-5 engineers
Typical timeline to MVP: 6-10 weeks (first projection); 1-3 weeks per additional projection
Last reviewed: 2026-05-19 by Architecture Team

1. Context¶

Use this pattern when:

Read and write access patterns are fundamentally different — writes go to a normalised, relational system of record; reads require denormalised, aggregated, or cross-source views that are too expensive to compute at query time
The read path must not compete with the write path for database resources — a slow reporting query must never delay a payment posting
Near-real-time read views are required from a database that does not emit application-level domain events (legacy core banking systems, packaged software, third-party systems)
Multiple upstream source systems contribute to a single unified read view (Customer 360, consolidated position, regulatory report)

Do NOT use this pattern when:

Simple read replicas are sufficient — if the read shape matches the write shape and eventual consistency is the only requirement, a PostgreSQL read replica is far simpler and has no operational overhead
The write volume is low and a database view or materialised view covers the read requirement — CQRS+CDC adds infrastructure complexity that only pays off under real read/write contention or format divergence
Strong consistency is required between write and read — CDC introduces replication lag (typically milliseconds to seconds); if the UI must reflect a write the instant it is made, this pattern requires extra care (see Section 10)

How this pattern relates to others in this category

Question	Pattern to reach for
Application-level domain events drive downstream consumers?	Event-Driven Architecture — use EDA when you own the producer and can emit clean domain events; use CQRS+CDC when the source is a database you cannot change (legacy system, packaged software)
The event log is the system of record (no separate write DB)?	Event Sourcing — CQRS+CDC keeps a traditional database as the write side; Event Sourcing replaces the write DB with an append-only event log
Read views are rebuilt from a replayable event stream you already have?	CQRS+CDC is a superset — CDC produces a replayable changelog; if you already have Kafka with domain events, project from those instead of adding a CDC pipeline
Read/write split for a single service with no legacy constraint?	A PostgreSQL read replica or materialised view is simpler and should be tried first

2. Problem it solves¶

A core banking system records transactions in a heavily normalised relational schema optimised for double-entry bookkeeping correctness. Displaying a customer's transaction history with merchant name, category, running balance, and search requires joining seven tables, which takes 400 ms at scale and competes with posting traffic. The bank cannot change the core banking schema — it is a vendor product. CQRS with CDC solves this by treating the core system's transaction log as an event stream: Debezium captures every row change from the database write-ahead log without touching application code, a projection service consumes those changes and maintains a purpose-built read store (Elasticsearch, DynamoDB) in near-real time, and the mobile app reads exclusively from the projection — never from the core.

3. Solution overview¶

System context (C4 Level 1)¶

flowchart LR
    App([Client App]) -->|commands: post, transfer| WriteDB[(System of Record\nCore DB — normalised)]
    WriteDB -->|WAL / binlog via CDC| Kafka[Kafka\nCDC event stream]
    Kafka --> Projection[Projection Service\nmaintains read model]
    Projection --> ReadStore[(Read Store\ndenormalised / search-optimised)]
    App -->|queries: history, balance, search| ReadStore

Container view (C4 Level 2)¶

flowchart TB
    subgraph Write["Write Side"]
        CoreDB[(Core DB\nPostgreSQL / Oracle)]
        Debezium[Debezium Connector\nWAL capture]
    end

    subgraph Bus["Event Bus"]
        Kafka[Kafka\nCDC topic per table]
        SchemaReg[Schema Registry\nAvro / Protobuf]
    end

    subgraph Projections["Projection Services"]
        TxnProj[Transaction History\nProjection]
        Bal360Proj[Customer 360\nProjection]
        PosProj[Position\nProjection]
    end

    subgraph ReadStores["Read Stores"]
        ES[Elasticsearch\ntransaction search]
        Dynamo[DynamoDB\nkey-value balance lookup]
        Redis[Redis\nhot position cache]
    end

    subgraph Ops
        DLQ[Dead-letter topic\nfailed projection events]
        Monitor[Projection lag monitor\nalert if lag > threshold]
        Snapshot[Snapshot store\nperiodic full projection dump]
    end

    CoreDB -->|WAL| Debezium
    Debezium -->|row-level change events| Kafka
    Kafka --> SchemaReg
    Kafka --> TxnProj
    Kafka --> Bal360Proj
    Kafka --> PosProj
    TxnProj --> ES
    Bal360Proj --> Dynamo
    PosProj --> Redis
    TxnProj -->|on failure| DLQ
    Bal360Proj -->|on failure| DLQ
    PosProj -->|on failure| DLQ
    DLQ --> Monitor
    Kafka --> Monitor
    TxnProj --> Snapshot

4. Technology stack¶

Layer	Primary choice	Alternatives	Notes
CDC connector	Debezium	AWS DMS (for RDS sources), Maxwell (MySQL only), Oracle GoldenGate	Debezium is the open-source standard; supports PostgreSQL WAL, MySQL binlog, Oracle LogMiner, SQL Server CDC; runs as a Kafka Connect connector — deploy on the Kafka Connect cluster you already have
Source databases	PostgreSQL (WAL level = logical)	MySQL, Oracle, SQL Server, MongoDB	PostgreSQL requires `wal_level = logical` — verify this is permitted in your managed DB offering (RDS, Cloud SQL, Azure SQL all support it)
CDC event bus	Apache Kafka	AWS MSK (managed Kafka), Confluent Cloud	Kafka is the natural sink for Debezium; topic-per-table is the default; use `tombstone` records for deletes
Schema format	Avro + Schema Registry	Protobuf, JSON	Debezium natively integrates with Confluent Schema Registry; Avro captures the full before/after row state with schema evolution
Projection runtime	Go (custom consumer)	Kafka Streams, Apache Flink, Spring Cloud Stream	Go for simple projections; Kafka Streams for stateful aggregations (running balance, position); Flink for complex event processing across multiple CDC streams
Read stores	Elasticsearch (search/history) + DynamoDB (key-value lookup)	Redis (hot cache), PostgreSQL (secondary relational), OpenSearch	Choose per projection requirement — Elasticsearch for full-text and faceted search; DynamoDB for sub-10ms key-value; Redis for real-time hot data; never use a single read store for all projection types
Snapshot / bootstrap	Debezium initial snapshot	Custom bulk-load script	Debezium performs an initial snapshot of the source table before streaming live changes; configure `snapshot.mode` and ensure the snapshot does not impact production read traffic
Observability	Consumer lag (Prometheus + Kafka exporter) + OpenTelemetry	Datadog Kafka integration	Projection lag is the primary health signal — alert immediately if any projection falls more than N seconds behind the write side

5. Non-functional characteristics¶

Concern	Profile
Consistency model	Eventually consistent by design. CDC lag is typically milliseconds to low seconds under normal load; spikes during batch writes or rebalances can extend this. Design the UI to handle stale reads gracefully: show a "as of [timestamp]" indicator, or use optimistic UI updates for the writing client (see Section 10).
Scalability	Write side scales independently of the read side. Each projection service scales by adding Kafka consumer instances (bounded by partition count). Multiple independent projections can consume the same CDC topic without affecting each other or the write side.
Availability target	Write side availability is unchanged — CDC capture is asynchronous and does not add to the write path. A Debezium connector failure pauses projections but does not affect writes; projections catch up automatically when the connector recovers. Read stores target 99.9%+ independently of the write side.
Latency target	Write path: unchanged from pre-CQRS baseline. Read path: sub-10ms for DynamoDB key-value; sub-100ms for Elasticsearch queries. Projection propagation lag (write to read store): p95 < 2s under normal load.
Security posture	The CDC stream carries raw database rows including PII. Apply field-level masking in the Debezium connector or in the projection service before the data reaches the read store. The Kafka CDC topic must have strict ACLs — it is more sensitive than application-level events because it contains unfiltered DB state.
Compliance fit	GDPR — the read store is a derived copy; erasure requests must propagate to all projection read stores as well as the write side. A `customer.deleted` CDC event (or a dedicated erasure pipeline) must trigger projection cleanup. SOC 2 — the CDC stream provides a timestamped audit trail of every row change. PCI-DSS — card data appearing in CDC events must be tokenised before the event is written to Kafka; the projection service must never store raw PANs.

6. Cost ballpark¶

Indicative monthly USD cost. Debezium runs on Kafka Connect (shared with EDA infrastructure where applicable).

Scale	CDC events / day	Monthly cost	Cost drivers
Small	< 1M	$400 - $1,200	Kafka Connect cluster (Debezium), 3-node Kafka, 1-2 projection services, Elasticsearch single-node
Medium	1M - 50M	$1,500 - $7,000	Larger Kafka cluster, Elasticsearch cluster, DynamoDB on-demand, projection service fleet, lag monitoring
Large	50M+	$8,000 - $35,000	Multi-region Kafka with MirrorMaker, Elasticsearch multi-shard cluster, DynamoDB provisioned capacity, Flink for complex projections

7. LLM-assisted development fit¶

Aspect	Rating	Notes
Debezium connector configuration	★★★★	Good — connector JSON config for PostgreSQL and MySQL is well-represented; verify `snapshot.mode`, `publication.name`, and slot management settings manually
Kafka consumer / projection service boilerplate	★★★★★	Excellent — consuming Kafka and upserting into Elasticsearch or DynamoDB is well-handled
Projection rebuild logic (replay from snapshot)	★★★	Gets the concept right; snapshot + replay ordering edge cases require human design and testing
Schema evolution handling (before/after Avro envelope)	★★★	Understands the Debezium envelope format; field rename and table schema change scenarios need manual verification
Architecture decisions	★	Don't outsource. Use ADRs. The choice of read store per projection type and the consistency model have long-term implications.

Recommended workflow: Start with a single table and a single projection before adding more sources. Validate the initial snapshot is correct before enabling live streaming. Test the projection rebuild path (drop and replay) before going to production — it is the path you will rely on after any read store incident.

8. Reference implementations¶

Public reference: microservices.io/patterns/data/cqrs — Chris Richardson's CQRS pattern documentation covering command/query separation, eventual consistency trade-offs, and implementation options including CDC (200 OK ✓)
Public reference: github.com/debezium/debezium — the Debezium CDC framework; debezium-connector-postgres/ and debezium-connector-mysql/ show the WAL/binlog capture implementation; examples directory covers common configuration patterns (200 OK ✓)
Public reference: learn.microsoft.com — CQRS pattern — Microsoft Azure Architecture Center reference covering the CQRS pattern with consistency trade-offs and implementation guidance (200 OK ✓)
Public reference: martinfowler.com/bliki/CQRS.html — Martin Fowler's foundational CQRS definition; seminal reference despite the 2011 date — the core separation principle has not changed (200 OK ✓)
Public reference: github.com/oskardudycz/EventSourcing.NetCore — .NET reference implementation covering CQRS, projections, and read model patterns with practical examples (200 OK ✓)
Internal case studies: Digital banking — transaction history view and Customer 360 (see below)

Internal case study — Transaction history view: core banking to Elasticsearch projection¶

A retail mobile banking app displays the last 90 days of transactions with merchant name, category icon, running balance, and full-text search. The core banking system stores transactions in a normalised schema across six tables; assembling the view required a 7-table join that took 380 ms at p95 and caused read IOPS spikes that correlated with posting delays during salary credit days.

What changed

Debezium was deployed against the core banking PostgreSQL instance (wal_level = logical, replication slot per connector). A Kafka topic receives a change event for every row insert or update in the transactions, postings, and merchant tables. A projection service consumes these events and upserts a denormalised transaction document into Elasticsearch — one document per transaction, pre-joined with merchant name, category, and running balance.

flowchart LR
    CoreDB[(Core Banking\nPostgreSQL)] -->|WAL — logical replication| Debezium[Debezium\nConnector]
    Debezium -->|txn.posted\nmerchant.updated| Kafka[Kafka]
    Kafka --> TxnProj[Transaction\nProjection Service]
    TxnProj -->|upsert document| ES[(Elasticsearch\nper-customer index)]
    MobileApp([Mobile App]) -->|GET /transactions?q=...| ES
    MobileApp -->|POST /transfer| CoreDB

Outcomes

Metric	Before	After
Transaction history query p95	380 ms (7-table join)	18 ms (Elasticsearch document fetch)
Core DB read IOPS during salary batch	+340% spike	No impact (reads served from ES)
Search capability	None (exact match only)	Full-text, date range, amount range, category filter
Projection propagation lag (p95)	N/A	1.1 s end-to-end from core posting to ES

Gotchas observed

Replication slot grew unbounded during a Debezium outage — when the connector was down for 4 hours (Kafka Connect restart), the PostgreSQL replication slot retained all WAL segments since the last confirmed LSN. The WAL directory grew to 40 GB before the connector recovered. Mitigation: set max_slot_wal_keep_size in PostgreSQL and alert when replication slot lag exceeds a threshold; have a runbook for slot drop and projection rebuild.
Merchant table updates triggered a full re-projection — a bulk merchant category update (50,000 rows) flooded the projection service with update events; the lag spiked to 45 s. Mitigation: rate-limit non-critical update sources in the projection service; merchant enrichment data should be applied at query time via a lookup, not embedded in every transaction document.
Initial snapshot locked a table briefly — Debezium's default snapshot mode (initial) takes a brief shared lock on each table; during the transactions table snapshot this caused a 2-second posting delay. Resolved by using snapshot.mode = initial_only on a replica rather than the primary, and switching to never mode after the first successful snapshot.

Internal case study — Customer 360: multi-source CDC to unified read store¶

A bank's customer data lived in four systems: core banking (account data), CRM (contact preferences, relationship manager), KYC platform (identity documents, risk rating), and the fraud system (alert flags). No single system had a complete picture. Customer-facing staff used four separate screens; any update in one system was invisible to the others for up to 24 hours (batch sync jobs).

Each source system had a CDC connector writing to a dedicated Kafka topic. A Customer 360 projection service consumed all four topics, merged the data on customerId, and maintained a unified customer document in DynamoDB. Staff-facing tools read exclusively from the DynamoDB projection.