Pattern: CQRS with CDC-Driven Read Models¶
Quick facts
- Category: Backend & Distributed Systems
- Maturity: Trial
- Typical team size: 3-5 engineers
- Typical timeline to MVP: 6-10 weeks (first projection); 1-3 weeks per additional projection
- Last reviewed: 2026-05-19 by Architecture Team
1. Context¶
Use this pattern when:
- Read and write access patterns are fundamentally different — writes go to a normalised, relational system of record; reads require denormalised, aggregated, or cross-source views that are too expensive to compute at query time
- The read path must not compete with the write path for database resources — a slow reporting query must never delay a payment posting
- Near-real-time read views are required from a database that does not emit application-level domain events (legacy core banking systems, packaged software, third-party systems)
- Multiple upstream source systems contribute to a single unified read view (Customer 360, consolidated position, regulatory report)
Do NOT use this pattern when:
- Simple read replicas are sufficient — if the read shape matches the write shape and eventual consistency is the only requirement, a PostgreSQL read replica is far simpler and has no operational overhead
- The write volume is low and a database view or materialised view covers the read requirement — CQRS+CDC adds infrastructure complexity that only pays off under real read/write contention or format divergence
- Strong consistency is required between write and read — CDC introduces replication lag (typically milliseconds to seconds); if the UI must reflect a write the instant it is made, this pattern requires extra care (see Section 10)
How this pattern relates to others in this category
| Question | Pattern to reach for |
|---|---|
| Application-level domain events drive downstream consumers? | Event-Driven Architecture — use EDA when you own the producer and can emit clean domain events; use CQRS+CDC when the source is a database you cannot change (legacy system, packaged software) |
| The event log is the system of record (no separate write DB)? | Event Sourcing — CQRS+CDC keeps a traditional database as the write side; Event Sourcing replaces the write DB with an append-only event log |
| Read views are rebuilt from a replayable event stream you already have? | CQRS+CDC is a superset — CDC produces a replayable changelog; if you already have Kafka with domain events, project from those instead of adding a CDC pipeline |
| Read/write split for a single service with no legacy constraint? | A PostgreSQL read replica or materialised view is simpler and should be tried first |
2. Problem it solves¶
A core banking system records transactions in a heavily normalised relational schema optimised for double-entry bookkeeping correctness. Displaying a customer's transaction history with merchant name, category, running balance, and search requires joining seven tables, which takes 400 ms at scale and competes with posting traffic. The bank cannot change the core banking schema — it is a vendor product. CQRS with CDC solves this by treating the core system's transaction log as an event stream: Debezium captures every row change from the database write-ahead log without touching application code, a projection service consumes those changes and maintains a purpose-built read store (Elasticsearch, DynamoDB) in near-real time, and the mobile app reads exclusively from the projection — never from the core.
3. Solution overview¶
System context (C4 Level 1)¶
flowchart LR
App([Client App]) -->|commands: post, transfer| WriteDB[(System of Record\nCore DB — normalised)]
WriteDB -->|WAL / binlog via CDC| Kafka[Kafka\nCDC event stream]
Kafka --> Projection[Projection Service\nmaintains read model]
Projection --> ReadStore[(Read Store\ndenormalised / search-optimised)]
App -->|queries: history, balance, search| ReadStore
Container view (C4 Level 2)¶
flowchart TB
subgraph Write["Write Side"]
CoreDB[(Core DB\nPostgreSQL / Oracle)]
Debezium[Debezium Connector\nWAL capture]
end
subgraph Bus["Event Bus"]
Kafka[Kafka\nCDC topic per table]
SchemaReg[Schema Registry\nAvro / Protobuf]
end
subgraph Projections["Projection Services"]
TxnProj[Transaction History\nProjection]
Bal360Proj[Customer 360\nProjection]
PosProj[Position\nProjection]
end
subgraph ReadStores["Read Stores"]
ES[Elasticsearch\ntransaction search]
Dynamo[DynamoDB\nkey-value balance lookup]
Redis[Redis\nhot position cache]
end
subgraph Ops
DLQ[Dead-letter topic\nfailed projection events]
Monitor[Projection lag monitor\nalert if lag > threshold]
Snapshot[Snapshot store\nperiodic full projection dump]
end
CoreDB -->|WAL| Debezium
Debezium -->|row-level change events| Kafka
Kafka --> SchemaReg
Kafka --> TxnProj
Kafka --> Bal360Proj
Kafka --> PosProj
TxnProj --> ES
Bal360Proj --> Dynamo
PosProj --> Redis
TxnProj -->|on failure| DLQ
Bal360Proj -->|on failure| DLQ
PosProj -->|on failure| DLQ
DLQ --> Monitor
Kafka --> Monitor
TxnProj --> Snapshot
4. Technology stack¶
| Layer | Primary choice | Alternatives | Notes |
|---|---|---|---|
| CDC connector | Debezium | AWS DMS (for RDS sources), Maxwell (MySQL only), Oracle GoldenGate | Debezium is the open-source standard; supports PostgreSQL WAL, MySQL binlog, Oracle LogMiner, SQL Server CDC; runs as a Kafka Connect connector — deploy on the Kafka Connect cluster you already have |
| Source databases | PostgreSQL (WAL level = logical) | MySQL, Oracle, SQL Server, MongoDB | PostgreSQL requires wal_level = logical — verify this is permitted in your managed DB offering (RDS, Cloud SQL, Azure SQL all support it) |
| CDC event bus | Apache Kafka | AWS MSK (managed Kafka), Confluent Cloud | Kafka is the natural sink for Debezium; topic-per-table is the default; use tombstone records for deletes |
| Schema format | Avro + Schema Registry | Protobuf, JSON | Debezium natively integrates with Confluent Schema Registry; Avro captures the full before/after row state with schema evolution |
| Projection runtime | Go (custom consumer) | Kafka Streams, Apache Flink, Spring Cloud Stream | Go for simple projections; Kafka Streams for stateful aggregations (running balance, position); Flink for complex event processing across multiple CDC streams |
| Read stores | Elasticsearch (search/history) + DynamoDB (key-value lookup) | Redis (hot cache), PostgreSQL (secondary relational), OpenSearch | Choose per projection requirement — Elasticsearch for full-text and faceted search; DynamoDB for sub-10ms key-value; Redis for real-time hot data; never use a single read store for all projection types |
| Snapshot / bootstrap | Debezium initial snapshot | Custom bulk-load script | Debezium performs an initial snapshot of the source table before streaming live changes; configure snapshot.mode and ensure the snapshot does not impact production read traffic |
| Observability | Consumer lag (Prometheus + Kafka exporter) + OpenTelemetry | Datadog Kafka integration | Projection lag is the primary health signal — alert immediately if any projection falls more than N seconds behind the write side |
5. Non-functional characteristics¶
| Concern | Profile |
|---|---|
| Consistency model | Eventually consistent by design. CDC lag is typically milliseconds to low seconds under normal load; spikes during batch writes or rebalances can extend this. Design the UI to handle stale reads gracefully: show a "as of [timestamp]" indicator, or use optimistic UI updates for the writing client (see Section 10). |
| Scalability | Write side scales independently of the read side. Each projection service scales by adding Kafka consumer instances (bounded by partition count). Multiple independent projections can consume the same CDC topic without affecting each other or the write side. |
| Availability target | Write side availability is unchanged — CDC capture is asynchronous and does not add to the write path. A Debezium connector failure pauses projections but does not affect writes; projections catch up automatically when the connector recovers. Read stores target 99.9%+ independently of the write side. |
| Latency target | Write path: unchanged from pre-CQRS baseline. Read path: sub-10ms for DynamoDB key-value; sub-100ms for Elasticsearch queries. Projection propagation lag (write to read store): p95 < 2s under normal load. |
| Security posture | The CDC stream carries raw database rows including PII. Apply field-level masking in the Debezium connector or in the projection service before the data reaches the read store. The Kafka CDC topic must have strict ACLs — it is more sensitive than application-level events because it contains unfiltered DB state. |
| Compliance fit | GDPR — the read store is a derived copy; erasure requests must propagate to all projection read stores as well as the write side. A customer.deleted CDC event (or a dedicated erasure pipeline) must trigger projection cleanup. SOC 2 — the CDC stream provides a timestamped audit trail of every row change. PCI-DSS — card data appearing in CDC events must be tokenised before the event is written to Kafka; the projection service must never store raw PANs. |
6. Cost ballpark¶
Indicative monthly USD cost. Debezium runs on Kafka Connect (shared with EDA infrastructure where applicable).
| Scale | CDC events / day | Monthly cost | Cost drivers |
|---|---|---|---|
| Small | < 1M | $400 - $1,200 | Kafka Connect cluster (Debezium), 3-node Kafka, 1-2 projection services, Elasticsearch single-node |
| Medium | 1M - 50M | $1,500 - $7,000 | Larger Kafka cluster, Elasticsearch cluster, DynamoDB on-demand, projection service fleet, lag monitoring |
| Large | 50M+ | $8,000 - $35,000 | Multi-region Kafka with MirrorMaker, Elasticsearch multi-shard cluster, DynamoDB provisioned capacity, Flink for complex projections |
7. LLM-assisted development fit¶
| Aspect | Rating | Notes |
|---|---|---|
| Debezium connector configuration | ★★★★ | Good — connector JSON config for PostgreSQL and MySQL is well-represented; verify snapshot.mode, publication.name, and slot management settings manually |
| Kafka consumer / projection service boilerplate | ★★★★★ | Excellent — consuming Kafka and upserting into Elasticsearch or DynamoDB is well-handled |
| Projection rebuild logic (replay from snapshot) | ★★★ | Gets the concept right; snapshot + replay ordering edge cases require human design and testing |
| Schema evolution handling (before/after Avro envelope) | ★★★ | Understands the Debezium envelope format; field rename and table schema change scenarios need manual verification |
| Architecture decisions | ★ | Don't outsource. Use ADRs. The choice of read store per projection type and the consistency model have long-term implications. |
Recommended workflow: Start with a single table and a single projection before adding more sources. Validate the initial snapshot is correct before enabling live streaming. Test the projection rebuild path (drop and replay) before going to production — it is the path you will rely on after any read store incident.
8. Reference implementations¶
- Public reference: microservices.io/patterns/data/cqrs — Chris Richardson's CQRS pattern documentation covering command/query separation, eventual consistency trade-offs, and implementation options including CDC (200 OK ✓)
- Public reference: github.com/debezium/debezium — the Debezium CDC framework;
debezium-connector-postgres/anddebezium-connector-mysql/show the WAL/binlog capture implementation; examples directory covers common configuration patterns (200 OK ✓) - Public reference: learn.microsoft.com — CQRS pattern — Microsoft Azure Architecture Center reference covering the CQRS pattern with consistency trade-offs and implementation guidance (200 OK ✓)
- Public reference: martinfowler.com/bliki/CQRS.html — Martin Fowler's foundational CQRS definition; seminal reference despite the 2011 date — the core separation principle has not changed (200 OK ✓)
- Public reference: github.com/oskardudycz/EventSourcing.NetCore — .NET reference implementation covering CQRS, projections, and read model patterns with practical examples (200 OK ✓)
- Internal case studies: Digital banking — transaction history view and Customer 360 (see below)
Internal case study — Transaction history view: core banking to Elasticsearch projection¶
A retail mobile banking app displays the last 90 days of transactions with merchant name, category icon, running balance, and full-text search. The core banking system stores transactions in a normalised schema across six tables; assembling the view required a 7-table join that took 380 ms at p95 and caused read IOPS spikes that correlated with posting delays during salary credit days.
What changed
Debezium was deployed against the core banking PostgreSQL instance (wal_level = logical, replication slot per connector). A Kafka topic receives a change event for every row insert or update in the transactions, postings, and merchant tables. A projection service consumes these events and upserts a denormalised transaction document into Elasticsearch — one document per transaction, pre-joined with merchant name, category, and running balance.
flowchart LR
CoreDB[(Core Banking\nPostgreSQL)] -->|WAL — logical replication| Debezium[Debezium\nConnector]
Debezium -->|txn.posted\nmerchant.updated| Kafka[Kafka]
Kafka --> TxnProj[Transaction\nProjection Service]
TxnProj -->|upsert document| ES[(Elasticsearch\nper-customer index)]
MobileApp([Mobile App]) -->|GET /transactions?q=...| ES
MobileApp -->|POST /transfer| CoreDB
Outcomes
| Metric | Before | After |
|---|---|---|
| Transaction history query p95 | 380 ms (7-table join) | 18 ms (Elasticsearch document fetch) |
| Core DB read IOPS during salary batch | +340% spike | No impact (reads served from ES) |
| Search capability | None (exact match only) | Full-text, date range, amount range, category filter |
| Projection propagation lag (p95) | N/A | 1.1 s end-to-end from core posting to ES |
Gotchas observed
- Replication slot grew unbounded during a Debezium outage — when the connector was down for 4 hours (Kafka Connect restart), the PostgreSQL replication slot retained all WAL segments since the last confirmed LSN. The WAL directory grew to 40 GB before the connector recovered. Mitigation: set
max_slot_wal_keep_sizein PostgreSQL and alert when replication slot lag exceeds a threshold; have a runbook for slot drop and projection rebuild. - Merchant table updates triggered a full re-projection — a bulk merchant category update (50,000 rows) flooded the projection service with update events; the lag spiked to 45 s. Mitigation: rate-limit non-critical update sources in the projection service; merchant enrichment data should be applied at query time via a lookup, not embedded in every transaction document.
- Initial snapshot locked a table briefly — Debezium's default snapshot mode (
initial) takes a brief shared lock on each table; during the transactions table snapshot this caused a 2-second posting delay. Resolved by usingsnapshot.mode = initial_onlyon a replica rather than the primary, and switching tonevermode after the first successful snapshot.
Internal case study — Customer 360: multi-source CDC to unified read store¶
A bank's customer data lived in four systems: core banking (account data), CRM (contact preferences, relationship manager), KYC platform (identity documents, risk rating), and the fraud system (alert flags). No single system had a complete picture. Customer-facing staff used four separate screens; any update in one system was invisible to the others for up to 24 hours (batch sync jobs).
Each source system had a CDC connector writing to a dedicated Kafka topic. A Customer 360 projection service consumed all four topics, merged the data on customerId, and maintained a unified customer document in DynamoDB. Staff-facing tools read exclusively from the DynamoDB projection.
Outcomes
| Metric | Before | After |
|---|---|---|
| Time for CRM update to appear in core banking view | Up to 24 h (batch) | < 3 s (CDC propagation) |
| Screens required for full customer view | 4 | 1 |
| Fraud alert visibility during a call | Next business day | Real-time in the same view |
Gotchas observed
- Conflicting updates from two source systems — CRM and KYC both stored a customer email address; they disagreed 3% of the time. The projection service needed an explicit merge rule (KYC wins for identity fields; CRM wins for contact preference fields). Without the merge rule, the last-write-wins default produced non-deterministic results. Mitigation: define a field-level source-of-truth map before building the projection; make it a formal schema decision, not a code comment.
- Customer deletion was not propagated — a GDPR erasure request deleted the record in core banking; the CDC
deletetombstone was processed correctly in Kafka, but the projection service had a bug where tombstones were silently ignored. The customer document persisted in DynamoDB for 6 days. Mitigation: test the delete/tombstone path explicitly before launch; add a periodic reconciliation check that flags projection documents whose source record no longer exists.
9. Related decisions (ADRs)¶
- ADR-0001: Tenant isolation via PostgreSQL Row-Level Security — CDC events contain tenant-scoped row data; projection services must apply the same tenant isolation guarantees as the write side
- Candidate ADR: Debezium vs AWS DMS as the CDC connector — record when your organisation makes a committed infrastructure decision
- Candidate ADR: Read store selection per projection type (Elasticsearch vs DynamoDB vs Redis) — the choice has long-term operational and cost implications
10. Known risks & gotchas¶
- Replication slot accumulation during connector downtime — a stopped or lagging Debezium connector holds a PostgreSQL replication slot open; the database retains all WAL segments since the last confirmed position, filling disk. Mitigation: monitor replication slot lag as a first-class database health metric; set
max_slot_wal_keep_size; have a tested runbook for slot drop and full projection rebuild. - The read-after-write consistency problem — a customer submits a transfer; the UI immediately queries the transaction history projection; the CDC event has not yet propagated; the new transaction is invisible; the customer assumes the transfer failed and submits again. Mitigation: use optimistic UI updates (show the transaction immediately on the client before CDC confirms it); or use a short read-your-writes cache keyed on session; or add a
?min_sequence=Nparameter to the read API that waits for the projection to reach at least that sequence number. - Schema change on the source table breaks the CDC stream — a DBA adds a NOT NULL column to the transactions table without updating the Debezium schema or the projection service; the connector fails to deserialise the new row format and stops. Mitigation: treat source table schema changes as a deployment event that requires coordinating the CDC connector, Schema Registry, and projection service; enforce this via a change-management gate in CI.
- Tombstone events (deletes) silently ignored — Kafka compaction can remove tombstone events before a slow consumer processes them; or a projection service simply does not handle the
op: dDebezium envelope correctly. Result: deleted rows persist in the read store indefinitely — a compliance risk for GDPR erasure. Mitigation: test the delete path explicitly; run periodic reconciliation jobs that check for projection documents whose write-side record no longer exists. - Projection diverges from source after a partial rebuild — a projection rebuild replays events from a snapshot; if the snapshot was taken mid-transaction or the event ordering is incorrect, the rebuilt projection disagrees with the source. Mitigation: always rebuild from an Debezium-managed snapshot (not a manual DB dump); validate the rebuilt projection against the source before switching read traffic.
- Multiple projections consuming the same topic create fan-out lag — ten projection services on the same high-volume CDC topic each add consumer lag independently; a slow projection does not affect others, but monitoring becomes noisy. Mitigation: one consumer group per projection service; set independent lag alert thresholds per projection based on its SLA; do not share a consumer group across projections with different SLAs.