Project Name
Sub-Second Data Sync Across Core Banking Systems: Eliminating Batch Latency with Kafka CDC
A mid-size US retail bank was running a nightly batch job that swept transaction records from Oracle DB, loan data from SQL Server, account updates from PostgreSQL, and CRM records from MySQL into a central warehouse.
The process took hours, left every downstream system operating on data that was 12 to 18 hours stale, and placed significant query load on the same production databases handling millions of live customer transactions. When source table schemas changed, downstream consumers broke silently. When a batch run failed, recovery required a full manual restart with no visibility into which records had been lost.
Ksolves replaced the entire batch window with a Debezium-driven CDC pipeline that propagates every row-level change from all four source databases to analytics and operational systems within seconds, with zero modification to any production system.
Challenges
- Production Databases Absorbing Analytical Query Load: Reporting tools and dashboards were issuing direct read queries against live Oracle DB and PostgreSQL instances, competing with customer-facing transaction processing for database resources, causing latency spikes during peak hours, and creating a structural risk of production degradation at month-end reporting cycles.
- 12 to 18 Hour Data Lag Across All Downstream Systems: Nightly batch jobs meant every dashboard, API, and regulatory report operated on data that was at minimum half a day old, making intra-day liquidity monitoring, real-time fraud signal generation, and live customer balance checks operationally impossible.
- Schema Changes Breaking Downstream Consumers Without Warning: When any source database table was restructured, no schema contract governed downstream consumers. ETL jobs and the feeds built on them failed silently or produced corrupted output that was discovered only at the next reporting cycle, with no governed recovery process.
- No Point-in-Time Audit Trail for Regulatory Compliance: The batch-overwrite warehouse model retained no history of row-level data changes across core banking tables. Regulatory requirements for data lineage, change history, and point-in-time reconstruction could not be satisfied by the existing architecture.
- Raw CDC Events Requiring Multi-Table Joins Before Analytical Use: Change events captured at the individual table level were insufficient for analytics consumers. A complete transaction record requires joining transaction, account, and customer data that reside in separate source systems, and no enrichment layer existed between event capture and consumption.
- No Dead-Letter Recovery for Failed Pipeline Events: Batch pipeline failures required a full job restart with no visibility into which records had failed and no governed mechanism to reprocess only the affected events.
Solution
Ksolves designed and delivered a production-grade, event-driven CDC pipeline that replaces the nightly batch ETL with continuous change propagation, measured in seconds, from four production source databases through Apache Kafka to enriched serving layers powering both analytics dashboards and operational APIs. The governing principle was zero-touch source systems: Debezium reads exclusively from database transaction logs, imposing no query load on production and requiring no schema modifications to any source table.
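To make the log-based capture principle concrete, the sketch below builds the JSON payload that the Kafka Connect REST API accepts when registering a Debezium PostgreSQL source connector. It is an illustration only: the connector name, hostname, user, table names, topic prefix, and Schema Registry URL are hypothetical placeholders rather than values from this engagement, and the property names follow Debezium 2.x conventions, so they should be checked against the deployed version.

```python
import json


def build_postgres_cdc_connector(name, host, dbname, tables, topic_prefix, registry_url):
    """Return a Kafka Connect registration payload for a Debezium PostgreSQL source.

    Every argument value used below is an illustrative assumption.
    """
    avro = "io.confluent.connect.avro.AvroConverter"
    config = {
        "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
        # pgoutput is PostgreSQL's built-in logical decoding plugin, so changes
        # are streamed from the write-ahead log rather than queried from tables.
        "plugin.name": "pgoutput",
        "database.hostname": host,
        "database.port": "5432",
        "database.user": "cdc_reader",  # hypothetical replication user
        # Assumes a FileConfigProvider is enabled on the Connect worker.
        "database.password": "${file:/opt/secrets/pg.properties:password}",
        "database.dbname": dbname,
        "topic.prefix": topic_prefix,
        "table.include.list": ",".join(tables),
        "key.converter": avro,
        "value.converter": avro,
        "key.converter.schema.registry.url": registry_url,
        "value.converter.schema.registry.url": registry_url,
    }
    return {"name": name, "config": config}


def expected_topics(payload):
    """Debezium names each change topic <topic.prefix>.<schema>.<table>."""
    cfg = payload["config"]
    return [f'{cfg["topic.prefix"]}.{t}' for t in cfg["table.include.list"].split(",")]


payload = build_postgres_cdc_connector(
    name="accounts-postgres-cdc",
    host="pg-core.internal.example",
    dbname="corebank",
    tables=["public.accounts", "public.account_holds"],
    topic_prefix="corebank",
    registry_url="http://schema-registry.internal.example:8081",
)
print(json.dumps(payload, indent=2))
print(expected_topics(payload))
```

The payload would typically be sent with an HTTP `POST` to the Connect worker's `/connectors` endpoint. The Oracle DB, SQL Server, and MySQL connectors take analogous payloads with their own connector classes and source-specific properties.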
- Debezium CDC Connector Deployment: Debezium source connectors were configured for Oracle DB (LogMiner), PostgreSQL (pgoutput), SQL Server (CDC tables), and MySQL (binlog), capturing every INSERT, UPDATE, and DELETE as a structured Avro change event and publishing to dedicated Apache Kafka topics. Production systems receive zero additional read load at any point in the pipeline.
- Apache Kafka Cluster on AWS MSK: A 3-node Apache Kafka cluster was deployed on AWS MSK with SASL/SSL authentication and KRaft mode, with one topic per source table, 7-day event retention for replay, and Kafka Connect managing both source ingestion and downstream sink delivery. A dead-letter queue topic captures and retains all failed events for governed, targeted reprocessing.
- Schema Registry with Avro Backward-Compatibility Enforcement: Schema Registry was implemented with Avro serialisation and backward-compatibility enforcement across all CDC topics, ensuring that any incompatible source table schema change surfaces immediately as a compatibility failure instead of silently corrupting data in consuming applications.
- Kafka Streams Enrichment Topology: A Kafka Streams join topology was built that combines transaction, account, and customer change events from separate topics into pre-joined, analytics-ready records in real time, delivering enriched data directly to the serving layer without any downstream multi-source join overhead.
- S3 Compliance Sink: All CDC events were persisted through the S3 Parquet sink to an append-only table on AWS S3, retaining the complete row-level change history across all source systems in an immutable, time-travel-queryable format. Regulatory audit requests can now be satisfied against a fully governed, point-in-time queryable dataset.
- ClickHouse and PostgreSQL Serving Layer: Kafka Connect sink connectors push enriched events to ClickHouse for sub-second OLAP dashboard queries and to PostgreSQL for operational API reads. Each of these two purpose-fit serving stores is updated within seconds of the originating source change, with no batch dependency.
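The enrichment step described above can be illustrated with a small simulation. This is plain Python rather than the Kafka Streams API (a production topology would typically be written in Java as stream-table joins), and the field names (`txn_id`, `account_id`, `customer_id`, and so on) are hypothetical. It assumes Debezium-style envelopes carrying `op`, `before`, and `after`, keeps the latest account and customer rows in in-memory tables, and joins each transaction event against them.

```python
def apply_change(table, event, key_field):
    """Apply one Debezium-style change event to an in-memory table (a dict)."""
    if event["op"] == "d":  # delete: only the "before" image is populated
        table.pop(event["before"][key_field], None)
    else:  # "c" create, "u" update, "r" snapshot read
        row = event["after"]
        table[row[key_field]] = row


def enrich(txn_event, accounts, customers):
    """Join a transaction event with the latest account and customer rows.

    Returns None for deleted transactions or when either lookup side is
    missing, which mirrors inner-join semantics.
    """
    txn = txn_event["after"]
    if txn is None:
        return None
    account = accounts.get(txn["account_id"])
    if account is None:
        return None
    customer = customers.get(account["customer_id"])
    if customer is None:
        return None
    return {
        "txn_id": txn["txn_id"],
        "amount": txn["amount"],
        "account_id": account["account_id"],
        "account_type": account["type"],
        "customer_id": customer["customer_id"],
        "customer_name": customer["name"],
    }


accounts, customers = {}, {}

# Change events as they might arrive on the account and customer topics.
apply_change(customers, {"op": "c", "before": None,
                         "after": {"customer_id": "C-1", "name": "Dana Lee"}}, "customer_id")
apply_change(accounts, {"op": "c", "before": None,
                        "after": {"account_id": "A-1", "customer_id": "C-1",
                                  "type": "checking"}}, "account_id")

matched = enrich({"op": "c", "before": None,
                  "after": {"txn_id": "T-100", "account_id": "A-1", "amount": 250.0}},
                 accounts, customers)
unmatched = enrich({"op": "c", "before": None,
                    "after": {"txn_id": "T-101", "account_id": "A-9", "amount": 10.0}},
                   accounts, customers)

print(matched)
print(unmatched)  # None: account A-9 has not been seen yet
```

A real topology also has to decide what happens to transactions that arrive before their account or customer row, for example by buffering or by re-emitting when the lookup side updates. The sketch simply drops them.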
Technology Stack
| Category | Technology |
|---|---|
| CDC Integration | Debezium |
| Event Backbone | Apache Kafka, Kafka Connect (AWS MSK) |
| Stream Processing | Kafka Streams |
| Compliance Storage | AWS S3 (Parquet Sink) |
| Serving Layer | ClickHouse, PostgreSQL |
Impact
- Data Latency Cut from 12 to 18 Hours to Under 10 Seconds: The Apache Kafka CDC pipeline delivers row-level changes to the ClickHouse and PostgreSQL serving layers within 10 seconds of the originating transaction, enabling real-time intra-day monitoring across all KPI domains for the first time.
- Production Database Query Load Eliminated: Debezium reads exclusively from transaction logs. Zero analytical queries reach any production system, and production query response times improved by over 40% during peak transaction windows by removing the analytical read competition entirely.
- Schema Breaks Eliminated Across All Downstream Consumers: Schema Registry backward-compatibility enforcement surfaces every incompatible schema change as an immediate, actionable failure at publish time. Zero silent downstream corruption events have been recorded since deployment.
- Full Regulatory Audit Trail Established: The AWS S3 Parquet sink retains 100% of row-level change events across all source systems with full time-travel query support. Regulatory audit requests that previously required days of manual effort are now satisfied in minutes against a governed, immutable dataset.
- Pipeline Recovery Time Cut from Hours to Under 15 Minutes: The dead-letter queue captures all failed events automatically. Kafka Connect enables targeted reprocessing of only the failed records, reducing mean time to recovery from a 3 to 4 hour manual restart to under 15 minutes, with full event-level visibility throughout.
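The targeted reprocessing described in the recovery result can be sketched as a triage-then-replay routine over dead-letter records. This is a simulation under stated assumptions, not the tooling used in the engagement: records are plain dictionaries, header values are already decoded (Kafka Connect writes them as serialized header values when `errors.deadletterqueue.context.headers.enable` is set on a sink connector), the topic names are invented, and `send` is a hypothetical callable standing in for a Kafka producer.

```python
from collections import Counter

# Context headers that Kafka Connect attaches to dead-letter records.
TOPIC = "__connect.errors.topic"
PARTITION = "__connect.errors.partition"
OFFSET = "__connect.errors.offset"
EXC = "__connect.errors.exception.class.name"


def summarise_failures(dlq_records):
    """Count failed records per (original topic, exception class)."""
    return Counter((r["headers"][TOPIC], r["headers"][EXC]) for r in dlq_records)


def select_for_replay(dlq_records, topic, exception_class):
    """Return records matching one topic and cause, in original log order.

    A record that landed in the queue more than once (same original
    partition and offset) is selected only once.
    """
    seen, selected = set(), []
    ordered = sorted(dlq_records,
                     key=lambda r: (r["headers"][PARTITION], r["headers"][OFFSET]))
    for r in ordered:
        h = r["headers"]
        position = (h[PARTITION], h[OFFSET])
        if h[TOPIC] == topic and h[EXC] == exception_class and position not in seen:
            seen.add(position)
            selected.append(r)
    return selected


def replay(records, send):
    """Re-publish each record to its original topic via the supplied send()."""
    for r in records:
        send(r["headers"][TOPIC], r["key"], r["value"])
    return len(records)


def _failed(topic, partition, offset, exc, key, value):
    headers = {TOPIC: topic, PARTITION: partition, OFFSET: offset, EXC: exc}
    return {"headers": headers, "key": key, "value": value}


dlq = [
    _failed("enriched.transactions", 0, 42, "java.sql.SQLException", "t-2", {"amount": 75}),
    _failed("enriched.transactions", 0, 41, "java.sql.SQLException", "t-1", {"amount": 20}),
    _failed("enriched.transactions", 0, 42, "java.sql.SQLException", "t-2", {"amount": 75}),
    _failed("enriched.accounts", 1, 7,
            "org.apache.kafka.connect.errors.DataException", "a-9", {"type": None}),
]

sent = []
to_replay = select_for_replay(dlq, "enriched.transactions", "java.sql.SQLException")
replayed = replay(to_replay, lambda topic, key, value: sent.append((topic, key)))

print(summarise_failures(dlq))
print(replayed, sent)
```

Replaying in original offset order matters when the sink applies upserts, because a later change replayed before an earlier one would leave the serving store in the wrong final state.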
Ksolves delivers Apache Kafka CDC pipeline implementation and Big Data consulting services for banks and financial institutions that need to replace batch ETL latency with real-time data propagation across core banking systems.
Before this engagement, every downstream consumer at the bank was operating on data that was half a day old, production systems were absorbing analytical query load, and there was no schema governance or point-in-time audit capability anywhere in the architecture. After Ksolves delivered the Kafka CDC pipeline, data latency fell to under 10 seconds, production query load was eliminated, and an immutable row-level change audit trail was established on AWS S3 for the first time.
The CDC backbone is architected for incremental extension. Additional source systems, new Kafka Connect sink destinations, and streaming ML feature pipelines for real-time fraud detection can all attach to the same event stream without redesigning the core architecture.
Is Your Bank Still Running Nightly Batch Jobs to Feed Analytics?