Delivered a 50K-TPS Decision Platform for 100M+ Users with Sub-50ms Latency

Industry

Financial Services

Technology

Apache Kafka, Apache Flink, Apache Spark, XGBoost, TensorFlow, Redis Cluster, Apache Cassandra, Kubernetes, and API Gateway

Overview

The client is a global-scale digital enterprise operating across financial services, e-commerce, and consumer technology verticals, serving over 100 million registered users across multiple geographies and time zones.

Their business model depends on making thousands of real-time decisions per second: credit eligibility assessments, personalised product recommendations, fraud risk scoring, and dynamic pricing. Each of these must be resolved before the user takes the next action in their session.

The organisation had scaled its user base aggressively but had not re-platformed its decision infrastructure to match, creating a compounding bottleneck at the very layer where revenue is won or lost. The requirement was unambiguous: a platform sustaining 50,000 transactions per second, serving 100 million user profiles, responding within 50 milliseconds at p99, and producing an auditable, explainable output for every single decision.
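
A back-of-the-envelope check shows what these targets imply for concurrency. By Little's law, the average number of requests in flight equals throughput multiplied by time in the system. The sketch below is illustrative only: it treats the 50 ms p99 bound as a pessimistic stand-in for mean latency, and `PER_WORKER_CONCURRENCY` is a hypothetical figure, not one taken from the engagement.

```python
import math


def in_flight_requests(tps: int, latency_ms: int) -> float:
    """Little's law: average concurrency = arrival rate x time in system."""
    return tps * latency_ms / 1000


def workers_needed(tps: int, latency_ms: int, per_worker_concurrency: int) -> int:
    """Workers required if each one handles a fixed number of concurrent requests."""
    return math.ceil(in_flight_requests(tps, latency_ms) / per_worker_concurrency)


TARGET_TPS = 50_000          # stated throughput requirement
LATENCY_BOUND_MS = 50        # stated p99 bound, used here as a pessimistic mean
PER_WORKER_CONCURRENCY = 32  # hypothetical capacity of one serving worker

steady = in_flight_requests(TARGET_TPS, LATENCY_BOUND_MS)
surge = in_flight_requests(3 * TARGET_TPS, LATENCY_BOUND_MS)  # a 3x traffic surge

print(f"in flight at target load: {steady:.0f}")
print(f"in flight during a 3x surge: {surge:.0f}")
print("workers at target load:",
      workers_needed(TARGET_TPS, LATENCY_BOUND_MS, PER_WORKER_CONCURRENCY))
```

Under these assumptions the serving tier has to hold a few thousand decisions in flight at any moment, which is why per-request work such as feature lookup has to fit inside a budget of a few milliseconds.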

Key Challenges

A user base that had scaled to 100 million. A decision infrastructure that had not.

Batch-Only Scoring Creating Revenue Lag: The ML scoring pipeline operated on nightly batch cycles, meaning decisions relied on data up to 24 hours old and missed real-time behavioural signals when they mattered most.
Monolithic Rules Engine Slowing Deployment Velocity: Eligibility, compliance, and pricing rules were hardcoded into a monolithic service, turning simple rule updates into 2-5 day deployment cycles with high regression risk.
Feature Pipeline Bottleneck Under Peak Load: During flash sales and high-traffic events, the feature computation layer saturated below the 50K TPS target, causing inference latency spikes above 500ms and repeated SLA breaches.
Database Architecture Not Built for 100M-Profile Scale: User profiles and decision histories were stored in a relational OLTP database that struggled at scale, with hot profile reads often exceeding 200ms latency.
Zero Decision Explainability for Compliance: Credit, eligibility, and pricing decisions were generated by black-box models without attribution capabilities, creating significant regulatory and audit exposure.
Manual Capacity Management at Every Peak: Every traffic surge required manual provisioning, scaling, and teardown, consuming 30-40 hours of engineering effort per event.

Our Solution

Ksolves, an AI-first Big Data consulting company, built the platform around one core principle: decisions must be driven by live signals, not stale historical data. Every layer, from streaming feature engineering to hot-reload rules and Redis-backed serving, was designed to minimise the gap between user action and decision response. The result is a unified sub-50ms decisioning platform operating reliably at 50K TPS scale with fully automated Kubernetes-based scaling.

Apache Kafka Event Bus: Kafka was deployed as the ingestion backbone handling 50K+ TPS across 100M+ user sessions, with Schema Registry enforcing data contracts and exactly-once semantics preventing duplicate or lost events.
Flink and Spark Streaming Feature Engine: The batch feature pipeline was replaced with real-time stream processing, enabling complex feature computation and stateful aggregations in under 5ms.
Multi-Model ML Decision Engine: XGBoost and TensorFlow models were deployed in a multi-model serving framework with A/B testing, shadow deployments, and inline SHAP explainability for every decision.
Hot-Reload Rules Engine: Eligibility, compliance, and pricing rules were decoupled from deployments and loaded dynamically at runtime, reducing rule update cycles to under 15 minutes.
Redis Cluster and Cassandra Serving Layer: Redis delivered sub-millisecond access for hot user contexts while Cassandra handled 100M+ decision histories with scalable, schema-flexible storage.
Kubernetes Auto-Scaling Orchestration: Kubernetes automatically scaled streaming, inference, and API services based on Kafka lag, CPU, and TPS metrics, sustaining 50K TPS even during 3× traffic spikes.
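
The hot-reload idea in the list above can be illustrated with a minimal sketch. It assumes rules live in a JSON file outside the service binary and that the service re-reads them when the content changes; the `RuleStore` class, the rule format, and the field names are all hypothetical and are not the client's actual rules engine.

```python
import hashlib
import json
import operator
import os
import tempfile

# Comparison operators a rule may use (hypothetical rule vocabulary).
OPS = {">=": operator.ge, "<=": operator.le, ">": operator.gt,
       "<": operator.lt, "==": operator.eq}


def write_rules(path: str, rules: list) -> None:
    """Publish a rule set atomically so readers never see a half-written file."""
    tmp = path + ".tmp"
    with open(tmp, "w") as fh:
        json.dump(rules, fh)
    os.replace(tmp, path)


class RuleStore:
    """Holds the active rule set and swaps it when the file content changes."""

    def __init__(self, path: str):
        self.path = path
        self._digest = None
        self._rules = []
        self.refresh()

    def refresh(self) -> bool:
        """Reload rules if the file changed. Returns True when a swap happened."""
        with open(self.path, "rb") as fh:
            raw = fh.read()
        digest = hashlib.sha256(raw).hexdigest()
        if digest == self._digest:
            return False
        rules = json.loads(raw)  # parse and validate fully before swapping
        for rule in rules:
            if rule["op"] not in OPS:
                raise ValueError(f"unknown operator: {rule['op']}")
        self._rules, self._digest = rules, digest
        return True

    def evaluate(self, profile: dict) -> dict:
        """Apply every rule and report which ones failed."""
        failed = [r["name"] for r in self._rules
                  if not OPS[r["op"]](profile[r["field"]], r["value"])]
        return {"eligible": not failed, "failed_rules": failed}


rules_path = os.path.join(tempfile.mkdtemp(), "rules.json")
write_rules(rules_path, [{"name": "min_score", "field": "credit_score",
                          "op": ">=", "value": 650}])
store = RuleStore(rules_path)
profile = {"credit_score": 640}

before = store.evaluate(profile)
# A compliance analyst lowers the threshold; no redeployment is involved.
write_rules(rules_path, [{"name": "min_score", "field": "credit_score",
                          "op": ">=", "value": 620}])
changed = store.refresh()
after = store.evaluate(profile)
print(before, changed, after)
```

In a running service, `refresh()` would be called on a timer or from a file or config-store watcher. Validating the new rule set before swapping means a malformed update leaves the previous rules active.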

Technology Stack

Category	Technology
Messaging	Apache Kafka
Stream Processing	Apache Flink / Spark
AI/ML	XGBoost / TensorFlow
Caching	Redis Cluster
Database	Apache Cassandra
Infrastructure	Kubernetes
API Layer	API Gateway (custom)
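
For the Kubernetes layer in this stack, the scaling decision can be approximated with the replica formula documented for the Horizontal Pod Autoscaler: desired replicas = ceil(current replicas × current metric ÷ target metric), skipped when the ratio is within a tolerance of 1.0. The sketch below reimplements that formula in plain Python for illustration; the per-pod TPS target and the replica bounds are hypothetical values, not the platform's configuration.

```python
import math


def desired_replicas(current_replicas: int, current_metric: int, target_metric: int,
                     min_replicas: int = 1, max_replicas: int = 100,
                     tolerance: float = 0.1) -> int:
    """Replica count following the documented HPA scaling formula."""
    ratio = current_metric / target_metric
    # Within tolerance of the target: leave the deployment unchanged.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    desired = math.ceil(current_replicas * current_metric / target_metric)
    # Clamp to the configured bounds.
    return max(min_replicas, min(max_replicas, desired))


TARGET_TPS_PER_POD = 2_500  # hypothetical per-pod target

# 20 pods sized for 50K TPS, then a 3x surge pushes each pod to 7,500 TPS.
surge = desired_replicas(20, 7_500, TARGET_TPS_PER_POD)
# A small fluctuation stays inside the tolerance band.
steady = desired_replicas(20, 2_600, TARGET_TPS_PER_POD)
# Traffic drops after the event, so the deployment shrinks again.
quiet = desired_replicas(20, 1_000, TARGET_TPS_PER_POD)
print(surge, steady, quiet)
```

The same formula applies whether the metric is CPU, requests per second, or Kafka consumer lag exposed as a custom or external metric.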

Impact

From a nightly batch pipeline and monolithic rules engine to a live, explainable, auto-scaling decision platform, engineered for 100 million users and 50,000 decisions per second.

Decision Latency Reduced From 500ms to Sub-50ms at P99: Streaming feature computation and Redis-backed serving reduced decision latency to under 50ms at p99 across sustained 50K TPS workloads.
Feature Freshness Improved From 24-Hour Batch to Sub-5ms Live: Flink-based streaming generates real-time feature vectors in under 5ms, so recommendations and risk scores no longer rely on stale signals.
Rules Deployment Cycle Reduced From Days to Under 15 Minutes: The hot-reload rules engine enables compliance and product teams to update and activate rules in under 15 minutes without redeployment.
Zero Manual Capacity Events, Engineering Hours Fully Recovered: Kubernetes HPA automatically absorbs 3× traffic spikes, eliminating the 30-40 hours of manual provisioning, scaling, and teardown previously spent on each peak event.
100% Decision Explainability Achieved for Regulatory Compliance: SHAP attributions are computed inline for every request and stored immediately, giving audit and compliance teams visibility into each decision.
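
To illustrate what a per-decision attribution record can look like, the sketch below uses a linear scoring model as a stand-in. For a linear model with features treated as independent, the SHAP value of a feature is its weight times the feature's deviation from a baseline (typically the training mean), and the attributions sum to the score minus the baseline score. The platform's own models are XGBoost and TensorFlow, where attributions would come from a tree or deep explainer instead; the model, feature names, and threshold here are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LinearModel:
    """Toy risk model: score = bias + sum(weight * feature)."""
    weights: dict
    bias: float

    def predict(self, x: dict) -> float:
        return self.bias + sum(w * x[name] for name, w in self.weights.items())


def linear_shap(model: LinearModel, x: dict, baseline: dict) -> dict:
    """SHAP values for a linear model, assuming independent features."""
    return {name: w * (x[name] - baseline[name])
            for name, w in model.weights.items()}


def explain_decision(model: LinearModel, x: dict, baseline: dict,
                     threshold: float) -> dict:
    """Build an audit record: the decision plus its per-feature attribution."""
    score = model.predict(x)
    return {
        "score": score,
        "flagged": score >= threshold,
        "base_value": model.predict(baseline),
        "attributions": linear_shap(model, x, baseline),
    }


model = LinearModel(
    weights={"txn_count_1h": 0.5, "account_age_years": -0.25, "amount_ratio": 2.0},
    bias=1.0,
)
baseline = {"txn_count_1h": 4, "account_age_years": 2, "amount_ratio": 1.0}
request = {"txn_count_1h": 10, "account_age_years": 6, "amount_ratio": 1.5}

record = explain_decision(model, request, baseline, threshold=6.0)
print(record)
```

Storing `base_value` alongside the attributions lets an auditor reconstruct the score from the record alone, since the base value plus the attributions equals the score.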

Solution Architecture

Conclusion

Running a decision platform on 24-hour-old features, with a rules engine that took days to update and a database that buckled under load, is not merely a scaling problem; it is a revenue problem, a compliance problem, and a competitive problem, all compounding at once. This organisation faced all three, and Ksolves resolved them in a single platform delivery. Decisions that took 500ms now complete in under 50ms. Features that were 24 hours stale are now computed in under 5ms from live signals. Rules that took days to deploy now go live in minutes. Every decision across 50,000 transactions per second carries a full SHAP attribution trail that the compliance team can access in seconds. The platform is built to scale linearly: Kubernetes absorbs the spikes, Cassandra absorbs the history, and the architecture is designed to absorb 10× growth in the user base without a redesign. The organisation is no longer constrained by its decision infrastructure; it is defined by it.
