Project Name

How Ksolves Automated 80% of Telecom Data Reconciliation Across 4 Countries with Apache NiFi & ClickHouse

Industry
Telecommunication
Technology
Apache NiFi, Spark, ClickHouse, MapR-FS, MinIO, Airflow, Kubernetes, Keycloak, Apache Atlas, Grafana, Prometheus

Overview

A multi-country MVNO operating across Morocco, Benin, Mali, and the Central African Republic was manually reconciling millions of daily roaming and revenue transactions. Sage X3 ERP feeds arrived out of sequence, financial reports lagged by days, and there was no real-time analytics layer to catch discrepancies before regulatory windows closed. Ksolves delivered a production-grade Big Data platform built on Apache NiFi and ClickHouse and structured around a Bronze-Silver-Gold Medallion Architecture. The platform eliminated manual intervention from ingestion through reconciliation to executive dashboards, and it now operates live across all four countries.

Key Challenges
  • Manual Roaming Reconciliation: ERP feeds from Sage X3 and TBG financial reports were reconciled manually across multiple country zones, creating a multi-day lag in revenue assurance and a persistent risk of undetected discrepancies before regulatory reporting windows closed.
  • No Real-Time Analytics Layer: Business units across Morocco, Benin, Mali, and the Central African Republic had no unified analytics platform to monitor operational KPIs, network performance, or revenue metrics in real time, leaving leadership dependent on delayed batch reports and unable to act on discrepancies proactively.
  • Fragmented Data Architecture: Data arrived from heterogeneous sources (NiFi flows, Kafka topics, PySpark jobs, and ERP exports) with no governed ingestion layer or lineage tracking, making root-cause analysis of data errors time-consuming and unreliable.
  • Scattered Infrastructure and Performance Bottlenecks: Operational clusters were dispersed, lacked centralised monitoring, and executed tasks slowly because of resource contention. There was no observability framework to identify or resolve performance degradation before it caused pipeline failures.
  • Security and Access Control Gaps: Data visualisation and analytics access layers had no fine-grained RBAC enforcement, creating compliance exposure across multi-country data boundaries where regulatory data residency rules applied.
  • Centralised Storage Absent: Each operational node relied on local filesystem storage, making cross-zone analytics impossible without costly data movement and introducing consistency failures during high-volume reconciliation windows.
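To make the reconciliation problem above concrete, here is a minimal sketch of key-based matching between two feeds that arrive in different orders. It is an illustration under stated assumptions, not the operator's or Ksolves' actual logic: the function name `reconcile`, the `(transaction ID, amount)` record shape, and the 0.01 tolerance are all hypothetical.

```python
from collections import defaultdict
from decimal import Decimal


def reconcile(erp_records, report_records, tolerance=Decimal("0.01")):
    """Compare two feeds by transaction ID, independent of arrival order.

    Each record is a hypothetical (transaction_id, amount_string) pair.
    Returns a dict mapping transaction ID to a discrepancy label.
    """
    # Sum amounts per transaction ID so that out-of-sequence or split
    # records do not affect the comparison.
    erp_totals = defaultdict(Decimal)
    for txn_id, amount in erp_records:
        erp_totals[txn_id] += Decimal(amount)

    report_totals = defaultdict(Decimal)
    for txn_id, amount in report_records:
        report_totals[txn_id] += Decimal(amount)

    discrepancies = {}
    for txn_id in sorted(erp_totals.keys() | report_totals.keys()):
        if txn_id not in report_totals:
            discrepancies[txn_id] = "missing in report"
        elif txn_id not in erp_totals:
            discrepancies[txn_id] = "missing in ERP"
        elif abs(erp_totals[txn_id] - report_totals[txn_id]) > tolerance:
            discrepancies[txn_id] = "amount mismatch"
    return discrepancies


# Illustrative data only: the ERP feed arrives out of sequence.
erp_feed = [("T3", "5.00"), ("T1", "10.00"), ("T2", "7.50")]
report_feed = [("T1", "10.00"), ("T2", "7.00"), ("T4", "1.00")]
print(reconcile(erp_feed, report_feed))
```

Because matching is done on aggregated keys rather than on row position, the order in which records arrive does not change the result. Amounts are held as `Decimal` to avoid floating-point rounding in financial comparisons.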
Our Solution

Ksolves delivered an automation-first Big Data platform purpose-built for telecom revenue assurance, with observability and security controls enforced across every layer from ingestion to analytics.

  • Apache NiFi - Automated Data Ingestion Pipeline: NiFi was deployed as the primary ingestion engine, handling multi-format data flows from traffic feeds, Oracle databases, and SFTP sources. Automated decoding, cleaning, and routing to MapR-FS eliminated manual data staging that had previously consumed days of operational effort.
  • Medallion Architecture (Bronze to Silver to Gold): A structured four-stage transformation pipeline was implemented using Apache Spark: Landing Zone (Bronze from MinIO), Data Decoding and Consolidation (Silver S1), Data Enrichment (Silver S2), and Data Finalization (Gold). This ensured clean, governed data reached ClickHouse and Hive for analytics workloads.
  • ClickHouse + MapR-FS + MinIO - Multi-Tier Storage: Stage-appropriate storage was deployed across the pipeline: MinIO for landing and intermediate storage, MapR-FS as the distributed processing layer, Hive for structured warehouse queries, and ClickHouse as the high-performance OLAP engine for real-time roaming and revenue analytics.
  • Apache Airflow + Kubernetes - Orchestration and Automation: Airflow managed all pipeline scheduling and dependency resolution across Spark jobs and NiFi flows, achieving 80% task automation. Kubernetes provided container orchestration for all services, ensuring consistent deployment across multi-zone environments.
  • Grafana + Prometheus + Elastic Stack - Observability: A complete monitoring stack was deployed to provide real-time visibility into pipeline throughput, resource contention, and system health, replacing the previously reactive and slow approach to bottleneck resolution.
  • Keycloak + Apache Atlas - Security and Governance: Keycloak enforced mandatory RBAC across all data access and visualisation layers. Apache Atlas provided data cataloguing and lineage tracking, ensuring full auditability of data flows across all operating zones.
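The four Medallion stages and their dependency-ordered execution can be sketched in plain Python. This is a simplified illustration, not the production implementation: plain functions stand in for the Spark jobs, the standard-library `graphlib.TopologicalSorter` stands in for Airflow's dependency resolution, and the stage names, record fields, and enrichment rule are all hypothetical.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def bronze_landing(_):
    # Landing zone: raw records as they arrive; amounts are still strings.
    return [
        {"txn": "T1", "country": "ma", "amount": "10.00"},
        {"txn": "T2", "country": "bj", "amount": "bad"},
        {"txn": "T1", "country": "ma", "amount": "10.00"},  # duplicate
    ]


def silver_s1_decode(rows):
    # Decoding and consolidation: parse amounts, drop bad rows and duplicates.
    seen, out = set(), []
    for row in rows:
        try:
            amount = float(row["amount"])
        except ValueError:
            continue
        if row["txn"] not in seen:
            seen.add(row["txn"])
            out.append({**row, "amount": amount})
    return out


def silver_s2_enrich(rows):
    # Enrichment: normalise the country code (hypothetical rule).
    return [{**row, "country": row["country"].upper()} for row in rows]


def gold_finalise(rows):
    # Finalization: aggregate revenue per country for analytics.
    totals = {}
    for row in rows:
        totals[row["country"]] = totals.get(row["country"], 0.0) + row["amount"]
    return totals


STAGES = {
    "bronze": bronze_landing,
    "silver_s1": silver_s1_decode,
    "silver_s2": silver_s2_enrich,
    "gold": gold_finalise,
}

# Each stage maps to the set of stages that must finish before it starts.
DEPENDS_ON = {
    "bronze": set(),
    "silver_s1": {"bronze"},
    "silver_s2": {"silver_s1"},
    "gold": {"silver_s2"},
}


def run_pipeline():
    # Resolve the execution order from the dependency graph, then run each
    # stage on the previous stage's output (valid because the graph is a chain).
    order = list(TopologicalSorter(DEPENDS_ON).static_order())
    data = None
    for name in order:
        data = STAGES[name](data)
    return order, data


order, gold = run_pipeline()
print(order)
print(gold)
```

Declaring dependencies separately from the stage logic is the same separation a scheduler relies on: the transformations stay unaware of ordering, and the orchestrator derives a valid run order from the graph.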

Technology Stack

Layer | Technology / Tool | Purpose
Ingestion | Apache NiFi | Automated multi-format data ingestion and routing to MapR-FS
Processing | Apache Spark + Airflow | Medallion transformations with 80% pipeline task automation
OLAP Engine | ClickHouse | Sub-second queries on millions of daily roaming and revenue transactions
Storage | MapR-FS + MinIO + Hive | Stage-matched storage across landing, processing, and warehouse layers
Infrastructure | Kubernetes + Ansible + GitLab CI/CD | Scalable deployment and automated provisioning across all zones
Observability | Grafana + Prometheus + Elastic Stack | Real-time pipeline visibility and log aggregation across all stages
Security | Keycloak + Apache Atlas | RBAC enforcement and data lineage across all four countries
Analytics | Superset + Dremio + SAS Viya | Role-specific dashboards for revenue, operations, and executive reporting
Impact
  • 80% Pipeline Task Automation: 80% of pipeline tasks for reconciliation and ingestion across all four countries now run automatically via NiFi and Airflow, with near-zero manual intervention.
  • Real-Time Revenue Assurance: ClickHouse-powered analytics replaced multi-day reconciliation lags, enabling same-day discrepancy detection across all operating zones.
  • 60%+ Reduction in Cross-Zone Data Movement: A single governed storage tier via MinIO and MapR-FS replaced fragmented local filesystems across all operational nodes.
  • Mean Time to Detect Reduced from Days to Minutes: Grafana, Prometheus, and Elastic Stack replaced reactive bottleneck resolution with continuous pipeline visibility.
  • RBAC and Data Lineage Enforced: Keycloak and Apache Atlas closed compliance gaps with active access control and full audit trails across all four countries.
  • AI/ML-Ready Foundation: The Medallion Architecture enables direct evolution toward churn prediction, fraud detection, and network optimisation without architectural rework.
Solution Architecture
[Architecture diagram: stream-dfd]
CLIENT TESTIMONIAL

“For the first time, our revenue assurance teams across all operating zones are working from a single, real-time data layer. The manual reconciliation burden that once consumed days of operational effort has been eliminated.”

Head of Revenue Assurance / CTO.

Conclusion

Ksolves delivered a purpose-built Big Data platform where governed ingestion, real-time analytics, and full observability work together within a single unified architecture, replacing manual reconciliation across four countries entirely.

With 80% of pipeline tasks automated and revenue assurance running in real time, the operator is now positioned to meet regulatory requirements, expand analytics, and build AI/ML-powered products on the same production data foundation.

Automate Your Telecom Reconciliation and Revenue Assurance with Ksolves.