Unified Analytics for 3 Markets: ClickHouse, MinIO & Spark on Bare Metal
Our client is a large telecommunications group operating across multiple Sub-Saharan and North African markets, providing mobile voice, data, and digital financial services to tens of millions of subscribers. Each regional subsidiary runs its own data infrastructure, creating both an autonomy requirement and a consistency challenge: platforms must be independently operable yet architecturally standardized.
The group had outgrown a legacy MapR-based analytics environment and needed a modern, fully open-source replacement that it could own, extend, and operate without cloud lock-in. The mandate was clear: one reference architecture, deployed across three markets, with full knowledge transfer to local teams and zero ongoing external operational dependency.
Three markets, three different legacy stacks, years of critical data locked in a MapR environment, and no path to a unified architecture without rebuilding everything from scratch in each geography.
- Fragmented, Market-Siloed Analytics: Each regional market operated its own independent data stack with no shared architecture, common data model, or unified query layer, making cross-market reporting and governance reliant on manual reconciliation.
- Legacy Platform With No Migration Path: Years of critical data in the legacy MapR Hive environment had no supported migration path to a modern analytics platform, limiting access to historical records and business insights.
- Bare-Metal Deployment Complexity: Deploying a multi-component open-source stack across physical servers in three geographies required careful infrastructure planning, storage sizing, networking, and dependency management without cloud-managed services.
- No Unified Observability: Platform monitoring was fragmented across components and markets, leaving no centralized visibility into system health, capacity, or pipeline failures and resulting in reactive incident management.
- Knowledge Transfer Gap: The client needed local engineering teams to independently operate, troubleshoot, and extend the platform after deployment, despite limited documentation and no established operational runbooks.
- Object Storage and Compute Integration: Integrating MinIO with Apache Spark and ClickHouse on bare metal required extensive S3 compatibility tuning, credential management, and performance validation across all three deployments.
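As an illustration of the S3 compatibility and credential concerns in the last point, the sketch below builds the Hadoop S3A properties that Spark typically needs to reach a MinIO endpoint. It is a minimal sketch, not the project's actual configuration: the function names, the example endpoint, and the idea of passing credentials in as arguments are assumptions made for illustration. The `fs.s3a.*` property names are standard Hadoop S3A settings.

```python
def minio_s3a_conf(endpoint: str, access_key: str, secret_key: str,
                   use_tls: bool = True) -> dict:
    """Return Spark properties that point the Hadoop S3A connector at MinIO.

    The endpoint and credentials are supplied by the caller (hypothetical
    values in this sketch); nothing here is taken from the real deployment.
    """
    return {
        "spark.hadoop.fs.s3a.endpoint": endpoint,
        "spark.hadoop.fs.s3a.access.key": access_key,
        "spark.hadoop.fs.s3a.secret.key": secret_key,
        # MinIO is usually addressed as host/bucket rather than bucket.host,
        # so path-style access is switched on.
        "spark.hadoop.fs.s3a.path.style.access": "true",
        "spark.hadoop.fs.s3a.connection.ssl.enabled": str(use_tls).lower(),
    }


def to_submit_args(conf: dict) -> list:
    """Flatten a property dict into spark-submit style '--conf k=v' arguments."""
    args = []
    for key, value in sorted(conf.items()):
        args += ["--conf", f"{key}={value}"]
    return args
```

In practice the access and secret keys would come from a secrets store or environment variables rather than source code, and the same property set could be reused in every market with only the endpoint changed.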
Ksolves, an AI-first big data consulting services company, designed a single reference architecture and deployed it consistently across all three regional markets. The governing principle was repeatability: every component choice, every configuration parameter, and every runbook was standardized so that a team in any market could operate any other market's platform.
- Standardized Multi-Market Architecture: Every component, configuration, and runbook followed the same standard in each market, creating a repeatable, open-source, bare-metal platform with no cloud dependency, vendor lock-in, or licensing costs.
- ClickHouse Cluster: A 10-node ClickHouse cluster (5 shards, 2 replicas per shard) was deployed per market with ClickHouse Keeper for coordination. Automated storage tiering across NVMe and HDD using TTL policies delivered high query performance while optimizing long-term data retention costs.
- MinIO Distributed Object Store: A distributed MinIO cluster provided S3-compatible object storage for Apache Spark, ClickHouse external tables, and archival workloads, eliminating cloud storage dependency while significantly reducing storage costs.
- NiFi and Kafka Ingestion Pipeline: A three-node Apache NiFi cluster and Apache Kafka enabled reliable real-time data ingestion, buffering, and replay. Standardized NiFi flow definitions across all markets simplified governance and operational consistency.
- Spark and HDFS Processing Layer: Apache Spark 3.5 with HDFS 3.x powered large-scale data processing, while a SeaTunnel ETL pipeline migrated historical Hive data from the legacy MapR environment into ClickHouse, unlocking years of historical analytics.
- Full-Stack Observability: Prometheus and Grafana provided unified monitoring with a standardized 78-panel dashboard covering ClickHouse, NiFi, Kafka, MinIO, and infrastructure health, enabling proactive issue detection across the platform.
- Knowledge Transfer and Runbooks: Comprehensive documentation, operational runbooks, and hands-on knowledge transfer sessions enabled local engineering teams in each market to independently operate, troubleshoot, and extend the platform after handover.
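To make the ClickHouse layout above concrete, the following sketch enumerates a 5-shard, 2-replica topology (10 nodes) and renders a table definition that moves older data from a fast volume to a slower one with a TTL rule. It is only a sketch under stated assumptions: the hostname pattern, table schema, the 30-day threshold, the `cold` volume, and the `nvme_then_hdd` storage policy name are all hypothetical, and the storage policy would have to be defined in the ClickHouse server configuration for the DDL to apply.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    shard: int
    replica: int
    host: str


def build_topology(shards: int = 5, replicas_per_shard: int = 2,
                   host_pattern: str = "ch-s{shard:02d}-r{replica}.example.internal"):
    """List every node of a sharded, replicated cluster (hostnames are made up)."""
    return [
        Node(s, r, host_pattern.format(shard=s, replica=r))
        for s in range(1, shards + 1)
        for r in range(1, replicas_per_shard + 1)
    ]


def tiered_table_ddl(table: str, hot_days: int, cold_volume: str = "cold",
                     storage_policy: str = "nvme_then_hdd") -> str:
    """Render a ReplicatedMergeTree table whose parts move to a cold volume by TTL.

    {shard} and {replica} are ClickHouse macros resolved on each server;
    the columns are an illustrative schema, not the client's data model.
    """
    return (
        f"CREATE TABLE {table}\n"
        "(\n"
        "    event_date Date,\n"
        "    subscriber_id UInt64,\n"
        "    payload String\n"
        ")\n"
        f"ENGINE = ReplicatedMergeTree('/clickhouse/tables/{{shard}}/{table}', '{{replica}}')\n"
        "PARTITION BY toYYYYMM(event_date)\n"
        "ORDER BY (event_date, subscriber_id)\n"
        f"TTL event_date + INTERVAL {hot_days} DAY TO VOLUME '{cold_volume}'\n"
        f"SETTINGS storage_policy = '{storage_policy}'"
    )
```

Generating the topology and DDL from parameters, rather than writing them by hand per market, is one way to keep the three deployments identical apart from hostnames.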
Technology Stack
| Category | Technology |
|---|---|
| Analytical Database | ClickHouse |
| Object Storage | MinIO |
| Ingestion | Apache NiFi 2.4 + Apache Kafka 4.0 |
| Processing | Apache Spark 3.5 / HDFS / YARN |
| ETL Migration | SeaTunnel |
| Observability | Prometheus + Grafana |
From three analytically isolated markets on different legacy stacks to one reference architecture, one monitoring standard, and one operational playbook across every geography.
- Three Markets Unified Under One Architecture: A single reference architecture with standardized ClickHouse configurations, NiFi flows, and Grafana dashboards was deployed across all three markets, enabling consistent operations and faster expansion into new regions.
- 100% Legacy Data Migrated: SeaTunnel successfully migrated historical Hive data from the legacy MapR environment into ClickHouse, unlocking years of critical analytics while enabling the retirement of the MapR platform and its associated maintenance costs.
- Zero Cloud Dependency: Bare-metal MinIO replaced cloud object storage across all markets, reducing storage costs, ensuring data sovereignty, and eliminating cloud egress and request charges.
- Unified Full-Stack Monitoring: A standardized 78-panel Grafana dashboard provided end-to-end visibility across ClickHouse, NiFi, Kafka, MinIO, and the underlying infrastructure, enabling proactive monitoring and significantly reducing incident detection time.
- Operational Independence Across All Markets: Comprehensive documentation, runbooks, and hands-on knowledge transfer enabled local engineering teams to independently operate, maintain, and extend the platform within 30 days of handover.
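A migration claim like the one above is usually backed by some form of reconciliation between source and target. The case study does not describe how completeness was verified, so the following is a hypothetical sketch of one common approach: comparing per-table row counts collected from Hive and from ClickHouse. The function name and the sample table names are assumptions for illustration only.

```python
def reconcile_counts(source: dict, target: dict) -> dict:
    """Compare per-table row counts and return only the tables that differ.

    The result maps table name to (source_rows, target_rows); a table that is
    missing on one side is reported with None for that side.
    """
    mismatches = {}
    for table in sorted(set(source) | set(target)):
        src_rows = source.get(table)
        tgt_rows = target.get(table)
        if src_rows != tgt_rows:
            mismatches[table] = (src_rows, tgt_rows)
    return mismatches


# Hypothetical counts, as they might be gathered with COUNT(*) on each side.
hive_counts = {"cdr_voice": 1_000, "cdr_data": 2_500, "topups": 400}
clickhouse_counts = {"cdr_voice": 1_000, "cdr_data": 2_490}

print(reconcile_counts(hive_counts, clickhouse_counts))
```

An empty result would indicate that every table has matching counts. Row counts alone do not prove the values are identical, so checksums or sampled comparisons are often added on top.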
By replacing fragmented legacy environments with a standardized, open-source reference architecture, Ksolves enabled the telecommunications group to unify analytics across three regional markets while eliminating cloud dependency and retiring its legacy MapR platform. The solution delivered consistent data ingestion, processing, monitoring, and governance across every geography, backed by comprehensive knowledge transfer for complete operational independence.
With a scalable foundation now in place, the organization can onboard new markets faster, simplify cross-market reporting, and confidently support future analytics, AI, and machine learning initiatives on a single, unified data platform.