Project Name
Build a Secure Edge-to-Hub Data Lakehouse for Multi-Operator Telecom Analytics
A leading telecommunications data services provider implemented a greenfield Edge-to-Hub Data Lakehouse architecture to support distributed Call Detail Record (CDR) analytics and sovereign data management. The platform was designed to ingest, process, and analyze 40TB of CDRs generated across multiple geographically distributed mobile sites and to consolidate them into a highly available central analytics hub.
The solution supports four or more Mobile Network Operators (MNOs) on a shared infrastructure while enforcing strict data isolation, security, and regulatory compliance.
The client approached us to resolve the following key challenges:
- Secure Edge Data Ingestion: Mobile sites required a secure and auditable mechanism for MNOs to upload raw CDR files without exposing internal file systems or network directories.
- Configuration Consistency Across Distributed Sites: With four or more geographically remote mobile sites, maintaining identical data flow logic across all locations was a major operational challenge. Any configuration change had to be deployed consistently, quickly, and safely, without introducing drift or risking data processing failures at individual sites.
- High-Volume Central Data Consolidation: The architecture needed to reliably aggregate high-velocity CDR data streams from multiple geographies into a single 40TB central repository without impacting ingestion performance.
- Strict Multi-Tenancy & Data Privacy: The Central Site had to enforce end-to-end data isolation for four or more MNOs while operating through a single, unified management and governance plane.
To address these challenges, we delivered the following architecture:
- Edge Tier (Mobile Sites): Each mobile site was equipped with SFTPGo to provide a hardened, secure gateway for raw CDR file ingestion. Apache NiFi handled validation, enrichment, and formatting at the edge, ensuring data quality before transmission. NiFi Registry enabled centralized version control, allowing configuration updates to be pushed simultaneously to all mobile sites with full rollback support.
- Central Ingestion Hub: Data was pushed securely from the mobile sites to a central Kafka cluster, decoupling edge collection from the heavy processing at the core.
- Scalable Storage & Processing: A 40TB MinIO storage layer served as the foundation of the central Lakehouse. Apache Spark, running on YARN in a high-availability configuration, processed Kafka streams into Apache Hudi tables, providing ACID transactions, incremental processing, and efficient CDR lifecycle management.
- Unified Security Layer: Keycloak provided centralized identity management and OIDC-based SSO across the entire platform. All web interfaces, including Airflow and Superset, were deployed behind Nginx and HAProxy to ensure high availability and load balancing.
- Analytics & Data Consumption: Trino enabled fast, federated SQL queries over Hudi tables stored in MinIO. Apache Superset delivered multi-tenant dashboards, ensuring each MNO accessed only its own isolated datasets through role-based and row-level security controls.
The platform delivers the following benefits:
- Secure Data Handover: SFTPGo provides a hardened, audit-ready gateway for raw file ingestion at each site.
- Operational Agility: NiFi Registry ensures that flow changes are deployed across all mobile sites in seconds with full rollback capabilities.
- Enterprise-Grade Resilience: The central site’s YARN HA and HAProxy setup keeps the 40TB dataset available for processing and querying, even when individual nodes fail.
- MNO Data Privacy: Keycloak and Trino row-level security ensure that each operator sees only its own data, even though all data is stored centrally.
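
The isolation rule in the last point can be illustrated with a small sketch. It is not the platform's implementation: in the deployed stack the equivalent filtering would be configured in the identity, query, and dashboard layers rather than written in application code. The sketch assumes the operator identifier arrives as a claim in an already-verified OIDC token and that every CDR row records the operator it belongs to. The claim name `mno_id`, the `CdrRow` fields, and the helper names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CdrRow:
    # Hypothetical, simplified CDR shape; real CDRs carry many more fields.
    mno_id: str
    site: str
    duration_s: int


def tenant_from_claims(claims: dict) -> str:
    """Return the operator id from already-verified token claims.

    The claim name "mno_id" is an assumption made for this sketch.
    """
    mno_id = claims.get("mno_id")
    if not isinstance(mno_id, str) or not mno_id:
        # Fail closed: a token without an operator claim sees no rows at all.
        raise PermissionError("token carries no operator claim")
    return mno_id


def rows_visible_to(claims: dict, rows: list[CdrRow]) -> list[CdrRow]:
    """Keep only the rows that belong to the caller's operator."""
    tenant = tenant_from_claims(claims)
    return [row for row in rows if row.mno_id == tenant]


# Rows from two operators stored side by side, as in the central repository.
rows = [
    CdrRow("mno_a", "site_1", 42),
    CdrRow("mno_b", "site_1", 17),
    CdrRow("mno_a", "site_2", 300),
]

# An analyst whose token names operator "mno_a" sees only that operator's rows.
visible = rows_visible_to({"sub": "analyst-1", "mno_id": "mno_a"}, rows)
print([(r.site, r.duration_s) for r in visible])
```

The filter is derived from the token rather than from anything the caller types, and a missing claim raises an error instead of returning every row. That fail-closed default is the property that matters when several operators share one storage layer.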
This project successfully delivers a modern Edge-to-Hub Data Lakehouse architecture purpose-built for the telecommunications sector. By securing data ingestion at mobile sites using SFTPGo and Apache NiFi, and centralizing 40TB of scalable storage and high-performance processing with MinIO and Apache Spark, the platform strikes a strong balance between edge-level security and centralized analytical capability.
The integration of Keycloak for unified identity management and HAProxy for high availability ensures the solution is fully production-ready, secure, and resilient from day one. Designed to support four or more MNOs on a shared yet isolated platform, the architecture enables reliable CDR analytics while meeting strict data privacy, sovereignty, and operational requirements.
Modernize your Telecom CDR analytics with a Secure Edge-to-Hub Lakehouse.