Build a Scalable Telecom Lakehouse for Secure Multi-Tenant Data

Big Data

5 MIN READ

March 3, 2026


Building data infrastructure for telecommunications comes with a unique kind of pressure. Data is generated continuously and never pauses, regulatory requirements leave little room for error, and even a small, unnoticed failure can quickly cascade into serious downstream impact.

It was against this backdrop that a leading telecommunications data services provider came to us with a problem. They were dealing with 40 terabytes of Call Detail Records (CDRs) generated every day across multiple geographically scattered mobile sites. Four or more mobile network operators sharing the same infrastructure. Strict regulatory requirements. Zero tolerance for data leakage between tenants. They didn’t need a patch; they needed a ground-up rethink of how data moves from the edge of a mobile network to a centralized analytics hub. So that’s exactly what we built.

Key Challenges Faced by the Telecom Industry

Before designing anything, we needed to understand the key challenges. Early in the engagement, we identified four that required careful attention.

  • Secure data ingestion at remote sites

Mobile locations needed a reliable way for MNOs to upload raw CDR files. But the usual options raised red flags: shared folders carry security risks, and exposed file systems are even worse. What the client really needed was a method that was:

  1. Secure and access-controlled
  2. Fully auditable
  3. Simple enough for field operators to use without technical complexity

Strong security was important, but it also had to be easy to use, because if a system is complicated, people won’t use it the right way, and that creates new risks.

  • Configuration consistency across distributed locations

With multiple mobile sites operating independently, keeping configurations aligned was a real challenge. Even small changes at one location can go unnoticed and lead to the generation of incorrect data. By the time the issue is discovered, the damage has often already spread across reports and downstream systems. The need wasn’t just deployment; it was ensuring standardization, version control, and consistent updates so every site stayed in sync.

  • High-volume centralized consolidation

Data from multiple regions had to be brought together into a single 40TB central repository. With high-velocity CDR streams coming from different geographies, this wasn’t just a network problem; it required careful architectural planning. The solution needed to:

  1. Avoid network and processing bottlenecks
  2. Ensure reliable, loss-free data transfers
  3. Support consistent, high-throughput ingestion at scale

Without the right design, performance issues or data loss would quickly become a risk.

  • True multi-tenancy with strict isolation

One of the most critical requirements was supporting multiple operators on shared infrastructure while keeping their data completely separate. Each tenant had strict contractual and regulatory expectations around data privacy and isolation. Any data leakage or misconfiguration wouldn’t just be a technical mistake; it could lead to serious legal and business risks. The system, therefore, needed to ensure clear data separation, controlled access, and strong tenant-level governance across every layer.


Architecture Designed by Our Team

Instead of forcing existing tools to fit the problem, we designed a layered architecture where each component has a clear role and hands off cleanly to the next. The goal was simple: make the system secure, scalable, and easy to operate, without unnecessary complexity. Our solution includes:

  • The Edge Tier: Where Data Enters the System

At each mobile site, raw CDR files are uploaded through SFTPGo, which provides a secure, hardened SFTP gateway and acts as a controlled entry point. MNOs can upload their files, but they don’t get access to anything inside the environment. Every action is logged and auditable. Once the files arrive, Apache NiFi takes over. It validates file integrity, performs any required enrichment, and standardizes the format before the data leaves the site. This step is about catching errors at the edge, preventing bad data from traveling across networks and polluting central storage.

To keep every site aligned, we use NiFi Registry as a centralized source of truth. Flow changes are version-controlled and pushed across locations simultaneously. If anything goes wrong, we can roll back instantly. This eliminated configuration drift and significantly reduced operational risk.
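In our deployment this validation lives inside NiFi flows, but the core idea is simple enough to sketch in plain Python. The snippet below is illustrative only, assuming a hypothetical convention where each uploaded CDR file arrives with a sidecar `.sha256` checksum file:

```python
import hashlib
from pathlib import Path

def sha256_of(path: Path) -> str:
    """Compute the SHA-256 digest of a file, streaming in 1 MB chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def validate_upload(data_file: Path) -> bool:
    """Accept a CDR file only if its sidecar checksum matches.

    Mirrors the edge-side check: files that fail validation are
    quarantined at the site instead of being shipped to the hub.
    """
    sidecar = data_file.parent / (data_file.name + ".sha256")
    if not sidecar.exists():
        return False
    expected = sidecar.read_text().split()[0]
    return sha256_of(data_file) == expected
```

Rejecting a corrupt or truncated file here costs a retry at one site; letting it through would mean cleaning up central tables later.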

  • The Central Hub: Where Data Is Collected and Processed

Data from all sites is streamed securely into a central Apache Kafka cluster. Kafka acts as a buffer between ingestion and processing, ensuring that sudden spikes from one location don’t overwhelm downstream systems. Ingestion and processing remain decoupled, which keeps the platform stable under load. For storage, we deployed a 40 TB MinIO environment as the Lakehouse foundation. Processing is handled by Apache Spark running in high availability mode, which consumes Kafka streams and writes them into Apache Hudi tables. Hudi brings transactional reliability to object storage, enabling ACID operations, incremental processing, and efficient lifecycle management for CDR data without custom engineering.
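Hudi’s upsert semantics are what make late and duplicate CDRs safe to ingest: each record is identified by a record key, and a precombine field (typically an event timestamp) decides which version of a duplicate wins. A simplified, plain-Python sketch of that merge logic, using hypothetical field names `cdr_id` and `event_ts`:

```python
def hudi_style_upsert(table, incoming, record_key="cdr_id", precombine="event_ts"):
    """Merge a batch of records into a table keyed by record_key.

    Mimics Apache Hudi's upsert behaviour in miniature: for each key,
    the record with the highest precombine value (e.g. latest event
    timestamp) survives, both within the batch and against the
    existing table state.
    """
    merged = dict(table)  # existing state, keyed by record key
    for rec in incoming:
        key = rec[record_key]
        current = merged.get(key)
        if current is None or rec[precombine] >= current[precombine]:
            merged[key] = rec
    return merged
```

In the real pipeline Spark hands micro-batches from Kafka to Hudi, which applies this resolution transactionally on MinIO; the sketch only shows why a replayed or out-of-order record cannot overwrite newer data.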

  • Security: Built In, Not Added Later

Security was treated as a core design principle from day one. Keycloak manages centralized authentication using OIDC-based single sign-on across platform services. Every user and service request passes through a unified identity layer. All web-facing components are protected behind Nginx and HAProxy. This setup ensures load balancing, high availability, and uninterrupted operations, even if individual nodes fail. For a system managing tens of terabytes of live operational data, resilience isn’t optional.
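In practice, each service trusts a request only after Keycloak-issued token claims check out. The sketch below shows just the claim checks (expiry and audience) in stdlib Python; it deliberately skips signature verification, which in a real deployment is done against Keycloak’s published JWKS keys before any claim is trusted:

```python
import base64, json, time

def decode_claims(jwt_token: str) -> dict:
    """Decode the payload segment of a JWT.

    No signature check here -- in production the gateway verifies
    signatures against the identity provider's JWKS keys first.
    """
    payload_b64 = jwt_token.split(".")[1]
    padded = payload_b64 + "=" * (-len(payload_b64) % 4)
    return json.loads(base64.urlsafe_b64decode(padded))

def is_request_allowed(jwt_token: str, required_audience: str, now=None) -> bool:
    """Admit a request only if the token is unexpired and was issued
    for this service (the 'aud' claim)."""
    now = time.time() if now is None else now
    try:
        claims = decode_claims(jwt_token)
    except Exception:
        return False  # malformed token
    if claims.get("exp", 0) <= now:
        return False  # expired
    aud = claims.get("aud")
    auds = aud if isinstance(aud, list) else [aud]
    return required_audience in auds
```

The point of the unified identity layer is exactly this: every service applies the same checks against the same issuer, so there is no per-service password store to drift or leak.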

  • Analytics: Where the Value Becomes Visible

For data access and analysis, Trino runs fast, federated SQL queries directly on the Hudi tables stored in MinIO. There’s no need to move data into separate marts; users query the Lakehouse directly. Visualization is delivered through Apache Superset. Each MNO gets its own isolated workspace, enforced through role-based access controls and Trino’s row-level security. Even though all data resides in shared infrastructure, each operator can only see their own records. Isolation is enforced at the query engine level, not just the dashboard, ensuring true multi-tenant security.
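Conceptually, row-level security is an implicit filter injected ahead of whatever query the user submits, so isolation holds no matter how the query is written. A minimal Python sketch of that ordering, using a hypothetical `mno_id` tenant column (Trino enforces the equivalent with row-filter rules on the shared table):

```python
def run_tenant_query(rows, user_tenant, predicate=lambda r: True):
    """Apply the tenant row filter *before* the user's own predicate.

    Models query-engine-level isolation: the tenant filter is not
    something the caller can opt out of, so one MNO can never see
    another MNO's records regardless of the predicate it supplies.
    """
    return [r for r in rows if r["mno_id"] == user_tenant and predicate(r)]
```

Enforcing this in the query engine rather than the dashboard matters because dashboards are only one client; ad-hoc SQL must hit the same wall.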

Ksolves Expertise

This architecture reflects the kind of real-world expertise Ksolves brings to complex data environments. Our team specializes in designing and implementing secure, scalable data lakehouse and streaming solutions tailored to industries where performance, compliance, and reliability are critical.

If you’re planning a telecom data platform, modernizing your data infrastructure, or looking for a trusted partner to build an edge-to-hub architecture, Ksolves can help you design, implement, and operate it with confidence.

Conclusion

Telecom environments generate data at a speed and scale that quickly challenge traditional systems. What made this platform successful wasn’t a single technology; it was a clear architecture where each layer has a defined role and works efficiently with the next.

Today, multiple MNOs can run on the same platform with confidence; their data stays secure, analytics remain fast, and updates can be rolled out across all sites smoothly.

That’s what a well-designed data lakehouse should deliver: reliability, simplicity, and peace of mind at scale.


AUTHOR

Anil Kushwaha

Big Data

Anil Kushwaha, Technology Head at Ksolves, is an expert in Big Data. With over 11 years at Ksolves, he has been pivotal in driving innovative, high-volume data solutions with technologies like NiFi, Cassandra, Spark, Hadoop, etc. Passionate about advancing tech, he ensures smooth data warehousing for client success through tailored, cutting-edge strategies.
