Project Name

Migrated Petabytes of CDR and Compliance Data from MapR to ClickHouse with Zero Data Loss for a North African Telco

Industry: Telecommunications
Technology: Apache Spark, ClickHouse, Apache NiFi, MapR, Python, Kubernetes

Overview

Eight years of call records, compliance submissions, subscriber histories, and regulatory audit data were sitting on an outdated storage platform that the vendor had stopped supporting. For a large North African telecom operator, the question was not whether to move this data but how to do it without losing a single record, without disrupting live call processing, and without delivering a new system that regulators and analysts could not trust.

Ksolves planned and executed a large-scale data migration from MapR to ClickHouse, with automated validation at every stage to ensure every record was migrated accurately. This delivered a fully verified data warehouse that the business could trust and use from day one.

Key Challenges

The client came to Ksolves with six problems that made this one of the most difficult data migrations the team had handled:

  • Moving Petabytes of Data Without Stopping Operations: The old system held years of call records and compliance data. The move had to happen in stages, during off-peak hours, without pausing live call record processing. New data kept arriving throughout the migration, which made timing and coordination critical.
  • No Vendor Support Available: The old MapR platform had been discontinued, and no vendor support remained. Every technical problem encountered during data extraction had to be solved by the Ksolves team using its own expertise.
  • Data Formats Changed Many Times Over Eight Years: Call record data formats had changed repeatedly across network upgrades and system changes. Different time periods had different field names, different column structures, and different value formats, all of which needed to be cleaned up and made consistent before moving to the new system.
  • Regulators Required Proof That No Data Was Lost: The national telecom regulator required the operator to prove, with documentation, that every single record had been moved without changes. A signed verification report for each batch of data was mandatory for the audit.
  • The New System Had to Be Much Faster: Moving the data was not enough. The business also needed the new ClickHouse system to answer compliance queries in seconds rather than the 4 to 6 hours those same queries were taking on the old platform. The new system had to be designed correctly from the start to achieve this.
  • Migration Work Could Not Slow Down Live Systems: Call record data was being processed around the clock on the old platform while the migration was happening. Any migration work that used too many system resources would slow down live operations, which was not acceptable.
Our Solution

Ksolves built a careful, step-by-step migration plan based on three rules: move every record without exception, do not affect live systems, and produce verified proof of completion at every stage. The work ran across four phases: planning and new system design, a test migration and verification build, the full migration in batches, and a final check before go-live.

  • Extracting Data from the Old System Safely: Apache Spark pulled call records and compliance data from the old MapR system in manageable chunks, one time period at a time. The extraction was limited to a set share of the old system's resources, so live operations were never affected. If anything went wrong mid-way, the job could pick up exactly where it left off without starting over.
  • Cleaning and Standardizing Eight Years of Data: A data cleaning step converted eight years of inconsistent record formats into one standardized structure aligned with the new ClickHouse system. Every field name change, renamed column, and format variation was carefully mapped and documented in a runbook, which the client’s team reviewed and approved before migration began.
  • Designing ClickHouse to Answer Compliance Queries Fast: The new ClickHouse system was set up specifically for call record workloads. Data was organized by call date, and the most common compliance queries were pre-built as fast-access views. Queries that took 4 to 6 hours on the old system now run in under 8 seconds.
  • Automated Checks After Every Batch: A verification framework built in Python and SQL ran four checks automatically after every batch was moved: total record counts matched between old and new systems, data blocks were compared using checksums to detect any corruption, the data structure matched the expected format, and a random sample of 0.5% of records was compared in detail. Every batch produced a signed verification report that could be shown to regulators.
  • Full Record of Every Step via Apache NiFi: Apache NiFi managed the entire migration pipeline and logged the details of every batch, including what was moved, when, how many records, which checks passed, and where the data landed. This complete log was available as a formal migration certificate for the regulatory authority.
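
The resumable, period-by-period extraction described in the first step above can be sketched in plain Python. This is a minimal illustration and not the project's Spark job: the checkpoint file, the `extract_period` callback, and the monthly period labels are all assumptions made for the example.

```python
import json
from pathlib import Path


def load_done(checkpoint: Path) -> set[str]:
    """Return the periods already migrated, or an empty set on the first run."""
    if checkpoint.exists():
        return set(json.loads(checkpoint.read_text()))
    return set()


def mark_done(checkpoint: Path, done: set[str]) -> None:
    """Persist progress by writing a temp file, then renaming it over the old one."""
    tmp = checkpoint.with_suffix(".tmp")
    tmp.write_text(json.dumps(sorted(done)))
    tmp.replace(checkpoint)


def run_migration(periods, extract_period, checkpoint: Path) -> list[str]:
    """Process each period once, skipping any recorded in the checkpoint.

    `extract_period` is a hypothetical callback standing in for the real
    extraction job; it must raise on failure so the period is not recorded.
    """
    done = load_done(checkpoint)
    processed = []
    for period in periods:
        if period in done:
            continue  # already migrated in an earlier run
        extract_period(period)
        done.add(period)
        mark_done(checkpoint, done)  # checkpoint after every completed period
        processed.append(period)
    return processed
```

Because progress is saved only after a period finishes, a failed run leaves the checkpoint pointing at the last completed period, and calling `run_migration` again with the same arguments continues from the period that failed.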

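The four per-batch checks listed above (record counts, checksums, structure, and a 0.5% sample comparison) might look roughly like the following. This is a sketch under stated assumptions: in-memory lists of dict rows stand in for query results from the two systems, `record_id` is a hypothetical key column, and HMAC-SHA256 is used as one possible way to sign a report, since the case study does not say how reports were signed.

```python
import hashlib
import hmac
import json
import random


def row_digest(row: dict) -> str:
    """Checksum of one record, using a canonical (key-sorted) JSON encoding."""
    encoded = json.dumps(row, sort_keys=True, default=str).encode()
    return hashlib.sha256(encoded).hexdigest()


def block_checksum(rows: list[dict]) -> str:
    """Order-independent checksum of a batch: hash the sorted row digests."""
    h = hashlib.sha256()
    for digest in sorted(row_digest(r) for r in rows):
        h.update(digest.encode())
    return h.hexdigest()


def verify_batch(source, target, expected_columns, key="record_id",
                 sample_rate=0.005, seed=0) -> dict:
    """Run the four checks and return a pass/fail result per check."""
    target_by_key = {r[key]: r for r in target}
    sample_size = max(1, round(len(source) * sample_rate)) if source else 0
    sample = random.Random(seed).sample(source, sample_size)
    return {
        "row_count": len(source) == len(target),
        "checksum": block_checksum(source) == block_checksum(target),
        "schema": all(set(r) == set(expected_columns) for r in target),
        "sample": all(target_by_key.get(r[key]) == r for r in sample),
    }


def signed_report(batch_id: str, checks: dict, secret: bytes) -> dict:
    """Attach an HMAC-SHA256 signature so later edits to the report are detectable."""
    body = {"batch_id": batch_id, "checks": checks, "passed": all(checks.values())}
    payload = json.dumps(body, sort_keys=True).encode()
    signature = hmac.new(secret, payload, hashlib.sha256).hexdigest()
    return {**body, "signature": signature}
```

The batch checksum is computed over sorted row digests so that differences in row order between the two systems do not cause false failures. With a 0.5% rate, a batch of 1,000 rows yields a sample of 5 records.
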
Technology Stack

Category | Technology | Role
Source Platform | MapR | Legacy distributed storage holding petabytes of CDR and compliance data
Target Platform | ClickHouse | Columnar data warehouse answering compliance queries on call records in seconds
Migration Engine | Apache Spark | Batch data extraction from MapR to ClickHouse in parallel, resumable chunks
Orchestration | Apache NiFi | Pipeline management with full step-by-step logging of every migrated batch
Validation | Python and SQL | Four automated checks running after every batch to confirm data completeness
Infrastructure | Kubernetes | Dynamic compute scaling to maximize throughput during off-peak migration windows
AI Tooling | AI-assisted mapping | Data format analysis and field variation detection across eight years of CDR schemas
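
To make the field variation problem concrete, here is a small sketch of how records from different eras could be mapped onto one structure. Every field name and timestamp layout below is hypothetical; the real mappings are in the project's runbook and are not reproduced in this case study. Treating timestamps as UTC is also an assumption.

```python
from datetime import datetime, timezone

# Hypothetical legacy-to-canonical field names, for illustration only.
FIELD_ALIASES = {
    "msisdn_a": "caller",
    "calling_number": "caller",
    "msisdn_b": "callee",
    "called_number": "callee",
    "dur": "duration_s",
    "call_duration": "duration_s",
    "start_ts": "call_start",
    "call_start_time": "call_start",
}

# Assumed timestamp layouts used in different eras of the data.
TIMESTAMP_FORMATS = ("%Y-%m-%d %H:%M:%S", "%d/%m/%Y %H:%M:%S", "%Y%m%d%H%M%S")


def parse_timestamp(value: str) -> datetime:
    """Try each known layout in turn; fail loudly on anything unrecognized."""
    for fmt in TIMESTAMP_FORMATS:
        try:
            return datetime.strptime(value, fmt).replace(tzinfo=timezone.utc)
        except ValueError:
            continue
    raise ValueError(f"unrecognized timestamp format: {value!r}")


def normalize(record: dict) -> dict:
    """Rename legacy fields and coerce values into one canonical shape."""
    out = {}
    for name, value in record.items():
        canonical = FIELD_ALIASES.get(name.lower(), name.lower())
        if canonical in out:
            raise ValueError(f"duplicate field after renaming: {canonical}")
        out[canonical] = value
    out["duration_s"] = int(out["duration_s"])
    out["call_start"] = parse_timestamp(str(out["call_start"]))
    return out
```

Raising on unknown timestamp layouts and on colliding field names, rather than guessing, keeps a cleaning step like this from silently altering records that later have to pass verification.
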
Impact

The migration delivered confirmed results across data completeness, query speed, regulatory compliance, and infrastructure cost:

  • 100% of Records Moved with Zero Data Loss: Every call record and compliance file was moved to ClickHouse with zero data loss, confirmed by all four verification checks across every batch, including record count matching, data integrity checks, format validation, and deep record comparison.
  • Compliance Queries Cut from Hours to Under 8 Seconds: Queries that used to take 4 to 6 hours on the old system now complete in under 8 seconds on ClickHouse, a reduction of more than 99%. The compliance team can now respond to regulator data requests the same day they arrive.
  • Signed Verification Reports Accepted as Regulatory Evidence: The automated verification framework produced a signed report for every batch of data moved. These reports were accepted by the compliance team as formal regulatory proof that all data arrived complete and unchanged.
  • Zero Impact on Live Call Processing: Resource limits and off-peak scheduling kept live call record processing running normally throughout the entire migration, with no slowdowns recorded.
  • Old MapR Infrastructure Fully Shut Down: The discontinued MapR platform was decommissioned after sign-off, removing the ongoing cost and risk of maintaining unsupported hardware and third-party support contracts.
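
The "more than 99%" figure for query time can be checked with a few lines of arithmetic. The calculation below uses the 4 to 6 hour range and the 8 second figure from this case study. Since the new queries finish in under 8 seconds, the results are lower bounds.

```python
def reduction_pct(before_s: float, after_s: float) -> float:
    """Percentage reduction in runtime going from before_s to after_s."""
    return (1 - after_s / before_s) * 100


for hours in (4, 6):
    before = hours * 3600  # old runtime in seconds
    print(f"{hours} h -> 8 s: {reduction_pct(before, 8):.2f}% reduction, "
          f"{before / 8:.0f}x faster")

# Output:
# 4 h -> 8 s: 99.94% reduction, 1800x faster
# 6 h -> 8 s: 99.96% reduction, 2700x faster
```
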
Data Flow Diagram

[Figure: stream-dfd, the data flow diagram for the migration pipeline]

Client Testimonial

“We had eight years of data on a platform that was effectively unsupported. Ksolves gave us a migration we could prove to our regulator was complete. Every record accounted for, every batch validated, every step documented.”

– Chief Data Officer, Large North African Telecommunications Operator

Conclusion

Before this project, years of critical call records and compliance data were stuck on an unsupported platform with no clear path forward, slow queries, and a growing risk of data loss. Today, Ksolves, with its AI-first delivery approach, has moved everything to a modern ClickHouse warehouse where every record is verified, every batch is documented, and the same queries that used to take hours now complete in seconds.

 

The verification reports and migration logs now serve as formal regulatory evidence, giving the operator a clear, documented trail from the old system to the new one. With clean data on a fast, modern platform, the operator can now build real-time fraud detection, AI-driven customer analytics, and automated compliance reporting on a foundation that works the way the business needs it to.

For telecom operators and enterprises with critical data stuck on legacy platforms, explore Ksolves Big Data migration services and ClickHouse consulting services. You can also contact our experts at sales@ksolves.com.
