Project Name

Custom Apache NiFi Processors for Telecom CDR Binary Decoding and XLSB Parsing

How Ksolves Cut Telecom CDR Processing Time by 70% with Custom Apache NiFi Processors
Industry: Telecommunication
Technology: Apache NiFi, Java, Python, Maven

Overview

For a large telecom operator, millions of call detail record (CDR) files are generated every day, and every record needs to be decoded, parsed, and loaded into analytics systems within a narrow processing window. When that processing depends on standalone scripts that crash under load and offer no visibility into failures, the entire analytics operation stalls. Reports are delayed, anomalies go undetected, and engineering teams spend their time firefighting instead of building. Ksolves was brought in to replace the fragile script-based pipeline with a robust, scalable NiFi-native architecture that could handle both proprietary CDR binary formats and complex .xlsb Excel files.

The client is a B2B telecom operator serving enterprise and carrier customers across South Asia, managing network events, billing cycles, and operational reporting for over 15 million active subscribers. Their infrastructure generates CDR files continuously across multiple network nodes alongside periodic reports in binary .xlsb format, creating a dual-format processing challenge that existing tooling could not handle at scale.

The Challenge

The client's legacy CDR and .xlsb processing system created three interconnected operational problems that were worsening as data volumes grew:

  • No Viable Path to NiFi Integration: Apache NiFi had no native processor for proprietary CDR binary formats or .xlsb Excel files. The only option was standalone scripts running outside NiFi entirely, breaking the pipeline's monitoring, provenance, and error-handling capabilities.
  • Scripts That Couldn't Scale or Be Maintained: The legacy scripts had no parallelism, no backpressure support, and no structured logging. As CDR volumes grew to millions of records per day, processing stretched to several hours per batch, creating downstream delays in billing analytics and network reporting. Any failure required a full manual restart with no ability to resume.
  • Zero Operational Visibility: When processing failed, there was no alerting, no audit trail, and no way to identify which records had been dropped. Recovery required manual intervention and re-processing, adding hours of engineering effort per incident.
The Solution

To overcome the limitations of the legacy system, Ksolves designed a two-part solution: a custom NiFi processor built in Java with Maven for CDR binary decoding, and a Python-based scripting processor for .xlsb Excel file parsing. Both are fully integrated into the NiFi pipeline with native provenance and monitoring support.

  • Custom CDR Decoder Processor (Java + Maven): The CDR decoder was written in Java and built with Maven. It decodes the proprietary binary CDR formats and runs as a native NiFi processor by extending the AbstractProcessor class from the NiFi Java API. The processor supports configurable input/output formats and structured logging, and it handles large files efficiently through parallel processing and backpressure support, capabilities that the legacy scripts lacked entirely.
  • Python-Based XLSB Parser (NiFi Scripting Processor): The Python-based custom processor was developed using NiFi's scripting capability. It uses the pyxlsb and pandas libraries to read and extract data from binary .xlsb Excel files, with Python's struct module handling low-level binary decoding. The logic is encapsulated in a modular script that accepts NiFi FlowFiles, processes and structures the data, and outputs it in CSV format for downstream NiFi processors.
  • Intelligent File Routing and Deployment: Both custom processors were packaged and deployed into the NiFi cluster environment with configurable parameters, including file paths, format types, and processing options. The dataflow was designed to automatically detect incoming file type (CDR or .xlsb), route data to the appropriate processor, and handle each format with its dedicated decoding logic, eliminating manual sorting and pre-processing steps.
  • Full Observability via NiFi Provenance: The entire pipeline was integrated with NiFi Provenance tracking, NiFi Counters, and NiFi Monitoring Tools to provide real-time flow visibility, complete record-level traceability, and automated error alerting, replacing the previous zero-visibility model where failures were only discovered when downstream reports were missing data.
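
The file-type detection step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the case study does not say how the flow tells the two formats apart, so the function name detect_file_type, the CDR_SUFFIXES set, and the route labels are all hypothetical. The one format fact relied on is that an .xlsb workbook is a ZIP container, so its first bytes are the ZIP signature.

```python
from pathlib import Path

# .xlsb workbooks are ZIP containers, so they start with the ZIP signature.
ZIP_MAGIC = b"PK\x03\x04"

# Hypothetical file extensions for the proprietary CDR dumps (assumption:
# the real naming convention is not given in the case study).
CDR_SUFFIXES = {".cdr", ".dat"}


def detect_file_type(name: str, head: bytes) -> str:
    """Return a route label ('xlsb', 'cdr', or 'unknown') for one file.

    name: the file name as it arrives in the flow.
    head: the first few bytes of the file content.
    """
    suffix = Path(name).suffix.lower()
    # Require both the extension and the ZIP signature, so a mislabeled
    # file is not handed to the XLSB parser.
    if suffix == ".xlsb" and head.startswith(ZIP_MAGIC):
        return "xlsb"
    if suffix in CDR_SUFFIXES:
        return "cdr"
    # Anything else goes to a catch-all route for inspection.
    return "unknown"
```

In a NiFi flow the same decision would more likely be expressed with a routing processor such as RouteOnAttribute acting on the filename attribute; the function above only makes the decision logic explicit.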

Technology Stack

Category: Technology / Details
Core Platform: Apache NiFi (cluster environment)
CDR Processor Language: Java
Build Tool: Maven
NiFi Extension API: AbstractProcessor (NiFi Java API)
XLSB Processor Language: Python (NiFi Scripting Processor)
Python Libraries: pyxlsb, pandas
Binary Parsing: Python struct module
Observability: NiFi Provenance, NiFi Counters, NiFi Monitoring Tools
Output Format: CSV (for downstream NiFi processors)
File Types Handled: Proprietary CDR binary format, .xlsb (Binary Excel)
Deployment: NiFi cluster with configurable processor parameters
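
The combination of binary parsing with Python's struct module and CSV output listed above can be illustrated with a minimal sketch. The record layout here is invented for the example, since the client's proprietary format is not described in this case study; the field names, field widths, byte order, and the helper names decode_records and to_csv are all assumptions.

```python
import csv
import io
import struct

# Hypothetical fixed-width record layout (big-endian, no padding):
#   caller     : 8-byte unsigned integer
#   callee     : 8-byte unsigned integer
#   start_ts   : 4-byte unsigned integer (Unix seconds)
#   duration_s : 2-byte unsigned integer (seconds)
RECORD = struct.Struct(">QQIH")
FIELDS = ("caller", "callee", "start_ts", "duration_s")


def decode_records(payload: bytes):
    """Yield one dict per complete fixed-width record in payload."""
    # iter_unpack needs a buffer that is a whole multiple of the record
    # size, so any trailing partial record is cut off here. A production
    # processor would more likely route such a file to a failure path.
    usable = len(payload) - len(payload) % RECORD.size
    for values in RECORD.iter_unpack(payload[:usable]):
        yield dict(zip(FIELDS, values))


def to_csv(records) -> str:
    """Serialize decoded records as CSV text with a header row."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buf.getvalue()
```

Calling to_csv(decode_records(data)) on the bytes of one incoming file yields the CSV text that would be written back for downstream processors. Precompiling the layout with struct.Struct avoids re-parsing the format string for every record.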
Results / Impact
  • 70% Reduction in CDR Processing Time: CDR decoding time fell by approximately 70%, moving from multi-hour batch windows to near-real-time processing per cycle. This enables same-day analytics on network events that previously required overnight runs to complete.
  • Millions of Records Processed in Parallel Without Degradation: The new pipeline processes millions of CDR records and hundreds of .xlsb files in parallel per cycle, handling peak data volumes consistently without throughput degradation, compared to the sequential script-based approach that failed under load and required manual recovery.
  • 40% Reduction in Operational Engineering Overhead: Elimination of manual script execution, failure recovery, and ad-hoc re-processing saved an estimated 40% of the data engineering team's weekly operational effort, freeing the team to focus on pipeline development and analytics work rather than incident response.
  • 100% Record-Level Traceability via NiFi Provenance: NiFi Provenance tracking provides a complete audit trail for every record processed, replacing the previous zero-visibility model and enabling the analytics team to instantly identify which records were processed, when, and through which pipeline path.
  • New Data Format Onboarding Reduced from Weeks to Days: The modular Java and Python processor architecture reduces the time to integrate a new data format from weeks of custom scripting to days of processor configuration, a structural improvement that has already been applied post-launch to onboard additional data source types as the client's network has expanded.
Data Flow Diagram (DFD)
[Figure: stream-dfd, data flow diagram]
Client Testimonial

“Before this project, our CDR processing was effectively a black box. We knew files went in and data came out eventually, but we had no visibility into what was happening in between, and failures were only discovered hours later when reports were wrong. The NiFi-based solution Ksolves built gave us full traceability, parallel processing that actually scales with our volumes, and a pipeline our team can maintain without specialized script knowledge. Processing time dropped dramatically, and our engineers stopped spending their days firefighting.”

— Head of Data Engineering, Major Telecom Operator (name withheld by request)

Conclusion

By implementing custom Apache NiFi processors in Java and Python, Ksolves transformed the client’s fragile, script-based CDR pipeline into a robust, observable, and fully scalable data processing architecture. What once required hours of sequential processing, with no visibility and frequent manual recovery, now runs in parallel with full NiFi Provenance tracking, processing millions of records per cycle without operational burden. As the client’s network continues to expand and new data formats emerge, the modular processor architecture means onboarding new sources is a configuration task, not a development sprint.

As an AI-First Company and a trusted Apache NiFi Development Company, Ksolves brings AI-driven pipeline analysis and deep Apache NiFi engineering expertise to every engagement. For telecom operators and data-intensive enterprises managing complex binary formats at scale, our Apache NiFi and Big Data Engineering practice delivers the pipeline reliability and processing throughput that modern analytics demands.
