How Ksolves Migrated 20,000 NiFi Processors Across 4 Production Clusters for a Major European Airline With Zero Data Loss
Four production NiFi clusters. Twenty thousand processors. A version of Apache NiFi that was blocking Kafka upgrades, limiting security improvements, and putting mission-critical data pipelines at growing risk every month they stayed on it.
For a major European airline running its core data flows on NiFi 1.15.3, the upgrade to NiFi 2.6.0 was not optional. But at this scale, there is no room for guesswork. A failed migration across four live clusters would mean data loss across business-critical operations.
Ksolves was brought in to lead the migration. Using an AI-first delivery approach, the team worked through a seven-phase programme that identified every risk, fixed every breaking change in a pre-production environment, and cut over each cluster without losing a single byte of data.
The airline came to Ksolves with six challenges that made this one of the most complex NiFi migrations the team had taken on:
- Four Independent Clusters, Each With Its Own Complexity: The four clusters, known as PYC, OPS, FARM, and DT, each operated independently with different numbers of active flows ranging from 12,350 to 45,000. There was no single upgrade path that worked for all four. Each cluster needed its own assessment and its own sequenced plan.
- An Old Platform Blocking Progress: NiFi 1.15.3 was preventing the airline from upgrading to newer versions of Amazon MSK and Kafka. It also blocked access to NiFi 2.x security improvements that the airline's growing data governance requirements needed. Every quarter they stayed on 1.15.3, the technical debt grew.
- No Disaster Recovery and No Centralised Logs: The production clusters had no DR environment in place. Logs were stored locally on each node with no central management. The only recovery option was EC2 and AMI backups. This made the upgrade window more operationally sensitive than it would have been with proper resilience in place.
- Custom Scripts and Back-Pressure Issues to Fix: Several clusters were running ExecuteScript processors using Groovy and Python. There were also known back-pressure problems related to database connections and excessive transactions. These needed to be resolved as part of the migration, not left for later.
- Pre-Production and Production Were on Different Operating Systems: Pre-production was being moved to Amazon Linux while production remained on Oracle Linux 8.5 and 8.6. This meant the team had to carefully manage potential differences between what was tested in pre-production and what would actually run in production.
- A Load Balancing Bug Making the Primary Node Do Too Much Work: An existing bug was causing the primary cluster node to carry a disproportionate share of the flow load while secondary nodes sat underutilised. This was a known performance problem that had persisted across multiple NiFi 1.x releases and had to be fixed as part of the migration.
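As an illustration of how scripted processors like those in the fourth challenge might be inventoried, here is a minimal Python sketch. It is not Ksolves' tooling. It assumes an exported flow definition in JSON with a `flowContents` root, nested `processGroups`, and a `Script Engine` property on ExecuteScript processors; the sample data is hand-written and hypothetical.

```python
from collections import Counter

def walk_processors(group):
    """Yield every processor dict in a process group, including nested groups."""
    for proc in group.get("processors", []):
        yield proc
    for child in group.get("processGroups", []):
        yield from walk_processors(child)

def audit_flow(flow_definition):
    """Count processors by short type name and list ExecuteScript processors."""
    # Assumption: the export wraps the root group in "flowContents".
    root = flow_definition.get("flowContents", flow_definition)
    type_counts = Counter()
    scripted = []
    for proc in walk_processors(root):
        short_type = proc.get("type", "unknown").rsplit(".", 1)[-1]
        type_counts[short_type] += 1
        if short_type == "ExecuteScript":
            # Assumption: the engine is stored under the "Script Engine" property.
            engine = proc.get("properties", {}).get("Script Engine", "unspecified")
            scripted.append((proc.get("name", ""), engine))
    return type_counts, scripted

# Hypothetical sample, not taken from the airline's clusters.
sample = {
    "flowContents": {
        "processors": [
            {"name": "enrich",
             "type": "org.apache.nifi.processors.script.ExecuteScript",
             "properties": {"Script Engine": "Groovy"}},
        ],
        "processGroups": [
            {"processors": [
                {"name": "legacy-parse",
                 "type": "org.apache.nifi.processors.script.ExecuteScript",
                 "properties": {"Script Engine": "python"}},
                {"name": "publish",
                 "type": "org.apache.nifi.kafka.processors.PublishKafka",
                 "properties": {}},
            ]},
        ],
    }
}

counts, scripted = audit_flow(sample)
for name, engine in scripted:
    print(f"review script processor {name!r} (engine: {engine})")
```

A list like `scripted` gives reviewers a concrete set of scripts to check against the target version before any upgrade work starts.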
Ksolves ran a seven-phase structured migration programme. The governing principle was simple: test everything in pre-production before touching production. Every breaking change was identified, every deprecated processor was fixed, and every flow was validated before a single production node was upgraded.
- Full Discovery Across All Four Clusters: The team started with a complete audit of every cluster using Ksolves' NiFi readiness checklist. Processor counts, process groups, custom scripts, stateful components, integrations, and back-pressure issues were all documented across all four environments. Nothing was assumed.
- Pre-Production Built and Upgraded to NiFi 1.28.x First: A stable pre-production environment was built on Amazon Linux and upgraded to the latest 1.x release before moving to 2.x. This intermediate step followed the recommended upgrade path and gave the team a solid baseline to test against.
- Full NiFi 2.6.0 Migration in Pre-Production: All flows were migrated to NiFi 2.6.0 in the pre-production environment. Variable-based configurations were converted to Parameter Contexts. Deprecated and relocated processors were updated. Custom NAR compatibility was validated. Kerberos and SSL changes were addressed. Every breaking change identified in the gap analysis was resolved before production was touched.
- Gap Analysis Report Signed Off Before Go-Live: A detailed Gap Analysis Report was produced covering all four clusters. Every issue found, every fix applied, and every remaining risk was documented. The client's team reviewed and signed off on this report as a production readiness certification before any production upgrade began.
- Rolling In-Place Cut-Over Across All Four Clusters: Production upgrades were executed one node at a time, with data ingestion paused by agreement with each business unit. This rolling approach kept downtime to a minimum while protecting data integrity throughout the cut-over.
- Load Balancing Bug Fixed and Back-Pressure Resolved: The FlowFile load balancing bug was identified and resolved as part of the 2.6.0 migration, achieving even distribution across all cluster nodes. All known back-pressure issues from the discovery phase were also addressed, leaving the clusters in a better state than when the migration began.
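The conversion from variables to Parameter Contexts mentioned in the pre-production phase can be pictured with a small sketch. NiFi 1.x variables are referenced through Expression Language as `${name}`, while parameters are referenced as `#{name}`. The Python below is an illustrative assumption about how such a rewrite could be scripted, not the method Ksolves used; the function names and the sample variable names are hypothetical. It only rewrites plain references to known variables and flags anything more complex for manual review, because `${filename}`-style references may be FlowFile attributes rather than variables.

```python
import re

# A plain reference such as ${kafka.brokers}, with no function calls inside.
SIMPLE_REF = re.compile(r"\$\{([A-Za-z0-9_.\-]+)\}")
NAME_CHARS = r"[A-Za-z0-9_.\-]"

def to_parameter_refs(value, variable_names):
    """Rewrite ${name} to #{name} only for names known to be variables."""
    def swap(match):
        name = match.group(1)
        return "#{" + name + "}" if name in variable_names else match.group(0)
    return SIMPLE_REF.sub(swap, value)

def needs_manual_review(value, variable_names):
    """True if a known variable is still referenced with ${...} syntax,
    for example inside an expression such as ${db.url:toLower()}."""
    for name in variable_names:
        pattern = r"\$\{\s*" + re.escape(name) + r"(?!" + NAME_CHARS + r")"
        if re.search(pattern, value):
            return True
    return False

# Hypothetical variable names and property values.
variables = {"kafka.brokers", "db.url"}
for original in ["${kafka.brokers}", "/data/${filename}", "${db.url:toLower()}"]:
    rewritten = to_parameter_refs(original, variables)
    flag = " (manual review)" if needs_manual_review(rewritten, variables) else ""
    print(f"{original} -> {rewritten}{flag}")
```

Restricting the rewrite to an explicit list of variable names is a deliberate choice in this sketch: it avoids turning attribute references into parameter references by accident.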
Technology Stack
| Category | Technology | Role |
|---|---|---|
| Platform | Apache NiFi 2.6.0 | Target version across all four production clusters with Java 21 and Parameter Contexts |
| Flow Control | NiFi Registry 2.x | Flow versioning and deployment management upgraded in parallel with all cluster nodes |
| Infrastructure | AWS EC2, Oracle Linux, Amazon Linux | Production on Oracle Linux; pre-production on Amazon Linux for upgrade validation |
| Integration | Amazon MSK | Kafka integration layer unblocked by the NiFi 2.6.0 upgrade |
| Monitoring | Zabbix, Prometheus, Grafana | Monitoring stack validated and extended for NiFi 2.x metric compatibility |
| Coordination | Apache ZooKeeper 3.8.0 | External cluster coordination layer validated for NiFi 2.x compatibility |
The seven-phase migration delivered confirmed results across all four clusters:
- 20,000 Processors Migrated With Zero Data Loss: The full NiFi 2.6.0 migration was completed across all four clusters with zero data loss. Every breaking change was remediated, and every flow was validated in pre-production before the production cut-over.
- MSK and Security Upgrade Blockers Removed: NiFi 2.6.0 with Java 21 is now fully operational across all clusters. MSK integration is unblocked, Parameter Contexts replace all variable-based configurations, and modern NiFi security enhancements are active for the first time.
- Load Balancing Fixed and Cluster Performance Improved: The primary node no longer carries a disproportionate load. Flow distribution is now even across all nodes, improving throughput efficiency and cluster resilience beyond the pre-migration baseline.
- 100% of Breaking Changes Documented: The Gap Analysis Report covers every deprecated processor, variable migration, custom NAR issue, and back-pressure risk across all four clusters. For the first time, the airline's team has a complete, documented picture of what changed and why.
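One way to picture the "even distribution" result is as a check on each node's share of queued FlowFiles. The sketch below is an assumption about how such a check could be expressed, not a description of the airline's monitoring. The node names, counts, and the 10% tolerance are made up for illustration and are not measurements from the project.

```python
def load_shares(queued_by_node):
    """Return each node's fraction of the total queued FlowFiles."""
    total = sum(queued_by_node.values())
    if total == 0:
        return {node: 0.0 for node in queued_by_node}
    return {node: count / total for node, count in queued_by_node.items()}

def is_balanced(queued_by_node, tolerance=0.10):
    """True if every node's share is within `tolerance` of an even split."""
    if not queued_by_node or sum(queued_by_node.values()) == 0:
        return True
    even_share = 1 / len(queued_by_node)
    shares = load_shares(queued_by_node)
    return all(abs(share - even_share) <= tolerance for share in shares.values())

# Illustrative numbers only: a skewed cluster and an evenly loaded one.
skewed = {"node-1": 7000, "node-2": 1500, "node-3": 1500}
even = {"node-1": 3400, "node-2": 3300, "node-3": 3300}

print(is_balanced(skewed))  # False: node-1 holds 70% against an even share of about 33%
print(is_balanced(even))    # True
```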
“Ksolves navigated a migration that we had assessed as extremely high-risk and delivered it exactly as committed. Four clusters upgraded, twenty thousand processors validated, and not a single byte of data lost. The thoroughness of the gap analysis gave us confidence we had never had in a vendor before.”
– Head of Data Engineering, Major European Airline
What started as one of the riskiest data platform migrations the airline had ever attempted ended with every cluster upgraded, every processor validated, and every data flow running cleanly on NiFi 2.6.0.
The work Ksolves did goes beyond the migration itself. The Gap Analysis Report, the version-controlled flows in NiFi Registry, and the resolved load balancing and back-pressure issues leave the airline’s data engineering team with a platform that is better documented, better performing, and better positioned for the future than it was before the project began.
With NiFi 2.6.0 and Java 21 now in place, the team can adopt enhanced flow analytics, improved provenance management, and cloud-native deployment patterns as the next phase of the airline’s data platform evolution.
For organisations running Apache NiFi 1.x across multiple production clusters with no validated upgrade path, explore Ksolves’ Big Data Engineering Services and find out how a structured, zero-data-loss migration can be delivered for your environment.