Designed an AI-Ready Open Data Lakehouse on Red Hat OpenShift for a Major Middle East Retailer

Industry
Retail
Technology
Apache NiFi, DFM, Apache Kafka, Apache Flink, Apache Iceberg, MinIO, Trino, Red Hat OpenShift

Overview

One of the Middle East’s largest retailers faced a decision no enterprise wants to make. Migrating to SAP S/4HANA was commercially prohibitive. Cloud-native platforms could not be deployed across all Gulf operating regions.

And the business was generating 16 terabytes of real-time retail data every day, with a sub-second processing requirement for dynamic pricing and stock decisions.

Ksolves, an AI-first company, was engaged to design a fully on-premises, open-source data lakehouse built on Apache Iceberg, Trino, NiFi, Kafka, and Flink, running entirely on the retailer’s existing Red Hat OpenShift cluster with no new hardware, no cloud dependency, and zero Power BI report changes.

Key Challenges

The client came to Ksolves with six problems that were blocking any path forward on data infrastructure modernization:

  • SAP S/4HANA Migration Was Too Expensive: SAP BW handled pricing, stock, billing, and governance reporting. The proposed migration to S/4HANA was not commercially justified. A different architecture was needed that matched SAP BW's capabilities without the proprietary licensing cost.
  • Cloud Platforms Were Not Available Across All Gulf Regions: Cloud-native platforms are not available across all Gulf Cooperation Council regions. The retailer could not build on a platform that would leave parts of their operation unserved.
  • 16 TB of Real-Time Data Per Day with Sub-Second SLA: The platform needed to process 16 terabytes of retail data every 24 hours. It also needed to make dynamic pricing decisions and approve stock level changes in under one second. Any batch-oriented or high-latency architecture was ruled out from the start.
  • Existing Infrastructure Was Not Being Used for Analytics: The retailer had already invested in Red Hat OpenShift on on-premises servers and distributed Ceph-based storage. But none of it was being used as a data platform. There was no open table format deployed over the storage layer, which meant analytical queries had to go through SAP.
  • Every Open-Source Component Needed Enterprise Vendor Support: Procurement policy required 24/7 managed support and defined SLAs for every component. Each tool needed a named vendor before it could be approved for production.
  • Power BI Reports Could Not Be Touched: The analytics team had a full suite of Power BI dashboards on SAP BW. Reconnecting or rewriting them would have taken months. The new architecture had to work as a drop-in replacement with zero report changes.
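To put the 16 TB/day figure in perspective, a quick back-of-the-envelope calculation shows the sustained ingest rate the platform had to absorb. The numbers below are illustrative sizing arithmetic only; the 3:1 peak-to-average ratio is a common retail planning assumption, not a figure from the engagement.

```python
# Back-of-the-envelope sizing for the 16 TB/day ingest requirement.

TB = 10**12  # decimal terabyte, in bytes

daily_bytes = 16 * TB
seconds_per_day = 24 * 60 * 60

# Average sustained ingest rate, assuming load spread evenly across the day
sustained_mb_per_s = daily_bytes / seconds_per_day / 10**6
print(f"Sustained ingest rate: {sustained_mb_per_s:.0f} MB/s")

# Assumed 3:1 peak-to-average ratio for retail traffic (illustrative only)
peak_mb_per_s = sustained_mb_per_s * 3
print(f"Assumed peak rate:     {peak_mb_per_s:.0f} MB/s")
```

At roughly 185 MB/s sustained, even before peaks, any architecture that staged data through nightly batch loads was clearly out of the question.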
Our Solution

Ksolves designed a clean, layered pipeline on the retailer's existing infrastructure. Apache NiFi ingests from source systems, Kafka buffers the event stream, Flink processes in real time, Iceberg on MinIO stores the hot data tier, and Trino serves SQL queries to Power BI with zero report changes.

  • Apache NiFi and DFM for Source Ingestion: NiFi ingests data from SAP, POS systems, order management APIs, and external integrations. Ksolves rebuilt DFM for on-premises OpenShift, providing centralized NiFi management, scheduled flow deployment, and 24/7 managed support.
  • Apache Kafka for Real-Time Event Buffering: Kafka sits between Apache NiFi and Apache Flink as a durable event queue. It ensures no data is lost if processing fails and allows full event replay for recovery. Topics are partitioned by retail domain, including pricing, stock, orders, and checkout.
  • Apache Flink for Sub-Second Stream Processing: Flink processes the Kafka event stream using Flink SQL, running real-time joins, computing dynamic price approvals within the sub-second SLA, and aggregating stock movements for live inventory management.
  • Apache Iceberg and MinIO for Open Lakehouse Storage: Apache Iceberg was deployed over MinIO on the existing Ceph storage cluster. It holds a 16 TB hot-data tier with a three-day rolling window, ACID transactions, schema evolution, and time-travel query capability.
  • Trino as the SAP BW Replacement: Trino serves as the distributed SQL query engine over Iceberg. Existing Power BI dashboards connect through a Presto-compatible connector with zero report changes. Trino ran in parallel with SAP BW for three to six months for equivalence validation. Ksolves confirmed the full NiFi-to-Iceberg-to-Trino query chain in a PoC on on-premises OpenShift.
  • Three-Partner Vendor Support Model: Ksolves covers NiFi, DFM, Iceberg, MinIO, and Trino. Wika covers Kafka and Flink. GBM covers Red Hat OpenShift and Ceph storage. Together, they provide 24/7 support for every component in the stack.
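The domain-based topic layout described above can be sketched as a simple routing rule: each event carries a retail domain and is published to the matching Kafka topic. The topic names and the `route_event` helper below are illustrative, not the production configuration.

```python
# Sketch of routing events to domain-partitioned Kafka topics.
# Topic names here are hypothetical examples.

DOMAIN_TOPICS = {
    "pricing": "retail.pricing.events",
    "stock": "retail.stock.events",
    "orders": "retail.orders.events",
    "checkout": "retail.checkout.events",
}

def route_event(event: dict) -> str:
    """Return the Kafka topic an event should be published to,
    based on its retail domain."""
    domain = event.get("domain")
    if domain not in DOMAIN_TOPICS:
        raise ValueError(f"Unknown retail domain: {domain!r}")
    return DOMAIN_TOPICS[domain]

# With a real Kafka client, publishing would then look roughly like:
#   producer.send(route_event(event), value=serialize(event))
print(route_event({"domain": "pricing", "sku": "A-1001", "price": 9.99}))
```

Keeping each retail domain on its own topic lets Flink jobs for pricing, stock, orders, and checkout scale and replay independently.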

Technology Stack

Category | Technology | Role
Ingestion | Apache NiFi, DFM | Ingests data from SAP, POS, and external systems into Kafka topics
Messaging | Apache Kafka | Durable event queue between NiFi and Flink with replay capability
Stream Processing | Apache Flink | Real-time stream processing for dynamic pricing and stock decisions
Table Format | Apache Iceberg | Open table format with ACID transactions and time-travel on MinIO storage
Object Storage | MinIO on OCP, Ceph | S3-compatible storage for Iceberg data files on existing infrastructure
Query Engine | Trino | Distributed SQL over Iceberg serving Power BI with zero report changes
Infrastructure | Red Hat OpenShift | Containerizes all components on existing on-premises servers
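The three-day rolling hot-data window on the Iceberg tier can be illustrated with a small retention calculation. In production this would be handled by Iceberg's own snapshot-expiration and partition-retention tooling; the helper below only shows the retention arithmetic, and its name and signature are hypothetical.

```python
# Illustrative sketch of a three-day rolling hot-data window:
# which daily partitions remain in the hot tier on a given date.
from datetime import date, timedelta

def hot_tier_partitions(today: date, window_days: int = 3) -> list[str]:
    """Daily partition keys (YYYY-MM-DD) kept in the hot tier."""
    return [
        (today - timedelta(days=offset)).isoformat()
        for offset in range(window_days)
    ]

print(hot_tier_partitions(date(2025, 3, 10)))
# ['2025-03-10', '2025-03-09', '2025-03-08']
```

Partitions that fall outside the window are expired from the hot tier, keeping it at a steady 16 TB regardless of total data volume.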
Impact

The PoC validated the following outcomes across infrastructure, performance, compliance, and vendor support:

  • SAP BW Replacement Validated: A full NiFi-to-Kafka-to-Flink-to-Iceberg-to-Trino stack was successfully validated on-premises through a proof of concept. This demonstrated that an open-source lakehouse could effectively replace SAP BW using the client’s existing infrastructure.
  • Zero New Hardware Required: MinIO on the existing Ceph cluster provides the full S3-compatible storage layer for Iceberg. The entire platform runs on infrastructure the retailer already owns.
  • Sub-Second Processing Validated: Flink SQL over Kafka delivers real-time computation within the sub-second SLA for dynamic pricing, stock management, and approval cycles.
  • Full Vendor Support Coverage Established: The three-partner support model across Ksolves, Wika, and GBM provides 24/7 coverage for every stack component. This satisfied the enterprise procurement requirement that had previously delayed open-source deployment.
Data Flow Diagram

[Diagram: NiFi → Kafka → Flink → Iceberg on MinIO → Trino → Power BI]
Client Testimonial

“The architecture gives us everything SAP S/4HANA would have: real-time analytics, sub-second pricing, and stock visibility, on the infrastructure we already own, with open-source components that have vendor support. That is the proposal that changes the conversation.”

– Senior Data Architecture Manager, Major Middle East Retailer

Conclusion

Before this engagement, one of the Middle East’s largest retailers was stuck between two options that did not work. SAP S/4HANA was too expensive. Cloud-native platforms could not operate across all Gulf regions. And 16 TB of daily real-time data needed sub-second processing with no batch workarounds.

Today, Ksolves, with its AI-first delivery approach, has designed and validated an open-source lakehouse on the retailer’s existing Red Hat OpenShift infrastructure with no new hardware, no cloud dependency, no Power BI changes, and full three-partner vendor support. The platform is built to absorb the 36 TB per day projected by the end of 2026 and support future AI and ML workloads without architectural rework.

If your enterprise is blocked from cloud-native platforms by regional restrictions or on-premises requirements, explore our Big Data Services or contact our experts at sales@ksolves.com.
