Designed an AI-Ready Open Data Lakehouse on Red Hat OpenShift for a Major Middle East Retailer

Industry
Retail
Technology
Apache NiFi, DFM, Apache Kafka, Apache Flink, Apache Iceberg, MinIO, Trino, Red Hat OpenShift

Overview

One of the Middle East’s largest retailers faced a decision no enterprise wants to make. Migrating to SAP S/4HANA was commercially prohibitive. Cloud-native platforms could not be deployed across all Gulf operating regions.

And the business was generating 16 terabytes of real-time retail data every day, with a sub-second processing requirement for dynamic pricing and stock decisions.

Ksolves, an AI-first company, was engaged to design a fully on-premises, open-source data lakehouse built on Apache Iceberg, Trino, NiFi, Kafka, and Flink, running entirely on the retailer’s existing Red Hat OpenShift cluster with no new hardware, no cloud dependency, and zero Power BI report changes.

Key Challenges

The client came to Ksolves with six problems that were blocking any path forward on data infrastructure modernization:

  • SAP S/4HANA Migration Was Too Expensive: SAP BW handled pricing, stock, billing, and governance reporting. The proposed migration to S/4HANA was not commercially justified. A different architecture was needed that matched SAP BW's capabilities without the proprietary licensing cost.
  • Cloud Platforms Were Not Available Across All Gulf Regions: Cloud-native platforms are not available across all Gulf Cooperation Council regions. The retailer could not build on a platform that would leave parts of their operation unserved.
  • 16 TB of Real-Time Data Per Day with Sub-Second SLA: The platform needed to process 16 terabytes of retail data every 24 hours. It also needed to make dynamic pricing decisions and approve stock level changes in under one second. Any batch-oriented or high-latency architecture was ruled out from the start.
  • Existing Infrastructure Was Not Being Used for Analytics: The retailer had already invested in Red Hat OpenShift on on-premises servers and distributed Ceph-based storage. But none of it was being used as a data platform. There was no open table format deployed over the storage layer, which meant analytical queries had to go through SAP.
  • Every Open-Source Component Needed Enterprise Vendor Support: Procurement policy required 24/7 managed support and defined SLAs for every component. Each tool needed a named vendor before it could be approved for production.
  • Power BI Reports Could Not Be Touched: The analytics team had a full suite of Power BI dashboards on SAP BW. Reconnecting or rewriting them would have taken months. The new architecture had to work as a drop-in replacement with zero report changes.
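To put the 16 TB/day figure in perspective, a quick back-of-the-envelope calculation shows the sustained ingest rate the platform had to absorb. The numbers below are illustrative sizing arithmetic only; the 3:1 peak-to-average ratio is a common retail planning assumption, not a figure from the engagement.

```python
# Back-of-the-envelope sizing for the 16 TB/day ingest requirement.

TB = 10**12  # decimal terabyte, in bytes

daily_bytes = 16 * TB
seconds_per_day = 24 * 60 * 60

# Average sustained ingest rate, assuming load spread evenly across the day
sustained_mb_per_s = daily_bytes / seconds_per_day / 10**6
print(f"Sustained ingest rate: {sustained_mb_per_s:.0f} MB/s")

# Assumed 3:1 peak-to-average ratio for retail traffic (illustrative only)
peak_mb_per_s = sustained_mb_per_s * 3
print(f"Assumed peak rate:     {peak_mb_per_s:.0f} MB/s")
```

At roughly 185 MB/s sustained, even before peaks, any architecture that staged data through nightly batch loads was clearly out of the question.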
Our Solution

Ksolves designed a clean, layered pipeline on the retailer's existing infrastructure. Apache NiFi ingests from source systems, Kafka buffers the event stream, Flink processes in real time, Iceberg on MinIO stores the hot data tier, and Trino serves SQL queries to Power BI with zero report changes.

  • Apache NiFi and DFM for Source Ingestion: NiFi ingests data from SAP, POS systems, order management APIs, and external integrations. Ksolves rebuilt DFM for on-premises OpenShift, providing centralized NiFi management, scheduled flow deployment, and 24/7 managed support.
  • Apache Kafka for Real-Time Event Buffering: Kafka sits between Apache NiFi and Apache Flink as a durable event queue. It ensures no data is lost if processing fails and allows full event replay for recovery. Topics are partitioned by retail domain, including pricing, stock, orders, and checkout.
  • Apache Flink for Sub-Second Stream Processing: Flink processes the Kafka event stream using Flink SQL, running real-time joins, computing dynamic price approvals within the sub-second SLA, and aggregating stock movements for live inventory management.
  • Apache Iceberg and MinIO for Open Lakehouse Storage: Apache Iceberg was deployed over MinIO on the existing Ceph storage cluster. It holds a 16 TB hot-data tier with a three-day rolling window, ACID transactions, schema evolution, and time-travel query capability.
  • Trino as the SAP BW Replacement: Trino serves as the distributed SQL query engine over Iceberg. Existing Power BI dashboards connect through a Presto-compatible connector with zero report changes. Trino ran in parallel with SAP BW for three to six months for equivalence validation. Ksolves confirmed the full NiFi-to-Iceberg-to-Trino query chain in a PoC on on-premises OpenShift.
  • Three-Partner Vendor Support Model: Ksolves covers NiFi, DFM, Iceberg, MinIO, and Trino. Wika covers Kafka and Flink. GBM covers Red Hat OpenShift and Ceph storage. Together, they provide 24/7 support for every component in the stack.
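The domain-based topic layout described above can be sketched as a simple routing rule: each event carries a retail domain and is published to the matching Kafka topic. The topic names and the `route_event` helper below are illustrative, not the production configuration.

```python
# Sketch of routing events to domain-partitioned Kafka topics.
# Topic names here are hypothetical examples.

DOMAIN_TOPICS = {
    "pricing": "retail.pricing.events",
    "stock": "retail.stock.events",
    "orders": "retail.orders.events",
    "checkout": "retail.checkout.events",
}

def route_event(event: dict) -> str:
    """Return the Kafka topic an event should be published to,
    based on its retail domain."""
    domain = event.get("domain")
    if domain not in DOMAIN_TOPICS:
        raise ValueError(f"Unknown retail domain: {domain!r}")
    return DOMAIN_TOPICS[domain]

# With a real Kafka client, publishing would then look roughly like:
#   producer.send(route_event(event), value=serialize(event))
print(route_event({"domain": "pricing", "sku": "A-1001", "price": 9.99}))
```

Keeping each retail domain on its own topic lets Flink jobs for pricing, stock, orders, and checkout scale and replay independently.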

Technology Stack

Category | Technology | Role
Ingestion | Apache NiFi, DFM | Ingests data from SAP, POS, and external systems into Kafka topics
Messaging | Apache Kafka | Durable event queue between NiFi and Flink with replay capability
Stream Processing | Apache Flink | Real-time stream processing for dynamic pricing and stock decisions
Table Format | Apache Iceberg | Open table format with ACID transactions and time-travel on MinIO storage
Object Storage | MinIO on OCP, Ceph | S3-compatible storage for Iceberg data files on existing infrastructure
Query Engine | Trino | Distributed SQL over Iceberg serving Power BI with zero report changes
Infrastructure | Red Hat OpenShift | Containerizes all components on existing on-premises servers
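The three-day rolling hot-data window on the Iceberg tier can be illustrated with a small retention calculation. In production this would be handled by Iceberg's own snapshot-expiration and partition-retention tooling; the helper below only shows the retention arithmetic, and its name and signature are hypothetical.

```python
# Illustrative sketch of a three-day rolling hot-data window:
# which daily partitions remain in the hot tier on a given date.
from datetime import date, timedelta

def hot_tier_partitions(today: date, window_days: int = 3) -> list[str]:
    """Daily partition keys (YYYY-MM-DD) kept in the hot tier."""
    return [
        (today - timedelta(days=offset)).isoformat()
        for offset in range(window_days)
    ]

print(hot_tier_partitions(date(2025, 3, 10)))
# ['2025-03-10', '2025-03-09', '2025-03-08']
```

Partitions that fall outside the window are expired from the hot tier, keeping it at a steady 16 TB regardless of total data volume.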
Impact

The PoC validated the following outcomes across infrastructure, performance, compliance, and vendor support:

  • SAP BW Replacement Validated: A full NiFi-to-Kafka-to-Flink-to-Iceberg-to-Trino stack was successfully validated on-premises through a proof of concept. This demonstrated that an open-source lakehouse could effectively replace SAP BW using the client’s existing infrastructure.
  • Zero New Hardware Required: MinIO on the existing Ceph cluster provides the full S3-compatible storage layer for Iceberg. The entire platform runs on infrastructure the retailer already owns.
  • Sub-Second Processing Validated: Flink SQL over Kafka delivers real-time computation within the sub-second SLA for dynamic pricing, stock management, and approval cycles.
  • Full Vendor Support Coverage Established: The three-partner support model across Ksolves, Wika, and GBM provides 24/7 coverage for every stack component. This satisfied the enterprise procurement requirement that had previously delayed open-source deployment.
Data Flow Diagram

[Diagram: NiFi → Kafka → Flink → Iceberg on MinIO → Trino → Power BI]
Client Testimonial

“The architecture gives us everything SAP S/4HANA would have: real-time analytics, sub-second pricing, and stock visibility, on the infrastructure we already own, with open-source components that have vendor support. That is the proposal that changes the conversation.”

– Senior Data Architecture Manager, Major Middle East Retailer

Conclusion

Before this engagement, one of the Middle East’s largest retailers was stuck between two options that did not work. SAP S/4HANA was too expensive. Cloud-native platforms could not operate across all Gulf regions. And 16 TB of daily real-time data needed sub-second processing with no batch workarounds.

Today, Ksolves, with its AI-first delivery approach, has designed and validated an open-source lakehouse on the retailer’s existing Red Hat OpenShift infrastructure with no new hardware, no cloud dependency, no Power BI changes, and full three-partner vendor support. The platform is built to absorb the 36 TB per day projected by the end of 2026 and support future AI and ML workloads without architectural rework.

If your enterprise is blocked from cloud-native platforms by regional restrictions or on-premises requirements, explore our Big Data Services or contact our experts at sales@ksolves.com.
