Project Name

How Ksolves Unified 8 Systems and 24 TB of Video Content into a Governed Cloud Data Lake in 6 Weeks for a Global University

Industry
EdTech, Education
Technology
Amazon S3, AWS Lambda, Amazon AppFlow, Amazon EventBridge, Amazon MWAA (Apache Airflow), AWS Glue Data Catalog, AWS IAM, AWS KMS, Amazon CloudWatch, AWS CloudTrail

Overview

A globally ranked business school operating across campuses in India and the UAE was running academic, financial, HR, and student operations across eight disconnected platforms. Blackboard’s learning records lived separately from Oracle’s financials, Salesforce’s CRM had no connection to the HRMS, and 24 terabytes of Echo360 lecture recordings had no governed cloud home. The same root problem blocked every AI initiative the institution wanted to build: there was no single place where all data lived.

 

The client offers undergraduate, postgraduate, and executive education to students from over 100 countries. Their ecosystem spans Salesforce, Blackboard Learn, Oracle Fusion Financials, Adrenaline HRMS, Echo360, Microsoft Office 365, a custom AI Tutor, and additional systems. With a growing data analytics school and a roadmap of AI initiatives, including a cash flow predictor and student performance models, the institution needed a governed data foundation before any model work could begin.

 

Ksolves designed and delivered a cloud-native Data Lake on AWS, consolidating all eight source systems into one authoritative raw data repository with automated daily ingestion, centralised monitoring, and full governance. The complete platform was delivered across Non-Production and Production environments in 6 weeks at a total cost of USD 21,000.

Key Challenges

The client had five problems holding back their AI and analytics roadmap:

  • Fragmented Data Across 8 Platforms: Academic, financial, HR, and learning data were spread across eight systems with no integration layer. Every cross-functional report or AI initiative required manual data extraction before any real work could begin.
  • 24 TB of Unmanaged Video Content: Echo360 lecture recordings totalling 24 TB, growing at 15 TB per year, had no governed storage and no lifecycle policy. This was the institution's largest data asset and was entirely inaccessible for analytics or AI workloads.
  • AI Initiatives Blocked at the Data Layer: Planned use cases, including a cash flow predictor, demand forecasting, and student performance models, were structurally impossible to develop because there was no centralised data access layer.
  • No Orchestration or Failure Visibility: All data movements were manual and undocumented, with no scheduling and no failure notifications. Data issues were discovered reactively after downstream teams had already been impacted.
  • No Metadata or Discoverability Framework: There was no catalogue of what data existed or how datasets related to each other. Analysts had no way to discover or trust available data, leading to repeated duplication of effort across teams.
Our Solution

The Ksolves Big Data consulting team designed the cloud-native Data Lake on AWS around a managed-cloud-first principle. Every component was selected from AWS's fully managed service catalogue, eliminating self-hosted infrastructure so that the institution's lean IT team could maintain the platform without dedicated operations staff.

  • Cloud Object Storage Foundation (Amazon S3, Multi-Tier): S3 was deployed with three storage classes: S3 Standard for active SaaS data, S3 Standard-IA for the Echo360 video archive (31.5 TB), and S3 Glacier Flexible Retrieval for long-term historical archival. Lifecycle policies automate transitions between tiers, reducing cold-storage costs by an estimated 60-70%.
  • 8-Source Batch Ingestion Pipelines (AWS Lambda + Amazon AppFlow): Automated daily ingestion pipelines were built for all eight source systems, including Salesforce (two orgs, via the Bulk API and Amazon AppFlow), Blackboard Learn, Oracle Fusion Financials, Adrenaline HRMS, Echo360 (using multipart upload for large video files), Office 365, and the AI Tutor platform. Each pipeline included error handling, retry logic, and email failure notifications.
  • Orchestration and Scheduling (EventBridge + Amazon MWAA): Amazon EventBridge handled time-based daily batch scheduling, while Amazon Managed Workflows for Apache Airflow (MWAA) orchestrated pipelines with inter-task dependencies as DAGs. Together they provide a fully managed scheduling layer with automated retries and alerting.
  • Security, Access Control, and Encryption (AWS IAM + KMS + VPC): AWS IAM enforced role-based access control across all S3 buckets, with Dev and Prod environments segregated. AWS KMS provided encryption at rest for all video data, and VPC endpoints kept data movement on private network paths rather than the public internet, supporting the institution's data residency and compliance requirements.
  • Monitoring and Governance (CloudWatch + CloudTrail + Glue Data Catalog): CloudWatch provided pipeline visibility, S3 storage growth tracking, and billing alarms across the 40 TB data estate, while CloudTrail recorded an audit log of API activity. AWS Glue Data Catalog registered schemas, table definitions, and data locations for all ingested datasets, giving the institution its first searchable catalogue of institutional data.
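The storage tiering described above is typically expressed as an S3 lifecycle configuration. The sketch below is illustrative only: the prefixes (`echo360/`, `salesforce/`) and the day thresholds are assumptions, because the case study does not publish the actual rules. It builds the dictionary shape that boto3's `put_bucket_lifecycle_configuration` accepts; note that the S3 API names Glacier Flexible Retrieval `GLACIER`.

```python
import json
from typing import Optional


def tiering_rule(prefix: str, glacier_days: int, ia_days: Optional[int] = None) -> dict:
    """Build one S3 lifecycle rule for objects under `prefix`.

    If ia_days is given, objects move Standard -> Standard-IA -> Glacier Flexible
    Retrieval. Otherwise the rule assumes objects were written straight to
    Standard-IA and only adds the Glacier step.
    """
    transitions = []
    # Conservative choice: keep objects in IA for its 30-day minimum storage charge.
    earliest_glacier = 30
    if ia_days is not None:
        # S3 rejects transitions to STANDARD_IA for objects younger than 30 days.
        if ia_days < 30:
            raise ValueError("Standard-IA transition must be at least 30 days")
        transitions.append({"Days": ia_days, "StorageClass": "STANDARD_IA"})
        earliest_glacier = ia_days + 30
    if glacier_days < earliest_glacier:
        raise ValueError(f"Glacier transition must be at least {earliest_glacier} days")
    # "GLACIER" is the API storage-class name for S3 Glacier Flexible Retrieval.
    transitions.append({"Days": glacier_days, "StorageClass": "GLACIER"})
    return {
        "ID": "tier-" + prefix.strip("/"),
        "Filter": {"Prefix": prefix},
        "Status": "Enabled",
        "Transitions": transitions,
    }


# Hypothetical prefixes and thresholds, for illustration only.
LIFECYCLE = {
    "Rules": [
        tiering_rule("echo360/", glacier_days=365),                 # videos uploaded directly to Standard-IA
        tiering_rule("salesforce/", ia_days=90, glacier_days=365),  # raw SaaS extracts
    ]
}

if __name__ == "__main__":
    print(json.dumps(LIFECYCLE, indent=2))
```

Applying it would be a single call such as `boto3.client("s3").put_bucket_lifecycle_configuration(Bucket=..., LifecycleConfiguration=LIFECYCLE)`. Keeping the rules as data makes them easy to review and version alongside the rest of the infrastructure.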
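For the Echo360 recordings, multipart upload has to respect S3's multipart limits: at most 10,000 parts per upload, and parts between 5 MiB and 5 GiB (only the final part may be smaller). A minimal sketch of choosing a part size under those limits follows; the 64 MiB default is an assumption, and in practice boto3's managed transfer (`TransferConfig(multipart_chunksize=...)`) can perform the splitting automatically.

```python
from typing import Iterator, Tuple

MIB = 1024 ** 2
GIB = 1024 ** 3
MIN_PART_SIZE = 5 * MIB  # smallest allowed part (except the final part)
MAX_PART_SIZE = 5 * GIB  # largest allowed part
MAX_PARTS = 10_000       # most parts a single multipart upload may have


def choose_part_size(object_size: int, preferred: int = 64 * MIB) -> int:
    """Pick a part size in bytes so object_size splits into at most MAX_PARTS parts."""
    if object_size <= 0:
        raise ValueError("object_size must be positive")
    if object_size > MAX_PARTS * MAX_PART_SIZE:
        raise ValueError("object too large for a single multipart upload")
    part_size = min(max(preferred, MIN_PART_SIZE), MAX_PART_SIZE)
    # Smallest part size that keeps the part count within MAX_PARTS (ceiling division).
    required = -(-object_size // MAX_PARTS)
    if required > part_size:
        # Round up to a whole MiB; stays within MAX_PART_SIZE because of the check above.
        part_size = -(-required // MIB) * MIB
    return part_size


def part_ranges(object_size: int, part_size: int) -> Iterator[Tuple[int, int, int]]:
    """Yield (part_number, start, end) byte ranges with end exclusive; S3 part numbers start at 1."""
    for number, start in enumerate(range(0, object_size, part_size), start=1):
        yield number, start, min(start + part_size, object_size)


if __name__ == "__main__":
    size = 3 * GIB + 123  # hypothetical lecture recording size in bytes
    chosen = choose_part_size(size)
    print(chosen // MIB, "MiB parts,", len(list(part_ranges(size, chosen))), "parts")
```

Each yielded range would map to one `upload_part` call, which also makes it possible to retry a single failed part instead of restarting a multi-gigabyte upload.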
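Dev/Prod segregation with KMS encryption is commonly expressed as per-environment IAM policies. The sketch below generates a least-privilege identity policy for a hypothetical pipeline role that can only reach its own environment's bucket and use the lake's KMS key. The one-bucket-per-environment naming scheme (`<prefix>-dev`, `<prefix>-prod`) and the key ARN are assumptions, not details from the project.

```python
import json


def pipeline_role_policy(env: str, bucket_prefix: str, kms_key_arn: str) -> dict:
    """Least-privilege identity policy for a hypothetical ingestion role in one environment."""
    if env not in ("dev", "prod"):
        raise ValueError("env must be 'dev' or 'prod'")
    # Assumed naming: one bucket per environment, so a dev role never matches a prod ARN.
    bucket_arn = f"arn:aws:s3:::{bucket_prefix}-{env}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "ListOwnEnvironmentBucket",
                "Effect": "Allow",
                "Action": "s3:ListBucket",
                "Resource": bucket_arn,
            },
            {
                "Sid": "ReadWriteOwnEnvironmentObjects",
                "Effect": "Allow",
                # PutObject also authorizes the multipart upload calls; Abort allows cleanup.
                "Action": ["s3:GetObject", "s3:PutObject", "s3:AbortMultipartUpload"],
                "Resource": f"{bucket_arn}/*",
            },
            {
                "Sid": "UseLakeKmsKey",
                "Effect": "Allow",
                # GenerateDataKey for SSE-KMS writes; Decrypt for reads and multipart uploads.
                "Action": ["kms:GenerateDataKey", "kms:Decrypt"],
                "Resource": kms_key_arn,
            },
        ],
    }


if __name__ == "__main__":
    key_arn = "arn:aws:kms:ap-south-1:111122223333:key/EXAMPLE"  # placeholder key ARN
    print(json.dumps(pipeline_role_policy("dev", "uni-lake-raw", key_arn), indent=2))
```

Because the policy only grants access, segregation comes from what is absent: a role generated for `dev` simply has no statement that matches the `prod` bucket.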

Technology Stack

  • Core Storage: Amazon S3 (Standard, Standard-IA, Glacier Flexible Retrieval)
  • Ingestion Layer: AWS Lambda, Amazon AppFlow
  • Orchestration: Amazon EventBridge, Amazon MWAA (Apache Airflow)
  • Security and Access: AWS IAM, AWS KMS, VPC Endpoints
  • Monitoring and Audit: Amazon CloudWatch, AWS CloudTrail, AWS Config
  • Metadata and Governance: AWS Glue Data Catalog
  • Source Systems: Salesforce, Blackboard Learn, Oracle Fusion Financials, Adrenaline HRMS, Echo360, Office 365, Custom AI Tutor
  • AI Tooling: AI-assisted architecture planning, workload analysis, migration path modelling
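Dependency-based orchestration on Amazon MWAA comes down to running tasks in an order that respects their dependencies, with independent tasks running side by side. The sketch below illustrates that idea using Python's standard-library `graphlib` rather than Airflow itself. The task names, and the assumption that a catalogue update follows every ingestion, are hypothetical. In an actual MWAA DAG, the same edges would be declared with Airflow's `>>` operator.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set

# Illustrative source names; the real pipeline identifiers are not published.
SOURCES = [
    "salesforce_org1", "salesforce_org2", "blackboard", "oracle_fusion",
    "adrenaline_hrms", "echo360", "office365", "ai_tutor",
]


def build_graph(sources: List[str]) -> Dict[str, Set[str]]:
    """Map each task to the set of tasks it depends on."""
    graph: Dict[str, Set[str]] = {f"ingest_{s}": set() for s in sources}
    # Assumed: refresh the catalogue only after every ingestion has landed.
    graph["update_catalog"] = {f"ingest_{s}" for s in sources}
    graph["notify_run_summary"] = {"update_catalog"}
    return graph


def execution_waves(graph: Dict[str, Set[str]]) -> List[List[str]]:
    """Group tasks into waves; tasks within a wave do not depend on each other."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()  # raises graphlib.CycleError if the dependencies form a cycle
    waves: List[List[str]] = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        # A real scheduler would mark tasks done only after each one succeeds.
        sorter.done(*ready)
    return waves


if __name__ == "__main__":
    for i, wave in enumerate(execution_waves(build_graph(SOURCES)), start=1):
        print(f"wave {i}: {', '.join(wave)}")
```

Run against all sources, this yields three waves: every ingestion in parallel, then the catalogue update, then the run summary.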
Impact
  • 8 Systems Unified into One Governed Data Lake: All eight source systems are now ingested daily into a single governed Amazon S3 repository. The institution has its first centralised, trusted raw data foundation for all analytics and AI workloads.
  • 60-70% Lower Cold-Storage Costs on Video Archives: Echo360 video data (31.5 TB, growing at 15 TB per year) is stored in S3 Standard-IA with automated lifecycle transitions to S3 Glacier Flexible Retrieval, reducing cold-storage costs by an estimated 60-70% and making video content accessible as a governed asset for the first time.
  • Full Platform Delivered in 6 Weeks for USD 21,000: The complete data lake, covering all eight ingestion pipelines, multi-tier S3 storage, orchestration, security, monitoring, and governance, was delivered across Non-Production and Production environments in 6 weeks. Ongoing AWS infrastructure costs are estimated at USD 6,840 per year.
  • Proactive Monitoring and Failure Alerting Established: CloudWatch provides visibility into pipeline health metrics and billing, and every pipeline failure triggers an automatic email notification to the IT team, replacing reactive firefighting with automated observability.
  • AI and Analytics Use Cases Fully Unblocked: Every planned AI initiative, including the cash flow predictor, student performance analytics, and demand forecasting, was previously blocked by fragmented data. The governed data lake removes that structural blocker and gives teams a unified, API-accessible foundation to build on.
  • Zero Dedicated Operations Staff Required: The fully managed architecture running on S3, Lambda, AppFlow, EventBridge, MWAA, and CloudWatch requires no in-house infrastructure team, placing no additional burden on the institution's lean IT staff.
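The retry logic and email failure notifications described for the ingestion pipelines follow a common retry-then-alert pattern. This minimal sketch assumes exponential backoff with full jitter, since the case study does not state the actual backoff policy. `notify` is a hypothetical hook; on AWS it might publish to an Amazon SNS topic with an email subscription.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def run_with_retries(
    task: Callable[[], T],
    notify: Callable[[str], None],
    name: str,
    max_attempts: int = 3,
    base_delay: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run task(); on failure retry with jittered exponential backoff, alerting once if all attempts fail."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    attempt = 1
    while True:
        try:
            return task()
        except Exception as exc:
            if attempt >= max_attempts:
                # Final failure: send a single alert, then re-raise so the caller sees the error.
                notify(f"{name} failed after {attempt} attempts: {exc!r}")
                raise
            # Full jitter: wait a random time between 0 and base_delay * 2^(attempt - 1) seconds.
            sleep(random.uniform(0, base_delay * 2 ** (attempt - 1)))
            attempt += 1
```

Each source extraction (for example, a hypothetical `extract_blackboard()` inside a Lambda handler) would be wrapped in `run_with_retries`. Transient API errors are then absorbed silently, and only persistent failures reach the team's inbox. The injectable `sleep` keeps the function easy to test without real delays.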
Data Flow Diagram
[Figure: Data Flow Diagram]
Client Testimonial

“For the first time, our analytics and AI teams have a single, governed location where all institutional data lives. What used to require days of manual extraction from eight different systems now happens automatically every morning.” – Director of IT, Global Business School

Conclusion

Before this project, the university’s AI ambitions had no foundation to stand on. Eight disconnected systems forced manual data assembly for every initiative, 24 TB of lecture video sat unmanaged, and there was no monitoring, no catalogue, and no governed repository to build from.

 

Today, the institution runs a production-grade, fully managed AWS cloud data lake that is ready to scale with its growing AI roadmap. Ksolves, with its AI-first delivery approach, executed the full platform build in 6 weeks at USD 21,000, a highly cost-efficient result for an enterprise data lake in the education sector. If your institution runs disconnected SaaS platforms and wants to unlock its AI potential, explore our Big Data Services to find out what the right data foundation looks like for your workloads.

Is Your Institution’s AI Roadmap Blocked by Fragmented Data Across SaaS Systems?