Our client manages data coming from multiple sources at rates of up to 10,000 records per minute. Handling this volume is made harder by the fact that each JSON file contains 30-40 data entities whose structure changes every year. The client therefore required a well-managed, adaptable solution for handling the data.
Before building the solution, there were a few challenges we needed to address and overcome:
- Adapt to schema variations with little to no code change
- Parse nested, hierarchical JSON and map it to the Teradata tables so the incoming data is saved smoothly
- Accommodate a flow of data whose volume grows year on year
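To illustrate the second challenge, the sketch below flattens a nested JSON record into dot-separated keys that can later be matched against table columns. The record and its field names are purely illustrative assumptions, not the client's actual schema.

```python
import json

# A hypothetical nested record of the kind described above;
# the field names are illustrative, not the client's schema.
RAW = '''
{
  "order": {
    "id": 1001,
    "customer": {"name": "Acme Corp", "region": "EU"},
    "items": [{"sku": "A-1", "qty": 2}]
  }
}
'''

def flatten(obj, prefix=""):
    """Flatten nested dicts into dot-separated keys; lists are kept
    as-is so they can be exploded into child tables separately."""
    flat = {}
    for key, value in obj.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, path))
        else:
            flat[path] = value
    return flat

record = flatten(json.loads(RAW))
print(record)  # dot-path keys mirror the JSON hierarchy
```

Dot-separated paths such as `order.customer.region` give each nested value a stable, addressable name, which is what makes a lookup-table-driven mapping possible in the first place.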
After analyzing the requirements and the challenges we had to focus on, we came up with a metadata-driven system. The system included:
- A mapping file that maps each hierarchical JSON key to the corresponding database table, column name, and column type, so that incoming data can be typecast and the pipeline stays independent of any one schema
- Apache Spark running on multi-node Kubernetes clusters via the Kubernetes operator, leaving room for future data growth; incoming data is organized by date and time to avoid reprocessing
- Mapping files written as plain CSV text, so they can be altered swiftly without touching code
Ksolves proposed creating a separate mapping file for each source type, so that data coming from different sources can be managed independently.
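The per-source idea can be sketched as a small registry that selects the right mapping file for each feed; the source names and file paths here are hypothetical placeholders.

```python
# Hypothetical registry of per-source mapping files, one per data
# source as described above; names and paths are illustrative only.
MAPPINGS = {
    "sales": "mappings/sales.csv",
    "inventory": "mappings/inventory.csv",
}

def mapping_for(source_type):
    """Pick the mapping file for a source; an unknown source fails fast
    so a new feed cannot be ingested without a registered mapping."""
    try:
        return MAPPINGS[source_type]
    except KeyError:
        raise ValueError(f"no mapping registered for source '{source_type}'")

print(mapping_for("sales"))
```

Failing fast on an unregistered source keeps malformed or unexpected feeds from silently landing in the wrong tables.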