Apache NiFi is one of the most prominent big data processing engines. Its graphical web UI helps non-programmers build data pipelines quickly and without writing code, freeing them from tedious text-based implementation. This makes NiFi a widely used tool with a broad set of features.
Kubernetes is an open-source system that manages containerized applications across many hosts. It is a mature solution that is steadily gaining popularity, including in the world of Big Data. Kubernetes enables faster delivery and streamlines deployments and updates (once the CI/CD pipeline and Helm charts are in place).
Main challenges with Apache NiFi on Kubernetes
Moving some apps to Kubernetes is quite straightforward, but this is not the case with Apache NiFi. It is a stateful application (deployed as a StatefulSet on Kubernetes), and in most cases it runs as a cluster on virtual machines or bare metal. One aspect of its architecture gets in the way: a NiFi node does not replicate or share the data it is processing with the other nodes in the cluster. This makes it hard to move NiFi to Kubernetes.
This makes the whole process more complicated. Luckily, there is a way to sidestep the problems of running NiFi in cluster mode, which also requires an additional ZooKeeper installation.
The idea is to split the NiFi data flow pipelines across distinct NiFi instances, each operating in standalone mode. This is a reliable way to run stable Apache NiFi instances on Kubernetes. With this approach, their configurations are easy to manage from a repository, e.g. through Helm charts with a dedicated values file per instance. If you want more information about this method or Helm charts, you can contact Ksolves India Ltd.
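The split into standalone instances can be sketched in a few lines of Python. This is only an illustration: the pipeline group names, the chart path, and the values keys (fullnameOverride, jvmHeap, persistence and so on) are hypothetical and do not follow any particular Helm chart.

```python
# Hypothetical pipeline groups; each one becomes a standalone NiFi instance.
PIPELINE_GROUPS = {
    "ingest-kafka": {"heap": "2g", "storage_gb": 50},
    "export-hdfs": {"heap": "4g", "storage_gb": 100},
}


def values_for(name, spec):
    """Build the Helm values for one standalone NiFi instance.

    The key names are illustrative and not taken from a real chart.
    """
    return {
        "fullnameOverride": f"nifi-{name}",
        "replicaCount": 1,                 # standalone: one pod, no NiFi cluster
        "zookeeper": {"enabled": False},   # no cluster, so no ZooKeeper
        "jvmHeap": spec["heap"],
        "persistence": {"sizeGb": spec["storage_gb"]},
    }


def helm_command(name, chart="./nifi-chart"):
    """Return the helm invocation that would deploy one instance."""
    return ["helm", "upgrade", "--install", f"nifi-{name}", chart,
            "-f", f"values/{name}.yaml"]


all_values = {name: values_for(name, spec)
              for name, spec in PIPELINE_GROUPS.items()}

for name in all_values:
    print(" ".join(helm_command(name)))
```

Each instance gets its own release and its own values file, so a change to one pipeline group can be rolled out without touching the others.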
The first essential thing to understand about NiFi is that it performs a heavy load of disk reads and writes, especially when many pipelines continually read data, transform it, send it on, or schedule and monitor processes according to the flow content.
It requires performant storage under the hood. Network storage or slow disks are not a good fit here, so Ksolves recommends fast storage or on-premise object storage (such as Ceph). For a heavy NiFi load, local SSDs seem to be the right choice; however, they will not deliver real high availability.
The second thing is network performance between the Kubernetes cluster and the Kafka or Hadoop clusters. Since we read data from a source, process it, and then save it somewhere, we need a fast, stable network connection to avoid bottlenecks. This matters most when saving files, for example writing a file to HDFS and then rebuilding a Hive table. NiFi and SFTP integration is also of great use here.
The third thing is the migration process. Migrating Apache NiFi is not a one-day job; rather, we should plan at least one week to move the NiFi data flow pipelines one after another and check that NiFi performs as expected.
The fourth challenge is building a CI/CD pipeline that meets your needs. It consists of several layers. The first is building the base NiFi Docker image, although the official image can be used instead.
The next is creating specific images that contain our custom NARs. Multistage Dockerfiles work well for this, and according to Ksolves they are the best choice here. The second part is creating the deployment to Kubernetes. Helm or Helmfile serve as the base in many deployments, which makes their details easier to understand.
The fifth challenge is exposing Apache NiFi to users. In Kubernetes, we can easily expose an app through a Service or an Ingress, which is straightforward for Apache NiFi over HTTP. It becomes much more complicated with HTTPS. With an Ingress, we need to set the option that enables SSL passthrough.
A problem that occurs in these projects is that NiFi treats the certificate presented by the Ingress as a user authentication attempt. An alternative scenario may work instead: HTTPS traffic is routed to a web service that terminates SSL and then re-encrypts the traffic on its way to NiFi.
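As a rough sketch of the passthrough option, the Python snippet below builds an Ingress manifest for the ingress-nginx controller. The host name, the service name, and port 8443 are assumptions made for illustration, and the annotation only takes effect when the controller itself runs with SSL passthrough enabled (the --enable-ssl-passthrough flag).

```python
import json


def passthrough_ingress(host, service, port=8443):
    """Build an Ingress manifest that forwards TLS traffic to NiFi untouched."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "Ingress",
        "metadata": {
            "name": f"{service}-ingress",
            "annotations": {
                # ingress-nginx annotation: TLS is not terminated at the
                # Ingress, so NiFi sees the client's own handshake.
                "nginx.ingress.kubernetes.io/ssl-passthrough": "true",
            },
        },
        "spec": {
            "ingressClassName": "nginx",
            "rules": [{
                "host": host,
                "http": {"paths": [{
                    "path": "/",
                    "pathType": "Prefix",
                    "backend": {"service": {"name": service,
                                            "port": {"number": port}}},
                }]},
            }],
        },
    }


# Hypothetical host and service names.
manifest = passthrough_ingress("nifi.example.com", "nifi")
print(json.dumps(manifest, indent=2))
```

The printed JSON can be applied as a regular manifest, since kubectl accepts JSON as well as YAML.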
The sixth challenge is NiFi cluster monitoring. We mostly use Prometheus in our on-premise projects, and configuring the Prometheus exporter is as simple as adding one more process in Apache NiFi. That process pushes the metrics to the PushGateway, from which Prometheus reads them.
The NiFi issue here is that a few processors have memory leaks. Once Apache NiFi is running in Kubernetes, it is vital to examine its resource usage and verify how it manages the data being processed. Ksolves suggests setting the minimum JVM heap size to the same value as the maximum, and leaving a generous gap between the JVM memory value and the RAM limit of the entire pod.
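The memory recommendation can be expressed as a small calculation. In this minimal sketch the 50% heap fraction is an assumption chosen for illustration, not a figure from Ksolves; the right ratio depends on the flows and should be tuned from observed usage.

```python
def jvm_heap_args(pod_limit_mib, heap_fraction=0.5):
    """Return equal -Xms/-Xmx flags sized as a fraction of the pod limit.

    heap_fraction is an assumed starting point, not a recommended value.
    """
    if not 0 < heap_fraction < 1:
        raise ValueError("heap_fraction must be between 0 and 1")
    heap_mib = int(pod_limit_mib * heap_fraction)
    # Equal minimum and maximum: the heap is allocated up front and never resized.
    return [f"-Xms{heap_mib}m", f"-Xmx{heap_mib}m"]


def headroom_mib(pod_limit_mib, heap_fraction=0.5):
    """Memory left for non-heap use (metaspace, threads, off-heap buffers)."""
    return pod_limit_mib - int(pod_limit_mib * heap_fraction)


print(jvm_heap_args(8192))   # flags for a pod limited to 8 GiB
print(headroom_mib(8192))    # MiB left outside the heap
```

The headroom matters because the pod limit covers everything the process uses, not only the Java heap. If the heap is set too close to the limit, Kubernetes can kill the pod even though the JVM itself has not run out of heap.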
The seventh challenge is certificate management, and there are many ways to handle it. The simplest solution from Ksolves is to generate a new certificate on each start with the NiFi Toolkit, run as a sidecar for Apache NiFi.
Another solution is to use an already created certificate with a generated truststore and keystore, stored as secrets (Kubernetes Secrets or a secret manager such as HashiCorp Vault or Google Cloud Secret Manager) and mounted on the NiFi StatefulSet. If we need to add a certificate to the truststore, we can either re-upload the truststore or import the certificate dynamically on each start.
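For the Kubernetes Secret variant, a minimal sketch of the manifest could look like the following. The secret name, the file names keystore.p12 and truststore.p12, and the placeholder bytes are assumptions; real stores would be read from the files generated beforehand.

```python
import base64
import json


def tls_secret(name, keystore_bytes, truststore_bytes):
    """Build a Kubernetes Secret manifest holding both stores.

    Values under "data" must be base64-encoded strings.
    """
    def encode(raw):
        return base64.b64encode(raw).decode("ascii")

    return {
        "apiVersion": "v1",
        "kind": "Secret",
        "metadata": {"name": name},
        "type": "Opaque",
        "data": {
            "keystore.p12": encode(keystore_bytes),
            "truststore.p12": encode(truststore_bytes),
        },
    }


# Placeholder bytes stand in for the real keystore and truststore files.
secret = tls_secret("nifi-tls", b"keystore-bytes", b"truststore-bytes")
print(json.dumps(secret, indent=2))
```

Mounting this Secret as a volume on the StatefulSet makes each key appear as a file in the mount directory, which the NiFi keystore and truststore properties can then point to.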
NiFi Registry on Kubernetes – Fundamentals and Deployment
Apache NiFi Registry was created to act as a kind of Git repository for Apache NiFi pipelines. NiFi Registry uses a Git repository and keeps all the information about changes in its own database (H2 by default, but MySQL or PostgreSQL can be configured instead).
It is not an ideal solution: there are many complications when using it as part of a CI/CD pipeline. Still, it looks like the right way for the NiFi ecosystem to store all pipelines and their history in a single place. The major challenge in managing it is storing and updating its keystore and truststore, just as with Apache NiFi.
NiFi Registry is a stateful app, just like Apache NiFi. It needs storage for the locally cloned Git repository and for the database (if we keep the default H2). Ksolves recommends MySQL or PostgreSQL instead, both of which are more robust than H2. An external database can also be managed separately from the NiFi Registry itself.
Summary: Apache NiFi Arrived at a New Era
Apache NiFi is among the most widespread Big Data applications and has been in use for a long time. After several releases, it has become a mature, central component of many solutions. Its ecosystem includes services such as NiFi Tools and NiFi Registry. Running NiFi on Kubernetes streamlines deployment, upgrades, management, and migration, operations that are very complex in earlier setups.
Kubernetes certainly is not a remedy for every NiFi issue; however, it can be a useful step toward a better NiFi platform. We hope the recommendations from Ksolves in this blog help you tackle the NiFi challenges.
Still Confused? Get Help from Ksolves Experts!
When it comes to Apache NiFi, NiFi Registry, and Kubernetes services, Ksolves is a leading solution provider. We are available 24/7 to assist you for as long as you use our services, and afterwards until you are comfortable with these technologies. For NiFi failure and recovery, or any other issue, you can contact us right away. Feel free to call us at your convenience.
Call : +91 8130704295