Senior DevOps Engineer
Position Overview
As a Senior DevOps Engineer, you will build, automate, and maintain scalable infrastructure and deployment pipelines across cloud and on-prem environments. The role includes managing Linux servers, deploying applications on Kubernetes, ensuring security compliance, and optimizing the performance of distributed systems. You will configure CI/CD processes, monitor critical services, implement observability, and support developers with DevOps best practices. You will also troubleshoot infrastructure issues, improve deployment automation, oversee certificate and security management, and ensure system reliability and high availability.
Primary Responsibilities
- Build, automate, and maintain scalable infrastructure and deployment pipelines across cloud and on-prem environments
- Manage Linux servers, including system administration, troubleshooting, hardening, and shell scripting
- Deploy and manage applications on Kubernetes, including stateful services, persistent volumes, Helm charts, and operators
- Configure CI/CD processes and support developers with DevOps best practices
- Monitor critical services and implement observability solutions
- Troubleshoot infrastructure issues, deployment failures, networking, and distributed application problems
- Oversee certificate management and security hardening
- Ensure system reliability and high availability
Must-Have Skills
- 4+ years of hands-on DevOps experience
- Strong understanding of DevOps practices, CI/CD pipelines, automation, and environment management
- Hands-on experience installing, configuring, and optimizing distributed systems across multi-node environments
- Strong hands-on experience with Kubernetes, including deploying stateful services, persistent volumes, Helm charts, and operators, as well as cluster administration
- Proficiency with Infrastructure as Code (IaC) tools such as Terraform, Ansible, Helm, or similar
- Knowledge of cloud platforms (AWS, Azure, GCP), including networking, IAM, storage, compute, security, and managed services
- Experience with monitoring and observability tools such as Prometheus, Grafana, ELK/EFK, CloudWatch, or Datadog
- Strong understanding of security best practices, including certificate-based authentication, secrets management, encryption, TLS/SSL, RBAC, and cloud security
- Strong troubleshooting skills for infrastructure, deployment failures, networking, and distributed application issues
- Linux experience (mandatory), including system administration, troubleshooting, hardening, and shell scripting
Good-to-Have Skills
- DevOps experience with Big Data technologies such as NiFi, Kafka, Spark, Cassandra, Hadoop, Hive, or similar distributed data systems
- Ability to create and maintain detailed documentation for deployments, infrastructure, troubleshooting guides, and SOPs
- Familiarity with certificate management, rotation policies, and PKI administration
- Working knowledge of cloud-native tools such as Kustomize and Kubernetes Operators
- Understanding of logging, tracing, alerting, and capacity planning in distributed systems
- Familiarity with networking concepts, load balancing, and DNS
- Experience with SRE/DevOps best practices such as reliability engineering, error budgets, and SLIs/SLOs/SLAs