Rancher Kubernetes expert

Posted on July 28, 2025

Job Description

Experience: 7+ years
Mandatory skills:
Rancher, Kubernetes (RKE2 and K3s), Terraform, Python
Strong background operating Prometheus, Grafana, and Elasticsearch/Fluentd/Kibana (ELK/EFK) stacks
Scope: We're looking for a Rancher Kubernetes expert to lead the design, automation, and reliability of our on-prem and hybrid container platform. Sitting at the intersection of the Platform Engineering and Infrastructure Reliability teams, this role owns the lifecycle of Rancher-managed clusters, from bare-metal provisioning and performance tuning to observability, security, and automated operations.
You'll apply SRE principles to ensure high availability, scalability, and resilience across environments supporting mission-critical workloads.
Core Responsibilities:
Platform & Infrastructure Engineering
Design, deploy, and maintain Rancher-managed Kubernetes clusters (RKE2/K3s) at enterprise scale.
Architect highly available clusters integrated with on-prem infrastructure: UCS, VXLAN, storage, DNS, and load balancers.
Lead Rancher Fleet implementations for GitOps-driven cluster and workload management.
Performance Engineering & Optimization
Tune clusters for high-performance workloads on bare-metal hardware, optimizing CPU, memory, and I/O paths.
Align cluster scheduling and resource profiles with physical infrastructure topologies (NUMA, NICs, etc.).
Optimize CNI, kubelet, and scheduler settings for low-latency, high-throughput applications.
Security & Compliance
Implement security-first Kubernetes patterns: RBAC, Pod Security Standards, network policies, and image validation.
Drive shift-left security using Terraform, Helm, and CI/CD pipelines; align with PCI, FIPS, and CIS benchmarks.
Lead infrastructure risk reviews and implement guardrails for regulated environments.
Automation & Tooling
Build and maintain IaC stacks using Terraform, Helm, and Argo CD.
Develop platform automation and observability tooling using Python or Go.
Ensure declarative management of infrastructure and applications through GitOps pipelines.
SRE & Observability
Apply SRE best practices for platform availability, capacity, latency, and incident response.
Operate and tune Prometheus, Grafana, and ELK/EFK stacks for complete platform observability.
Drive actionable alerting, automated recovery mechanisms, and clear operational documentation.
Lead postmortems and drive systemic improvements to reduce MTTR and prevent recurrence.
Required Skills
• 7+ years in infrastructure, platform, or SRE roles
• Deep hands-on experience with Rancher (RKE2/K3s) in production environments
• Proficient with Terraform, Helm, Argo CD, Python, and/or Go
• Demonstrated performance tuning in bare-metal Kubernetes environments (UCS, VXLAN, MetalLB)
• Expert in Linux systems (systemd, networking, kernel tuning), Kubernetes internals, and container runtimes
• Real-world application of SRE principles in high-stakes, always-on environments
• Strong background operating Prometheus, Grafana, and Elasticsearch/Fluentd/Kibana (ELK/EFK) stacks
Preferred Qualifications
• Experience integrating Kubernetes with OpenStack and Magnum
• Knowledge of Rancher add-ons: Fleet, Longhorn, CIS Scanning
• Familiarity with compliance-driven infrastructure (PCI, FedRAMP, SOC 2)
• Certifications: CKA, CKS, or Rancher Kubernetes Administrator
• Strategic thinker with strong technical judgment and execution ability
• Calm and clear communicator, especially during incidents or reviews
• Mentorship-oriented; supports team learning and cross-functional collaboration
• Self-motivated, detail-oriented, and thrives in a fast-moving, ownership-driven culture

Required Skills

Rancher, Kubernetes (RKE2 and K3s), Terraform, Python; strong background operating Prometheus, Grafana, and Elasticsearch/Fluentd/Kibana (ELK/EFK) stacks

Recruiter: Thanesh Sahu

Company: The AI Matters

Key Details

Job Type: Contract

Location Type: Remote

Location: Remote

Experience: 10+ years

Salary Range: INR 120,000 - 130,000 / month

Application Deadline: July 31, 2025