Site Reliability Engineer

Posted on August 19, 2025

Job Description

  • Job Title: Site Reliability Engineer
  • Experience Required: 10 Years
  • Budget: 90k
  • Duration: 6 Months
  • Location: Remote
  • Job Summary: We are seeking an experienced Site Reliability Engineer (SRE) with a strong background in infrastructure, cloud technologies, and automation to support and enhance the reliability, scalability, and performance of our critical systems. This role requires hands-on expertise in Azure cloud, DevOps practices, container orchestration, and monitoring tools, with a focus on delivering resilient and highly available systems in a dynamic, fast-paced environment.
  • Key Responsibilities:
    * Ensure system reliability, availability, and performance across production environments.
    * Collaborate with engineering and operations teams to design scalable and resilient infrastructure.
    * Manage and support Azure cloud services, with a focus on best practices and cost optimization.
    * Build and maintain CI/CD pipelines using tools such as GitHub and GitHub Actions.
    * Design and implement Infrastructure as Code (IaC) using Terraform, Azure CLI, or CloudFormation.
    * Monitor system health using tools like Splunk, New Relic, and Azure Monitor.
    * Support containerization and orchestration environments using Docker and Kubernetes.
    * Automate repetitive tasks and incident remediation processes.
    * Participate in incident management, including troubleshooting, root cause analysis, and post-mortem documentation.
    * Collaborate on architecture design and implementation for AI/ML workloads, including support for Azure ML, Databricks, and other SaaS-based data tools.
    * Create clear technical documentation, including runbooks, SOPs, and operational guides.
  • Required Technical Skills:
    * Programming: Strong scripting and automation skills, particularly with Python.
    * Operating Systems: Deep expertise in Linux and/or Windows administration and networking concepts.
    * Cloud Platforms: Proven experience with Microsoft Azure, its architecture, services, and deployment best practices.
    * Containers & Orchestration: Solid hands-on experience with Docker, Kubernetes, and ecosystem tools.
    * Infrastructure as Code (IaC): Practical experience with Terraform, CloudFormation, or Azure CLI.
    * Monitoring & Observability: Strong understanding of observability practices using Splunk, New Relic, or Azure Monitor.
    * CI/CD Pipelines: Experience designing and maintaining robust pipelines using GitHub, GitHub Actions, or similar tools.

Required Skills

Python, Microsoft Azure, New Relic