Site Reliability Engineer
Posted on August 19, 2025
Job Description
- Job Title: Site Reliability Engineer
- Experience Required: 10 Years
- Budget: 90k
- Duration: 6 Months
- Remote
- Job Summary:
- We are seeking an experienced Site Reliability Engineer (SRE) with a strong background in infrastructure, cloud technologies, and automation to support and enhance the reliability, scalability, and performance of our critical systems. This role requires hands-on expertise in Azure cloud, DevOps practices, container orchestration, and monitoring tools, with a focus on delivering resilient and highly available systems in a dynamic, fast-paced environment.
- Key Responsibilities:
- * Ensure system reliability, availability, and performance across production environments.
- * Collaborate with engineering and operations teams to design scalable and resilient infrastructure.
- * Manage and support Azure cloud services, with a focus on best practices and cost optimization.
- * Build and maintain CI/CD pipelines using tools such as GitHub and GitHub Actions.
- * Design and implement Infrastructure as Code (IaC) using Terraform, Azure CLI, or CloudFormation.
- * Monitor system health using tools like Splunk, New Relic, and Azure Monitor.
- * Support containerization and orchestration environments using Docker and Kubernetes.
- * Automate repetitive tasks and incident remediation processes.
- * Participate in incident management, including troubleshooting, root cause analysis, and post-mortem documentation.
- * Collaborate on architecture design and implementation for AI/ML workloads, including support for Azure ML, Databricks, and other SaaS-based data tools.
- * Create clear technical documentation, including runbooks, SOPs, and operational guides.
- Required Technical Skills:
- * Programming: Strong scripting and automation skills, particularly with Python.
- * Operating Systems: Deep expertise in Linux and/or Windows administration and networking concepts.
- * Cloud Platforms: Proven experience with Microsoft Azure, its architecture, services, and deployment best practices.
- * Containers & Orchestration: Solid hands-on experience with Docker, Kubernetes, and ecosystem tools.
- * Infrastructure as Code (IaC): Practical experience with Terraform, CloudFormation, or Azure CLI.
- * Monitoring & Observability: Strong understanding of observability practices using Splunk, New Relic, or Azure Monitor.
- * CI/CD Pipelines: Experience designing and maintaining robust pipelines using GitHub, GitHub Actions, or similar tools.
Required Skills
Python
Microsoft Azure
New Relic