Site Reliability Engineer

Abu Dhabi, United Arab Emirates
Posted: 25 Mar 2026

Financial

Estimate: $80k - $120k*
Zero income tax location

Accessibility

Office Only
Apply from abroad
Visa Provided

Requirements

Experience: Senior
English: Professional

Position

About the Job:
The company is seeking a Senior Site Reliability Engineer (SRE) / DevOps Lead to design, scale, and enhance its cloud infrastructure and observability ecosystem. This role is ideal for those passionate about automation, resilience, and reliability.

Work Conditions: On-site, Full-time
Location: Abu Dhabi Emirate, United Arab Emirates

Key Responsibilities:

Architect and deploy scalable, highly available cloud infrastructure for production workloads.
Lead and implement SRE best practices, ensuring system reliability, performance, and scalability.
Oversee and optimize CI/CD pipelines (e.g., Jenkins, Argo CD) for seamless deployments.
Define and monitor SLOs and SLIs to ensure service reliability and uptime.
Design and manage observability frameworks for monitoring, logging, and alerting (e.g., Elastic Stack, Prometheus, Grafana, Dynatrace, New Relic).
Manage and optimize Kubernetes clusters and Helm charts for efficient orchestration and streamlined releases.
Implement auto-healing and proactive monitoring systems to prevent outages.
Drive fault injection testing and chaos engineering (e.g., Chaos Mesh, Litmus, AWS FIS) for resilience validation.
Collaborate with engineering and product teams to embed reliability into every phase of development.
Maintain clear documentation on infrastructure, incidents, and operational processes.

Requirements:

8+ years of experience as a DevOps/SRE professional, leading enterprise SRE implementations.
Hands-on experience with AWS, GCP, or Azure (e.g., EC2, S3, RDS, and Lambda on AWS).
Proficiency in Infrastructure as Code (IaC) tools (Terraform, CloudFormation, Ansible).
Proven experience in CI/CD automation, monitoring, and incident response.
Skilled in observability tools (Elastic Stack, Grafana, Prometheus, Dynatrace, New Relic).
Strong expertise in Kubernetes and Helm for large-scale deployments.
Experience with AWS-managed and self-managed databases (MySQL, Cassandra, etc.).
Proficient in scripting languages (Python, Bash, or Go).
Experience designing and testing Business Continuity Planning/Disaster Recovery (BCP/DR) strategies.
Proactive in capacity planning, ensuring scalability and resilience across cloud environments.
Excellent communication, documentation, and troubleshooting skills.