Lead Site Reliability Engineer

Abu Dhabi, United Arab Emirates
Posted: 27 Apr 2026

Financial

Estimate: $80k - $120k*
Zero income tax location

Accessibility

Office Only
Apply from abroad
Visa Provided

Requirements

Experience: Senior
English: Professional

Position

We are seeking a seasoned DevOps & Site Reliability Engineering (SRE) Lead to design, scale, and elevate our cloud infrastructure and observability ecosystem. If you’re passionate about automation, system resilience, and building highly reliable platforms, this role is for you.

Location: Abu Dhabi Emirate, United Arab Emirates
Work Conditions: On-site, Full-time

Key Responsibilities:

Architect and deploy scalable, highly available cloud infrastructure
Lead SRE best practices to ensure reliability, performance, and scalability
Optimize CI/CD pipelines (Jenkins, Argo CD or similar) for seamless deployments
Define and track SLOs & SLIs to maintain uptime and service health
Build robust observability frameworks (Elastic Stack, Prometheus, Grafana, Dynatrace, New Relic)
Manage Kubernetes clusters and Helm charts for efficient orchestration
Implement auto-healing systems and proactive monitoring
Drive chaos engineering and resilience testing (Chaos Mesh, Litmus, AWS FIS)
Collaborate with engineering and product teams to embed reliability into development
Maintain clear infrastructure and incident documentation

What We’re Looking For:

8+ years of experience in DevOps/SRE, including leadership in enterprise environments
Hands-on experience with AWS, GCP, or Azure
Strong expertise in Infrastructure as Code (Terraform, CloudFormation, Ansible)
Proven experience in CI/CD, monitoring, and incident response
Deep knowledge of observability tools and practices
Strong Kubernetes and Helm experience at scale
Experience with relational and NoSQL databases such as MySQL and Cassandra
Proficiency in Python, Bash, or Go
Experience in BCP/DR planning and capacity management
Strong communication, troubleshooting, and documentation skills

Information Security: