Site Reliability Engineer

Abu Dhabi, United Arab Emirates
Posted: 25 Mar 2026

Financial

  • Estimate: $80k - $120k*
  • Zero income tax location

Accessibility

  • Office Only
  • Apply from abroad
  • Visa Provided

Requirements

  • Experience: Senior
  • English: Professional

Position

About the Job:
The company is seeking a Senior Site Reliability Engineer (SRE) / DevOps Lead to design, scale, and enhance its cloud infrastructure and observability ecosystem. This role is ideal for those passionate about automation, resilience, and reliability.

Work Conditions: On-site, Full-time
Location: Abu Dhabi Emirate, United Arab Emirates

Key Responsibilities:

  • Architect and deploy scalable, highly available cloud infrastructure for production workloads.
  • Lead and implement SRE best practices, ensuring system reliability, performance, and scalability.
  • Oversee and optimize CI/CD pipelines (e.g., Jenkins, Argo CD) for seamless deployments.
  • Define and monitor SLOs and SLIs to ensure service reliability and uptime.
  • Design and manage observability frameworks for monitoring, logging, and alerting (e.g., Elastic Stack, Prometheus, Grafana, Dynatrace, New Relic).
  • Manage and optimize Kubernetes clusters and Helm charts for efficient orchestration and streamlined releases.
  • Implement auto-healing and proactive monitoring systems to prevent outages.
  • Drive fault injection testing and chaos engineering (e.g., Chaos Mesh, Litmus, AWS FIS) for resilience validation.
  • Collaborate with engineering and product teams to embed reliability into every phase of development.
  • Maintain clear documentation on infrastructure, incidents, and operational processes.

Requirements:

  • 8+ years of experience as a DevOps/SRE professional, including leading enterprise SRE implementations.
  • Hands-on experience with AWS, GCP, or Azure (e.g., AWS services such as EC2, S3, RDS, and Lambda).
  • Proficiency in Infrastructure as Code (IaC) tools (Terraform, CloudFormation, Ansible).
  • Proven experience in CI/CD automation, monitoring, and incident response.
  • Skilled in observability tools (Elastic Stack, Grafana, Prometheus, Dynatrace, New Relic).
  • Strong expertise in Kubernetes and Helm for large-scale deployments.
  • Experience with AWS managed and self-managed databases (MySQL, Cassandra, etc.).
  • Proficient in scripting languages (Python, Bash, or Go).
  • Experience designing and testing Business Continuity Planning/Disaster Recovery (BCP/DR) strategies.
  • Proactive in capacity planning, ensuring scalability and resilience across cloud environments.
  • Excellent communication, documentation, and troubleshooting skills.

Information Security Responsibilities:

  • Comply with the company's Information Security & Service Management policies.
  • Maintain the confidentiality and integrity of all information assets.
  • Attend mandatory information security training.
  • Report any security incidents through official channels.

Employer industry: IT System Custom Software Development