Senior Site Reliability Engineer

Abu Dhabi, United Arab Emirates · Posted: 22 May 2026

Financial

Estimate: $70k - $100k*
Zero income tax location

Accessibility

Office Only
Apply from abroad
Visa Provided

Requirements

Experience: Senior
English: Professional

Position

About the Job: The company, a leader in AI-powered cloud and digital infrastructure, is seeking a Senior Site Reliability Engineer to drive transformative technology solutions globally. The role centers on designing, implementing, and operating scalable, reliable, and secure infrastructure for large-scale AI and high-performance computing (HPC) workloads. You will work closely with engineering, product, and operations teams to build and maintain CI/CD pipelines, manage Kubernetes-based environments, and implement observability systems that keep globally distributed platforms highly available and performant.

Your Key Responsibilities:

CI/CD & Automation: Design, build, and maintain robust CI/CD pipelines using tools such as GitLab CI, Azure DevOps, and/or Jenkins for secure software delivery.
Kubernetes Operations: Optimize and manage Kubernetes clusters to ensure performance and resilience.
Infrastructure as Code: Develop and maintain infrastructure using Terraform, Helm, Ansible, or similar tools.
Observability & Monitoring: Implement monitoring solutions using Prometheus, VictoriaMetrics, Grafana, and ELK/EFK.
Incident Management: Lead root cause analysis (RCA) for incidents and drive continuous improvements to system reliability.
Reliability Engineering: Define SRE best practices, including SLAs, SLOs, and error budgets.
Logging & Alerting: Build logging, alerting, and tracing systems for proactive issue detection.
Security & Compliance: Enforce security best practices across CI/CD pipelines and runtime environments.
Collaboration: Work cross-functionally to align platform capabilities with business needs.
Mentorship: Guide junior engineers and contribute to knowledge sharing across teams.
On-call Support: Participate in on-call rotations to support critical platform services.

Required Skills/Qualifications:

Bachelor’s or Master’s degree in Computer Science, Engineering, or a related technical field.
5+ years of experience in DevOps, Site Reliability Engineering, or platform engineering roles in production environments.
Proven experience managing Kubernetes clusters (e.g., GKE, EKS, AKS, or self-managed).
Hands-on experience with CI/CD tools and automation frameworks.
Strong experience with infrastructure-as-code tools such as Terraform, Helm, or Ansible.
Proficiency in container technologies (Docker, containerd) and orchestration with Kubernetes.
Strong scripting/programming skills (e.g., Python, Bash, Go).
Experience with observability and monitoring stacks (Prometheus, Grafana, ELK/EFK).
Solid understanding of Linux systems, networking concepts, and cloud-native security best practices.

Preferred Skills/Qualifications:

Experience supporting AI/ML or HPC workloads in production environments.
Knowledge of GPU resource management, workload schedulers, and performance tuning.
Familiarity with distributed systems and large-scale infrastructure environments.
Experience with incident management frameworks and reliability engineering practices.
Strong collaboration and communication skills across cross-functional teams.

Apply Direct