Principal Site Reliability Engineer

Abu Dhabi, United Arab Emirates
Posted: 30 Apr 2026

Financial

Estimate: $95k - $120k*
Zero income tax location

Accessibility

Office Only
Visa Provided

Requirements

Experience: Senior
English: Professional

Position

The company, a leader in AI-powered cloud and digital infrastructure, is seeking a Principal Site Reliability Engineer to architect and lead the evolution of its globally distributed infrastructure supporting AI and private cloud workloads. This high-impact technical leadership role centers on building scalable, resilient, and self-healing platforms through advanced automation and AIOps.

Key Responsibilities:

Platform Architecture & Strategy:
- Define and lead the long-term roadmap for infrastructure, CI/CD, and Kubernetes platforms
- Design scalable, distributed systems aligned with AI/ML and HPC workloads
- Establish standards for infrastructure-as-code and platform engineering

Automation & AIOps:
- Design and implement AI-driven automation and self-healing systems
- Develop autonomous workflows for incident remediation and capacity optimization
- Evolve observability into predictive AIOps capabilities

Kubernetes & Infrastructure Engineering:
- Architect high-performance Kubernetes environments for multi-tenancy and GPU-intensive workloads
- Optimize infrastructure for performance, scalability, and cost efficiency
- Support advanced scheduling and orchestration frameworks for AI workloads

Observability & Reliability:
- Build and enhance observability platforms integrating metrics, logs, and tracing
- Define SLOs/SLIs aligned with business outcomes
- Lead root cause analysis (RCA) and promote reliability best practices including error budgets

Leadership & Technical Excellence:
- Act as the escalation point for complex system issues
- Mentor and develop SRE and DevOps teams, driving a culture of excellence
- Lead architectural reviews and contribute to internal Centers of Excellence

Cross-Functional Collaboration:
- Partner with product and engineering teams to balance innovation with reliability
- Translate technical challenges into business impact for senior stakeholders
- Influence infrastructure and platform strategy across the organization

Required Qualifications & Experience:

- 10+ years of experience in Site Reliability Engineering, Platform Engineering, or Systems Architecture
- Proven experience designing and operating large-scale distributed systems
- Deep expertise in Kubernetes environments (EKS, GKE, or bare metal), including GPU workloads
- Strong programming skills in Python, Go, or Rust
- Extensive experience with Terraform, Helm, and infrastructure-as-code practices
- Strong understanding of observability systems (metrics, logging, tracing)

Preferred Qualifications:

- Experience with AI/ML infrastructure, including model serving and data pipelines
- Familiarity with scheduling frameworks (e.g., Ray, Kueue, Volcano)
- Experience building automation or AI-driven operational tools
- Certifications such as CKA or AWS/Azure Solutions Architect
- Experience influencing technical strategy across large organizations

What We’re Looking For:
A highly experienced and forward-thinking engineer with deep technical expertise and a passion for building resilient, scalable systems. You should be a strong problem solver, an influential leader, and a strategic thinker who can drive innovation while maintaining operational excellence.

Benefits:

- Competitive Salary: Attractive salary package based on skills and experience
- Yearly Bonus: Performance-based annual bonus
- Exclusive Discount Cards: Access to special benefits with Esaad and Fazaa cards
- Premium Family Insurance: Comprehensive health coverage for you and your family
- Learning & Development: Access to top-tier learning platforms for career growth

This role promotes an inclusive, innovative, and collaborative work environment, grounded in values such as trust, accountability, and high performance.
