Lead Site Reliability Engineer

Abu Dhabi, United Arab Emirates | Posted: 26 May 2026

Financial

  • Estimate: $90k - $120k*
  • Zero income tax location

Accessibility

  • Hybrid
  • Apply from abroad
  • Visa Provided

Requirements

  • Experience: Senior
  • English: Professional

Position

The company is partnering with a next-generation digital bank built from the ground up to deliver seamless, secure, and scalable financial services. Our platform is cloud-native and API-first, focused on reliability, speed, and security. We are growing rapidly and seeking top-tier Site Reliability/Ops Engineers to join our core team and help manage and scale our infrastructure.

As a Site Reliability Engineer, you will be responsible for maintaining and scaling our core infrastructure, ensuring our banking services remain available, secure, and performant. You will collaborate closely with development, product, and security teams to automate operations, manage cloud infrastructure, and uphold high availability standards.

Responsibilities:

  • Lead the design, operation, and continuous improvement of cloud infrastructure, Kubernetes platforms, and reliability practices across production environments.
  • Direct and develop a team of 3-5 engineers, providing mentorship, clear delivery ownership, and performance leadership.
  • Establish and drive standards for observability, deployment safety, incident management, and self-service platform capabilities.
  • Build automation across infrastructure provisioning, CI/CD workflows, and operational processes to enhance consistency, resilience, and delivery efficiency.
  • Collaborate with engineering, product, platform, and security teams to enhance reliability, scalability, and security operations.
  • Guide technical decisions across AWS, multi-cluster Kubernetes, blue-green deployments, service mesh, and distributed production systems.
  • Define and operationalize service level objectives (SLOs), service level indicators (SLIs), error budgets, monitoring, alerting, and post-incident improvement practices.

Skills:

  • 12+ years of experience in Site Reliability Engineering, DevOps, Platform Engineering, or related roles.
  • 2+ years at a Staff Engineer, Lead Engineer, or equivalent senior technical level.
  • 2+ years supporting production-grade microservices environments at scale.
  • Strong expertise with AWS, Kubernetes, multi-cluster operations, Terraform, Helm, kubectl, and CI/CD tools such as Jenkins.
  • Experience with observability and incident management tools like Prometheus, Grafana, and OpenSearch.
  • Familiarity with Zero Trust architecture, OAuth2, IAM, and access controls.
  • Experience working in regulated environments with standards such as PCI DSS, ISO 27001, and MAS TRM.
  • Strong leadership, decision-making, and technical documentation skills.

Success KPIs:

  • Production platforms meet agreed reliability, availability, and recovery targets.
  • Automation and repeatability of deployment and operational workflows improve.
  • Platform standards and self-service practices are widely adopted.
  • Recurring incidents and operational toil decrease through better engineering design and automation.
  • Team capability, ownership, and execution quality improve through effective leadership.