Lead Site Reliability Engineer

Presight | Abu Dhabi, United Arab Emirates | Posted: 03 Jul 2025

Financial

  • Estimate: $80k - $120k*
  • Zero income tax location

Accessibility

  • Office Only
  • Apply from abroad
  • Visa Provided

Requirements

  • Experience: Senior
  • English: Professional

Position

Presight is seeking a meticulous, expert Lead Site Reliability Engineer to build and support the delivery model that enables product and technology teams to develop high-quality products, improve platform infrastructure, and strengthen the reliability of our products and solutions. This role is central to defining and establishing the delivery model for developing next-generation analytics solutions and services.

Key Responsibilities:

  • Work with relevant stakeholders to drive reliability, performance, and scalability across our infrastructure.
  • Own the SRE roadmap, guiding implementation through mentorship, code contributions, and hands-on infrastructure work.
  • Partner closely with Engineering, Data Science, and Product teams to embed reliability into the development lifecycle.
  • Act as the reliability architect, leading reliability strategy across services and environments.
  • Define and enforce Service Level Objectives (SLOs), Service Level Indicators (SLIs), and error budgets with engineering leadership.
  • Lead incident response and root cause analysis.
  • Implement automation to reduce toil and improve system resilience.
  • Manage capacity planning, traffic forecasting, and cost optimization.
  • Mentor junior and senior Site Reliability Engineers in technical and process excellence.
  • Collaborate with MLOps, DevSecOps, and CloudOps teams to enforce best practices.
  • Champion observability, metrics-driven decisions, and platform maturity.
  • Deploy monitoring tools such as Prometheus and Grafana to track system performance.
  • Ensure that reliability practices adhere to security and compliance standards, especially in regulated sectors.
  • Comply with QHSE (Quality, Health, Safety and Environment), Business Continuity, Information Security, Privacy, Risk, Compliance Management, and Governance policies and procedures.

Qualifications:

  • Bachelor's Degree in Computer Engineering or related field.
  • Minimum of 10 years of experience in site reliability engineering, including 2 years in people management.
  • Expertise in Kubernetes, CI/CD (e.g., GitLab), and infrastructure-as-code (Terraform/Helm).
  • Strong experience in cloud services (Azure, AWS, or GCP).
  • Experience with multi-tenant systems or high-throughput data platforms.
  • Exposure to AI/ML infrastructure or MLOps pipelines.
  • Proven background in SRE principles, SLIs/SLOs, and reliability-focused engineering.
  • Programming proficiency in Python or Shell (preferred).
  • Deep understanding of distributed systems, networking, and incident management.
  • A highly detail-oriented and methodical approach to problem solving.
  • Strong analytical skills and a passion for technology, troubleshooting, and customer service.
  • Excellent verbal and written communication skills.

What We Look For:
Join Presight, where we foster a culture of innovation, provide outstanding career growth opportunities, and offer competitive rewards. If you are eager to explore new frontiers in AI and thrive in a dynamic environment, we welcome you to our community.

What Working at Presight Offers:

  • Culture: An open, diverse, and inclusive environment that encourages personal growth and focuses on groundbreaking, industry-first innovations.
  • Career: Accelerate your career through high-impact projects and access to continuous growth and learning opportunities.
  • Rewards: A competitive remuneration package with various perks, including healthcare, education support, leave benefits, and more.

About Presight

Presight, an ADX-listed public company limited by shares whose majority shareholder is the Abu Dhabi company G42, is the region’s leading big data analytics company powered by Artificial Intelligence (“AI”). We combine big data, analytics, and AI expertise to serve organizations in every sector and of every scale, creating business and positive societal impact. With our world-class computer vision, AI, and omni-analytics platform as our engine, we excel at all-source data interpretation, supporting insight-driven decision-making that shapes policy and creates safer, healthier, happier, and more sustainable societies.