Senior Engineer - Site Reliability

G42 | Abu Dhabi, United Arab Emirates | Posted: 05 Sep 2024

Financial

  • Estimate: $120k - $180k*
  • Zero income tax location

Accessibility

  • Hybrid
  • Visa Provided

Requirements

  • Experience: Senior
  • English: Professional

Position

About the Job
With operations in 27 countries across 4 continents, M42 is at the epicenter of AI innovation in healthcare, driving the Emirati Genome Program, the world's largest population genome program. As part of a cross-functional and internationally trained team, you'll be working on cutting-edge AI projects that have a significant impact on global healthcare.

As a Senior Site Reliability Engineer (SRE) in our team, you will be responsible for collaborating with various business units to automate and streamline operations and processes by building and maintaining tools for deployment, monitoring, and operations.

Responsibilities

  • Design, implement, and maintain observability solutions on the Azure platform using tools such as Azure Monitor, Application Insights, Azure Log Analytics, and open-source tools.
  • Develop custom metrics, logs, and traces to monitor the health, performance, and availability of Azure resources, applications, and services.
  • Configure and manage alerts, notifications, and dashboards to enable real-time monitoring and incident response.
  • Monitor trends in system behavior over time and proactively address issues before they become critical.
  • Set appropriate metric thresholds to distinguish normal from abnormal behavior, and understand how alerts are prioritized and escalated.
  • Collaborate with software development teams to instrument code for logging, tracing, and telemetry collection.
  • Analyze system metrics, logs, and traces to identify trends, anomalies, and performance bottlenecks.
  • Troubleshoot and debug issues related to performance, availability, and scalability in Azure environments.
  • Automate monitoring, logging, and alerting workflows using scripting languages and infrastructure-as-code tools.

You will have proven experience in driving efficiency, identifying areas for improvement, and seeing efforts through end-to-end while working with stakeholders. Proficiency in any query language and experience in building web portals are essential. Good knowledge of CI/CD and DevOps technologies is required, along with the ability to maintain suitable deployment-specific SOPs, templates, and processes. Strong communication skills, attention to detail, and the ability to produce accurate and consistent architectural documentation are necessary.

Occasional weekend support may be required, depending on project demand.

Qualifications

  • Minimum of 8 years of experience in Cloud Infrastructure Monitoring and DevOps.
  • In-depth knowledge of Unix/Linux systems, including system internals, file systems, and network protocols.
  • Experience with monitoring tools (e.g., Prometheus, Grafana) and troubleshooting issues.
  • Ability to analyze system performance and plan for future capacity requirements.
  • Proficient with alerting tools such as Prometheus, Grafana, and Nagios.
  • Working knowledge of Power BI, SQL, scripting, Python, Azure Monitoring, and ITIL processes.
  • Strong experience in developing Continuous Integration/Continuous Delivery pipelines (CI/CD).
  • Process-oriented, following ITIL methodology.
  • Experience in defining and building performance monitoring systems and ITSM tools.
  • Strong background in Linux, Windows, Storage, and Network devices administration.
  • Experience with Configuration Management and Deployment tools such as Ansible, Terraform, and Puppet/Chef is an added advantage.
  • Hands-on experience in Docker, Kubernetes, and Argo CD is required.
  • Azure certification (AZ-104) is mandatory; AZ-305 is preferred.
  • Certification in TOGAF is advantageous but not mandatory.

What We Look For
We seek a performance-driven, inquisitive individual with the agility to adapt to ambiguity. You should be eager to explore opportunities for meaningful collaboration with stakeholders and aspire to create unique customer-centric solutions. A bias for action and a passion for conquering new frontiers in the AI space are at the heart of the M42 community.

What Working at M42 Offers

  • Culture: An open, diverse, and inclusive environment with a global vision that encourages personal growth and focuses on groundbreaking, industry-first innovations.
  • Career: Outstanding learning, development, and growth opportunities via structured training programs and innovative, high-tech projects.
  • Work-Life: A hybrid work policy to strike the perfect balance between office and home.
  • Rewards: A competitive remuneration package with a host of perks, including healthcare, educational support, leave benefits, and more.

About G42

A leading AI & Cloud Computing company based in Abu Dhabi, committed to inventing a better everyday through the power of people and technology.