About the job
G42, a leading AI and Cloud Computing company based in Abu Dhabi, is at the forefront of AI innovation in healthcare, driving the Emirati Genome Program, the world's largest population genome initiative. As part of a cross-functional and internationally trained team, you will engage in cutting-edge AI projects that significantly impact global healthcare.
As a Senior Site Reliability Engineer (SRE) within our team, you will collaborate with various business units to automate and streamline operations and processes. Your primary focus will be on building and maintaining tools for deployment, monitoring, and operations.
Responsibilities:
- Design, implement, and maintain observability solutions on the Azure platform using tools such as Azure Monitor, Application Insights, and Azure Log Analytics.
- Develop custom metrics, logs, and traces to monitor the health, performance, and availability of Azure resources, applications, and services.
- Configure and manage alerts, notifications, and dashboards for real-time monitoring and incident response.
- Monitor trends in system behavior to proactively address issues before they escalate.
- Collaborate with software development teams to instrument code for logging, tracing, and telemetry collection.
- Analyze system metrics, logs, and traces to identify trends, anomalies, and performance bottlenecks.
- Troubleshoot and debug issues related to performance, availability, and scalability in Azure environments.
- Automate monitoring, logging, and alerting workflows using scripting languages and infrastructure-as-code tools.
Qualifications:
- Minimum of 8 years of experience in Cloud Infrastructure Monitoring and DevOps.
- In-depth knowledge of Unix/Linux systems, including system internals, file systems, and network protocols.
- Familiarity with monitoring tools (e.g., Prometheus, Grafana) and experience in troubleshooting related issues.
- Proficiency with alerting tools such as Prometheus, Nagios, and others, depending on the monitoring stack in use.
- Experience in developing Continuous Integration/Continuous Delivery (CI/CD) pipelines and in process-oriented methodologies following ITIL.
- Strong background in administering Linux, Windows, storage, and network devices.
- Hands-on experience with Docker, Kubernetes, and Argo CD is required.
- Azure certification AZ-104 is mandatory; AZ-305 is preferred. TOGAF certification is advantageous but not mandatory.
What we look for:
We seek performance-driven individuals who are curious and adaptable, eager to explore opportunities for meaningful collaboration, and driven to create unique customer-centric solutions. A passion for conquering new frontiers in the AI space is central to our community.
What working at G42 offers:
- Culture: An open, diverse, and inclusive environment that encourages personal growth and focuses on groundbreaking, industry-first innovations.
- Career: Outstanding learning, development, and growth opportunities through structured training programs and high-tech projects.
- Work-Life: A hybrid work policy that balances office and home environments.
- Rewards: A competitive remuneration package with a host of perks, including healthcare, education support, and leave benefits.
If you believe you meet the above criteria, we encourage you to reach out and submit your application.