The Site Reliability Engineer (SRE) is responsible for ensuring the reliability, availability, performance, and scalability of the organization's infrastructure and applications. This role focuses on automating operations, managing cloud and Kubernetes environments, maintaining CI/CD pipelines, monitoring system health, and resolving production issues. Working closely with development teams, the SRE helps build resilient, secure, and efficient platforms that support continuous delivery and business growth.
Tasks & Responsibilities:
- Run the cloud environment by monitoring availability and taking a holistic view of system health
- Build software and systems to manage platform infrastructure and applications
- Improve reliability, quality, and time-to-market of our suite of software solutions
- Measure and optimize system performance, with an eye toward pushing our capabilities forward, getting ahead of customer needs, and innovating to continually improve
- Provide primary operational support and engineering for multiple large, distributed software applications
- Gather and analyze metrics from both operating systems and applications to assist in performance tuning and fault finding
- Partner with development teams to improve services through rigorous testing and release procedures
- Participate in system design consulting, platform management, and capacity planning
- Create sustainable systems and services through automation and uplifts
- Balance feature development speed and reliability with well-defined service level objectives
- Deploy updates and fixes
- Build tools to reduce occurrences of errors and improve customer experience
- Perform root cause analysis for production errors
- Investigate and resolve technical issues
- Design procedures for system troubleshooting and maintenance
Requirements:
- Bachelor’s degree in computer science, information technology, or an equivalent field of study
- Relevant years of experience may substitute for the degree requirement
- 3+ years of experience in a similar position (SRE, DevOps, or infrastructure engineer)
- Advanced knowledge of compliance and regulations
- Experience with Kubernetes administration
- Experience with infrastructure as code tools such as Terraform and Ansible
- Experience with at least one of the major cloud providers: AWS, GCP, Azure, or OCI
- Experience with architecting, developing, and troubleshooting large-scale systems
- Experience building CI/CD pipelines (preferably GitOps)
- Experience with monitoring and observability tools such as Prometheus, Loki, Jaeger, and Sentry
- Experience managing databases such as PostgreSQL and MongoDB, including backup and restore plans, replication, and clustering
- Good networking knowledge (preferably with experience in VPNs and service meshes)
Core Competencies:
- Self-Actualization & Fulfilment: Proficiency Level - ADVANCED
- Team Synergy & Development: Proficiency Level - ADVANCED
- Entrepreneurial Mindset & Drive: Proficiency Level - ADVANCED
- Business Acumen & Diligence: Proficiency Level - ADVANCED
Location:
Riyadh, Riyadh, Saudi Arabia