The Production System Operations Engineer ensures that the organization's systems, services, and networks run efficiently and securely. The role involves collaborating with stakeholders to align technical operations with business objectives and to ensure the high availability, performance, and scalability of all systems.
Key Responsibilities:
Infrastructure and Operations Management:
- Oversee the maintenance and optimization of technical infrastructure, including cloud environments, applications, databases, networks, and storage.
- Manage system monitoring, incident management, and problem resolution to minimize downtime and ensure high availability.
- Participate in the team's on-call rotation, monitoring system and application alerts, notifications, and dashboards 24/7.
- Collaborate with the Security Operations Center (SOC) team on security event and incident response.
- Plan and execute disaster recovery and business continuity strategies.
- Develop, implement, and maintain policies and procedures for technical operations, ensuring compliance with industry and government standards and regulations.
Strategic Planning and Execution:
- Collaborate with leadership to develop and execute the technical operations strategy.
- Identify opportunities for technology enhancements and cost optimization.
- Prepare and manage the technical operations budget, ensuring cost-effectiveness.
Vendor and Stakeholder Management:
- Manage relationships with vendors and service providers to ensure the delivery of high-quality services and products.
- Act as the primary point of contact for technical escalations and coordinate resolutions with internal teams and external stakeholders.
Security and Compliance:
- Ensure the security, integrity, and compliance of technical operations in line with organizational and regulatory requirements.
- Collaborate with cybersecurity teams to identify and mitigate risks.
Qualifications:
- Education: Bachelor’s degree in Computer Science, Information Technology, or a related field (Master’s degree preferred).
- Experience: 5+ years of experience in cloud technical operations and infrastructure, including managing large-scale systems, cloud environments, and enterprise networks.
- Skills and Competencies:
  - Proficiency in cloud and infrastructure technologies (e.g., Azure, Linux, MySQL).
  - Experience with ITIL practices, SRE methodologies, and DevOps principles.
  - Strong analytical, problem-solving, and decision-making abilities.
  - Strong communication skills for collaborating effectively with team members and stakeholders.
Key Performance Indicators (KPIs):
- Uptime percentage for critical systems and infrastructure.
- Mean Time to Resolution (MTTR) for incidents.
- Budget adherence and cost-saving initiatives.
Offer:
- A fantastic new office on Yas Island.
- The opportunity to work in a growing business alongside like-minded professionals in a diverse environment.
- Training and learning opportunities, plus company benefits that support health and well-being.