The Senior Systems Engineer engages in the design, leads the implementation, and provides Level 3 expert support for large-scale private cloud computing infrastructure, with a specific emphasis on compute technologies including the hardware layer, operating system, hypervisor, and orchestration services.
Core42 is the UAE’s national-scale enabler for cloud and generative AI, combining G42 Group’s expertise across multiple technology disciplines into a single platform for public sector and large enterprise transformation. Building on our capabilities as a sovereign cloud and HPC specialist, we bring generative AI, cybersecurity, professional, and managed services expertise to enable national-scale program deployments across industries.
Location
Abu Dhabi, UAE
Responsibilities
- Co-design, implement, and manage hybrid virtualization and containerized platforms based on OpenStack, VMware VCF, and/or Red Hat OpenShift, ensuring platform stability, performance, and compliance with industry standards and best practices.
- Collaborate with architecture and engineering teams on the evaluation and selection of technology stack components, ensuring solutions follow best practices and are optimized from both functional and non-functional perspectives.
- Conduct regular capacity planning exercises to anticipate and accommodate the growing demands on the virtualized environment, ensuring it meets current and future requirements.
- Develop and implement plans to enhance the reliability of the computing infrastructure, addressing potential points of failure and ensuring high availability of services.
- Explore, analyze, and implement performance optimization strategies for the cloud computing environment, ensuring optimal resource utilization and responsiveness.
- Collaborate with relevant teams to conduct regular performance assessments and implement improvements based on findings.
- Prepare and participate in complex changes to production environments in support of operational teams.
- Develop automated testing and automation solutions for the cloud platform using tools such as Jenkins and Selenium, along with infrastructure-as-code, configuration management, and CI/CD tools such as Terraform, Ansible, Puppet, Chef, and GitLab CI/CD.
- Provide L3 expert support, including on-call shifts, with a focus on the immediate management and resolution of incidents such as outages, breaches, and system failures.
- Write and maintain relevant documentation ensuring completeness and quality.
- Prepare and deliver training for operational teams in the related technical domains.
- Collaborate with security management teams to ensure that systems are safe and secure against cybersecurity threats.
- Work closely with process management and operational teams and contribute to process development, standardizing the collaboration framework and improving collaboration efficiency.
Qualifications
- Bachelor’s or Master’s degree in Computer Science, Engineering, Software Engineering, or other relevant technology field.
- 7+ years of experience and deep expertise in designing, implementing, and managing private cloud stacks with a focus on compute and virtualization technologies.
- Extensive hands-on experience with at least one of the following platforms/stacks: OpenStack, Apache CloudStack, VMware VCF, or Red Hat OpenShift, and with related computing technologies such as x86 hardware, OS, KVM/ESXi, and orchestration services.
- Expert-level Linux skills, with 5+ years of hands-on experience in Linux-based environments.
- Profound understanding of hardware architecture and components (x86 and ARM, NUMA, types of memory and channels, types of NICs, etc.).
- Good understanding of network and storage types and architecture.
- Good understanding of Cloud Native concepts and technologies.
- Experience managing large-scale public or private cloud environments and/or working in a cloud service provider environment is highly desirable.
- Advanced programming and scripting skills in Python and/or Go, as well as Bash.
- Good knowledge of data center network design and related technologies (OSI model, TCP/IP stack, routing, VLAN/VXLAN, etc.).
- Understanding of storage types, architecture, and protocols, such as object, block, and file storage, NFS/SMB, iSCSI, and FC.
- Knowledge of monitoring and observability tools such as Zabbix, Nagios, Prometheus, Grafana, and the ELK Stack (Elasticsearch, Logstash, Kibana).
- Understanding of CI/CD principles, the Infrastructure as Code (IaC) approach, and software-defined infrastructure solutions.
- Experience with database management and optimization for both SQL and NoSQL databases such as MySQL, PostgreSQL, MongoDB, or Cassandra is highly desirable.
- Experience with ITSM tools such as Jira, Redmine, or ServiceNow.
- Relevant certifications in Linux, virtualization, and cloud computing are a plus.
- Knowledge of and experience working with GPU hardware and AI hardware accelerators is a plus.
- Strong organizational skills with the ability to multitask and prioritize.
- A proactive approach to problem-solving and decision-making.