
Senior Systems Engineer - HPC

Core42, Abu Dhabi, United Arab Emirates. Posted: 22 Aug 2024

Financial

  • Estimate: $120k - $180k*
  • Zero income tax location

Accessibility

  • Hybrid
  • Visa Provided

Requirements

  • Experience: Senior
  • English: Professional

Position

About the Job
The Senior Systems Engineer - HPC will be an integral part of the Engineering team, responsible for designing and delivering High-Performance Computing (HPC) solutions, along with their associated hardware, platform, software, networking, and storage components. The role also involves collaborating with multiple internal stakeholders, leading complex projects, and implementing innovative solutions. Candidates should have extensive experience building HPC systems across various industries and delivering complex, large-scale projects.

Responsibilities

  • Oversee the design, deployment, and optimization of HPC infrastructure, including hardware, software, networking, and storage components.
  • Prepare and review high-level design (HLD) and low-level design (LLD) documents, scopes of work, RFIs, RFPs, and RFQs.
  • Maximize the efficiency and performance of HPC systems through optimal resource utilization and minimal downtime.
  • Collaborate with product and architecture teams to align technical solutions with customer computational needs and company strategic goals.
  • Develop and implement automation solutions and tools for deployment and management.
  • Establish monitoring, logging, and alerting systems.
  • Serve as level 3 support for complex technical issues, conducting root cause analysis and implementing solutions for HPC system reliability.
  • Maintain comprehensive documentation of HPC configurations, procedures, and best practices.
  • Ensure security and compliance of HPC infrastructure, implementing necessary safeguards and adhering to regulatory standards.
  • Collaborate with vendors for hardware and software procurement and support.
  • Assist with budget planning and management for HPC-related expenditures to ensure cost-effective solutions.
  • Stay updated on HPC technology trends, evaluating and recommending new technologies and practices to enhance HPC capabilities.

Qualifications

  • Bachelor’s degree in Information Technology, Computer Science, or a relevant field.
  • Minimum of 7 years of hands-on experience in HPC systems administration and infrastructure management.
  • Advanced knowledge of configuring, optimizing, and maintaining complex HPC environments.
  • Proficiency in parallel computing principles, distributed computing, and cluster management.
  • Extensive experience in system administration of Linux environments.
  • Familiarity with job schedulers and resource managers commonly used in HPC (e.g., Slurm, LSF, PBS, Kubernetes).
  • Knowledge of data center network design and technologies (OSI model, TCP/IP stack, routing, VLAN/VXLAN).
  • Experience with large-scale data storage solutions (e.g., Ceph, NFS, Lustre).
  • Proficiency in parallel libraries/languages (e.g., MPI, OpenMP, oneAPI, CUDA).
  • Competence with configuration management and infrastructure-as-code tools (e.g., Ansible, Puppet, Terraform) and their integration with Git.
  • Strong scripting and automation skills (e.g., Python, Bash).
  • Excellent problem-solving skills with the ability to troubleshoot complex HPC issues.
  • Familiarity with containerization and orchestration technologies (e.g., Docker, Kubernetes).
  • Experience with monitoring and observability tools (e.g., Prometheus, Grafana, Nagios).
  • Effective communication and collaboration skills for cross-functional teamwork.
  • Relevant certifications in cloud computing, virtualization, container technologies, and systems architecture are advantageous.

What We Look For
We seek performance-driven individuals with an inquisitive mindset and the ability to adapt to ambiguity. Ideal candidates should be eager to form meaningful collaborations and create unique customer-centric solutions. A passion for AI and a bias for action are at the core of our company culture.

What Working At Core42 Offers

  • Culture: An open, diverse, and inclusive environment focused on personal growth and groundbreaking innovations.
  • Career: Opportunities for learning, development, and growth through structured training programs and innovative projects.
  • Work-Life: A hybrid work policy that balances office and remote work.
  • Rewards: A competitive remuneration package that includes healthcare, education support, leave benefits, and more.

If you meet the criteria outlined above, we encourage you to apply.

About Core42

Core42 accelerates what people, enterprises, and nations can achieve with AI. As a full-spectrum AI enablement solutions provider, we empower customers to thrive in the AI-driven era. Formed from the merger of G42 Cloud, Inception, and Injazat, we are dedicated to leveraging AI for meaningful change.

Benefits at Core42

  • Join an elite pool of 1,500 AI specialists.
  • Opportunities to work on groundbreaking projects.
  • Comprehensive suite of AI, cloud, and cybersecurity services.