Senior Systems Architect - HPC

Core42, Abu Dhabi, United Arab Emirates
Posted: 29 Jul 2024

Financial

  • Estimate: $150k - $200k*
  • Zero income tax location

Accessibility

  • Hybrid
  • Visa Provided

Requirements

  • Experience: Senior
  • English: Professional

Position

Overview
The Senior Systems Architect specializes in High Performance Computing (HPC) infrastructure and brings proven experience in planning and executing complex projects. This pivotal role focuses on developing secure, cost-efficient, and scalable platform architecture blueprints for both public and private cloud environments. The role requires a deep understanding of architectural principles, as well as expertise in the features and integration capabilities of HPC platforms and solutions.

Core42 is the UAE’s national-scale enabler for cloud and generative AI, combining expertise across multiple technology disciplines into a single platform for public sector and large enterprise transformations. Building on our capabilities as a sovereign cloud and HPC specialist, we bring generative AI, cybersecurity, professional, and managed services expertise to enable national-scale program deployments across industries.

Responsibilities

  • Collaborate with stakeholders to understand business requirements and translate them into technical solutions.
  • Communicate architectural decisions and strategies to both technical and non-technical audiences.
  • Prepare, review, and maintain high-level and low-level design documents, scope of work, RFIs, RFPs, and RFQs.
  • Ensure alignment of solutions with organizational goals and industry best practices.
  • Create architectural blueprints and technical documentation for proposed solutions.
  • Provide requirements for equipment specifications, estimate project labor effort, and liaise with vendors on technical issues.
  • Lead the deployment and configuration of HPC clusters, ensuring scalability, reliability, and performance according to project documentation and design specifications.
  • Oversee the integration of HPC with existing systems and infrastructure.
  • Ensure the solutions and environments adhere to security best practices and organizational policies.
  • Stay updated with the latest trends and advancements in HPC technologies.
  • Identify opportunities for process improvements and implement enhancements to the architecture.
  • Evaluate and recommend new tools and technologies to enhance the HPC ecosystem.
  • Engage in pilot testing and commissioning activities, designing and conducting functional, load, and other types of tests.
  • Maintain comprehensive documentation for the new and live HPC environments.
  • Develop and deliver training sessions to engineering teams on HPC best practices and usage.

Qualifications

  • Bachelor’s or Master’s degree in Computer Science, Engineering, Software Engineering, or related technology discipline.
  • 7+ years of experience and deep expertise in designing, implementing, and managing private cloud stacks, with 5+ years of experience in designing large-scale HPC environments.
  • Proven track record of successfully completing large-scale infrastructure projects with a focus on HPC.
  • Advanced knowledge and expertise in configuring, optimizing, and maintaining complex HPC environments, including hardware, software, and storage systems.
  • Proficiency in parallel computing principles, distributed computing, and cluster management.
  • Comprehensive knowledge and hands-on experience in Linux environments.
  • Experience with job schedulers, resource managers, and workflow orchestration tools commonly used in HPC environments (e.g., Slurm, LSF, PBS, Kubernetes).
  • Advanced knowledge of data center network design and related technologies (OSI model, TCP/IP stack, routing, VLAN/VXLAN, etc.).
  • Competence in network design and configuration of switches/routers, including InfiniBand and RoCE.
  • Experience with large-scale data storage solutions, particularly Ceph, NFS, and Lustre.
  • Proficiency in one or more parallel libraries/languages such as MPI, OpenMP, oneAPI, and CUDA.
  • Competence in configuration management and infrastructure-as-code tools such as Ansible, Puppet, and Terraform, and their integration with Git.
  • Excellent problem-solving skills and the ability to troubleshoot complex HPC issues effectively.
  • In-depth knowledge of performance tuning and optimization techniques for HPC systems.
  • Solid understanding of cloud computing principles (IaaS, PaaS, SaaS).
  • Experience designing, deploying, and managing Kubernetes and OpenShift clusters.
  • Knowledge of AI/ML platforms (e.g., OpenShift AI, Kubeflow, MLflow) is highly desirable.
  • Familiarity with Agile methodologies (Scrum or Kanban) and an understanding of DevOps principles.
  • Strong attention to detail.

What We Look For
If you are performance-driven and inquisitive, with the agility to adapt to ambiguity, you will fit right in. You should be eager to build meaningful collaborations with stakeholders and aspire to create unique, customer-centric solutions. A bias for action and a passion for conquering new frontiers in the AI space are at the heart of the Core42 community.

What Working At Core42 Offers

  • Culture: An open, diverse, and inclusive environment that encourages personal growth and focuses on ground-breaking, industry-first innovations.
  • Career: Outstanding learning, development & growth opportunities via structured training programs and innovative, high-tech projects.
  • Work-Life: A hybrid work policy to strike the perfect balance between office and home.
  • Rewards: A competitive remuneration package with a host of perks including healthcare, education support, leave benefits, and more.

If you can confidently demonstrate that you meet the criteria above, please contact us as soon as possible.

About Core42

Core42 accelerates what people, enterprises, and nations can achieve with AI. As a full-spectrum AI enablement solutions provider, we empower customers to thrive in the AI-driven era. Formed from the merger of G42 Cloud, Inception, and Injazat, we are dedicated to leveraging AI for meaningful change.

Benefits at Core42

  • Join an elite pool of 1,500 AI specialists.
  • Opportunities to work on groundbreaking projects.
  • Comprehensive suite of AI, cloud, and cybersecurity services.