Engineer - Platform Operations

Core42, Abu Dhabi, United Arab Emirates
Posted: 08 Nov 2024

Financial

  • Estimate: $80k - $100k*
  • Zero income tax location

Accessibility

  • Hybrid
  • Visa Provided

Requirements

  • Experience: Senior
  • English: Professional

Position

A Platform Engineer is responsible for designing, building, and maintaining the infrastructure that supports high-performance computing tasks and AI workloads. They ensure the scalability, reliability, and efficiency of computing platforms, integrating hardware and software to optimize performance. Additionally, they collaborate with data scientists and developers to troubleshoot and enhance platform capabilities, enabling advanced computational tasks and innovations.

Core42 is the UAE’s national-scale enabler for cloud and generative AI, bringing expertise across multiple technology disciplines into a single platform for public sector and large enterprise transformations.

Responsibilities
Objectives of this role:

  • Develop and deploy scalable and efficient computing platforms to support AI and HPC workloads, ensuring they meet performance, reliability, and security requirements.
  • Continuously optimize system performance by tuning hardware configurations, software parameters, and network settings to maximize throughput and minimize latency for AI and HPC applications.
  • Integrate various tools and technologies to streamline workflows, automate repetitive tasks, and enhance overall system efficiency and manageability.
  • Implement monitoring solutions to track system health and performance, promptly identifying and resolving issues to ensure minimal downtime and optimal functionality.
  • Work closely with data scientists, researchers, and developers to understand their needs, provide technical support, and adjust the platform to accommodate evolving requirements.

Key Responsibilities:

  • Design, deploy, and maintain the underlying hardware and software infrastructure necessary for AI and HPC applications, ensuring it is scalable and robust.
  • Monitor and optimize system performance by fine-tuning configurations, managing resources, and implementing best practices to achieve maximum efficiency.
  • Develop and implement automation scripts and tools to streamline repetitive tasks, deployment processes, and system updates.
  • Integrate various technologies, including cloud services, databases, and AI frameworks, to create cohesive and effective computing environments.
  • Diagnose and resolve technical issues related to the platform, providing support to developers and data scientists to address performance bottlenecks and system failures.
  • Ensure that the computing platform adheres to security standards and compliance requirements, implementing measures to protect data and infrastructure.
  • Maintain detailed documentation of system configurations, processes, and procedures, and generate reports on system performance and resource utilization.
  • Work closely with cross-functional teams, including data scientists, researchers, and software engineers, to understand their needs and provide solutions that support their objectives.

Qualifications
Required skills and qualifications:

  • A bachelor’s degree in Computer Science, Engineering (such as Electrical or Software Engineering), Information Technology, or any related field.
  • 5 or more years of experience in platform engineering, systems administration, or a related field, focusing on high-performance computing or large-scale infrastructure management.
  • Hands-on experience with AI or HPC environments, including managing and optimizing computational resources, is often required.
  • Demonstrated experience with relevant technologies such as Linux/Unix systems, cloud platforms (e.g., AWS, Azure), scripting languages, and performance tuning tools.
  • Proven track record of working on projects involving the design, implementation, and optimization of complex computing platforms.

Preferred skills and qualifications:

  • Knowledge of security best practices and tools for protecting infrastructure and data, including experience with identity management and access controls.
  • Strong analytical and troubleshooting skills to quickly identify and resolve technical issues that impact system performance or stability.
  • Effective verbal and written communication skills for collaborating with cross-functional teams and documenting technical processes.

Work Conditions

  • Culture: An open, diverse, and inclusive environment encouraging personal growth and focusing on groundbreaking innovations.
  • Career: Outstanding learning, development & growth opportunities via structured training programs and innovative, high-tech projects.
  • Work-Life: A hybrid work policy to strike the perfect balance between office and home.
  • Rewards: A competitive remuneration package with a host of perks including healthcare, education support, leave benefits, and more.

If you can confidently demonstrate that you meet the criteria above, please contact us as soon as possible.

About Core42

Core42 accelerates what people, enterprises, and nations can achieve with AI. As a full-spectrum AI enablement solutions provider, we empower customers to thrive in the AI-driven era. Formed from the merger of G42 Cloud, Inception, and Injazat, we are dedicated to leveraging AI for meaningful change.

Benefits at Core42

  • Join an elite pool of 1500 AI specialists.
  • Opportunities to work on groundbreaking projects.
  • Comprehensive suite of AI, cloud, and cybersecurity services.