AI Infrastructure Engineer

Sharjah, United Arab Emirates. Posted: 19 May 2026

Financial

  • Estimate: $90k - $120k*
  • Zero income tax location

Accessibility

  • Office Only
  • Visa Provided

Requirements

  • Experience: Senior
  • English: Professional

Position

About the Job: The AI Infrastructure Engineer is a platform specialist responsible for architecting, building, and operating high-performance AI infrastructure to support advanced AI workloads, including LLMs, GenAI, Computer Vision, and MLOps. This role focuses on managing GPU clusters (NVIDIA A100/H100), deploying and maintaining Red Hat OpenShift AI (RHODS), and ensuring secure, scalable, and cost-efficient AI platforms across Sovereign Cloud and hybrid/multi-cloud environments. The engineer will facilitate enterprise-grade AI adoption for over 200 government entities.

Key Responsibilities & Deliverables:

  • GPU & AI Platform Architecture: Design and implement GPU-based compute clusters, and define reference architectures for LLM hosting, Vector Databases, MLOps, and high-performance storage/networking.
  • GPU Cluster Operations: Install, configure, and optimize core components including CUDA, cuDNN, NCCL, and NVIDIA Drivers. Implement GPU partitioning, scheduling, and performance tuning for high-end GPUs (e.g., A100/H100).
  • OpenShift AI (RHODS) Management: Deploy, configure, and maintain the RHODS platform for multi-tenant use, integrating the NVIDIA GPU Operator for efficient GPU scheduling and supporting Data Scientists with Notebooks, Training, and Inference Endpoints.
  • LLM & Model Serving: Build and manage infrastructure that hosts and serves open-source LLM frameworks and supports RAG pipelines, LoRA adapters, and Vector Databases.
  • MLOps & Automation: Implement Infrastructure as Code (IaC) with Terraform and Ansible and build robust MLOps pipelines for data preparation, training, evaluation, and monitoring.

Required Qualifications & Experience:

  • Experience: 7–12 years in Cloud Infrastructure, DevOps, ML Infrastructure, or Platform Engineering.
  • Deep Hands-On Expertise: GPU Systems (NVIDIA A100/H100), Linux, Containers, Kubernetes, OpenShift AI (RHODS), LLM Hosting, and supporting Vector Databases.
  • Strong Experience In: TensorFlow, PyTorch, Hugging Face, Distributed Training, and MLOps Stacks (MLflow, Kubeflow).

Essential Skills & Competencies:

  • Technical: Deep understanding of GPU compute, HPC architectures, and ML performance profiling. Strong skills in IaC (Terraform/Ansible), CI/CD, and OpenShift/Kubernetes operators.
  • Soft Skills: Strong troubleshooting, optimization, and performance engineering mindset. Excellent cross-functional collaboration and documentation skills.

Preferred Certifications:

  • NVIDIA Deep Learning / AI Infrastructure Certification
  • Red Hat OpenShift AI specialization
  • Kubernetes CKA/CKAD
  • Azure AI or Oracle Cloud AI certifications
  • Terraform & Ansible certifications

Language Requirements: Not specified.

Industry: IT Services and IT Consulting
