AI Infrastructure Engineer (Govt Client)

Dubai, United Arab Emirates · Posted: 30 Jan 2026

Financial

Estimate: $90k - $130k*
Zero income tax location

Accessibility

Office Only
Apply from abroad
No Relocation Support
Visa Provided

Requirements

Experience: Senior
English: Professional

Position

The AI Infrastructure Engineer is a platform specialist responsible for architecting, building, and operating high-performance AI infrastructure to support advanced AI workloads, including LLMs, GenAI, Computer Vision, and MLOps. This role focuses on managing GPU clusters (NVIDIA A100/H100), deploying and maintaining Red Hat OpenShift AI (RHODS), and ensuring secure, scalable, and cost-efficient AI platforms across the company's Sovereign Cloud and hybrid/multi-cloud environments. The engineer will enable enterprise-grade AI adoption for over 200 government entities.

Key Responsibilities & Deliverables:

GPU & AI Platform Architecture:
- Design and implement GPU-based compute clusters.
- Define reference architectures for LLM hosting, Vector Databases, MLOps, and high-performance storage/networking.
- Key Deliverables: Fully operational GPU-based AI infrastructure, GPU Cluster Uptime and Performance Utilization, and Reduction in Cost per Training/Inference Workload.
GPU Cluster Operations:
- Install, configure, and optimize core components: CUDA, cuDNN, NCCL, NVIDIA drivers, and the NVIDIA GPU Operator.
- Implement GPU partitioning, scheduling, and performance tuning for high-end GPUs (e.g., A100/H100).
- Key Deliverables: High-availability architecture for all AI workloads and complete documentation and runbooks.
OpenShift AI (RHODS) Management:
- Deploy, configure, and maintain the Red Hat OpenShift AI (RHODS) platform for multi-tenant use.
- Manage integration of the NVIDIA GPU Operator for efficient GPU scheduling, and support data scientists with notebooks, training, and inference endpoints.
- Key Deliverables: Production-ready OpenShift AI (RHODS) platform and AI Project Onboarding Speed.
LLM & Model Serving:
- Build and manage infrastructure for hosting and serving open-source LLMs (Llama, Falcon, Mistral), and for supporting RAG pipelines, LoRA adapters, and Vector Databases (Milvus, pgvector).
- Key Deliverables: Multi-model LLM serving environment for government entities, MLOps Pipeline Success Rate, and Deployment Frequency.
MLOps & Automation:
- Implement Infrastructure as Code (IaC) using Terraform and Ansible, along with GitOps for automated lifecycle management of the AI platform.
- Build robust MLOps pipelines for data preparation, training, evaluation, and monitoring (using tools like MLflow/Kubeflow).
- Key Deliverables: Infrastructure automation via Terraform & Ansible and Automation Coverage for AI Infrastructure.

Required Qualifications & Experience:

Experience: 7–12 years in Cloud Infrastructure, DevOps, ML Infrastructure, or Platform Engineering.
Deep Hands-On Expertise: GPU Systems (NVIDIA A100/H100), Linux, Containers, and Kubernetes, plus OpenShift AI (RHODS) or an equivalent Kubernetes GPU orchestration platform.
Strong Experience In: TensorFlow, PyTorch, Hugging Face, Distributed Training (DDP, DeepSpeed), and MLOps Stacks (MLflow, Kubeflow).

Essential Skills & Competencies:

Technical Skills: Deep understanding of GPU compute, HPC architectures, and ML performance profiling. Strong skills in IaC (Terraform/Ansible), CI/CD, and OpenShift/Kubernetes operators.
Soft Skills: Strong troubleshooting, optimization, and performance engineering mindset. Excellent cross-functional collaboration and documentation skills.

Preferred Certifications: