AI Infrastructure Engineer

Sharjah, United Arab Emirates · Posted: 19 May 2026

Financial

Estimate: $90k - $120k*
Zero income tax location

Accessibility

Office Only
Visa Provided

Requirements

Experience: Senior
English: Professional

Position

About the Job: The AI Infrastructure Engineer is a platform specialist responsible for architecting, building, and operating high-performance AI infrastructure for advanced AI workloads, including LLMs, GenAI, Computer Vision, and MLOps. The role focuses on managing GPU clusters (NVIDIA A100/H100), deploying and maintaining Red Hat OpenShift AI (formerly Red Hat OpenShift Data Science, RHODS), and ensuring secure, scalable, and cost-efficient AI platforms across Sovereign Cloud and hybrid/multi-cloud environments. The engineer will facilitate enterprise-grade AI adoption for over 200 government entities.

Key Responsibilities & Deliverables:

GPU & AI Platform Architecture: Design and implement GPU-based compute clusters, and define reference architectures for LLM hosting, Vector Databases, MLOps, and high-performance storage/networking.
GPU Cluster Operations: Install, configure, and optimize core components including CUDA, cuDNN, NCCL, and NVIDIA Drivers. Implement GPU partitioning, scheduling, and performance tuning for high-end GPUs (e.g., A100/H100).
OpenShift AI (RHODS) Management: Deploy, configure, and maintain the RHODS platform for multi-tenant use, integrating the NVIDIA GPU Operator for efficient GPU scheduling and supporting Data Scientists with Notebooks, Training, and Inference Endpoints.
LLM & Model Serving: Build and manage infrastructure that hosts and serves open-source LLM frameworks and supports RAG pipelines, LoRA adapters, and Vector Databases.
MLOps & Automation: Implement Infrastructure as Code (IaC) with Terraform and Ansible, and build robust MLOps pipelines for data preparation, training, evaluation, and monitoring.

Required Qualifications & Experience:

Experience: 7–12 years in Cloud Infrastructure, DevOps, ML Infrastructure, or Platform Engineering.
Deep Hands-On Expertise: GPU Systems (NVIDIA A100/H100), Linux, Containers, Kubernetes, OpenShift AI (RHODS), LLM Hosting, and supporting Vector Databases.
Strong Experience In: TensorFlow, PyTorch, Hugging Face, Distributed Training, and MLOps Stacks (MLflow, Kubeflow).

Essential Skills & Competencies:

Technical: Deep understanding of GPU compute, HPC architectures, and ML performance profiling. Strong skills in IaC (Terraform/Ansible), CI/CD, and OpenShift/Kubernetes operators.
Soft Skills: Strong troubleshooting, optimization, and performance engineering mindset. Excellent cross-functional collaboration and documentation skills.

Preferred Certifications: