Senior MLOps Engineer

Abu Dhabi, United Arab Emirates · Posted: 03 Nov 2025

Financial

Estimate: $80k - $120k*
Zero income tax location

Accessibility

Apply from abroad
Visa Provided

Requirements

Experience: Senior
English: Professional

Position

About the Job
As part of the company's engineering team, you will operate at the intersection of machine learning and systems design. Your focus will be on building the cloud, orchestration, and deployment layers that power the company's next generation of intelligent applications. Collaborating with world-class AI researchers and engineers, you will productionize LLMs, voice models, and multimodal systems at scale.

About the Company
The company is a dedicated research lab committed to building, understanding, deploying, and managing large-scale AI systems. We drive innovation in foundation models and their operationalization, empowering research, education, and industry adoption through scalable infrastructure and real-world applications.

The Role
As a Senior MLOps Engineer, you will design, build, and maintain robust ML infrastructure across training, inference, and deployment pipelines. You’ll own the model lifecycle from data ingestion to real-time serving, ensuring efficient, secure, and reproducible deployment of LLM and speech models in Kubernetes-based environments. This role demands deep hands-on experience with Kubernetes (EKS), Helm, AWS cloud infrastructure, and modern MLOps toolchains (e.g., vLLM, SGLang, OpenWebUI, Weights & Biases, MLflow). Familiarity with speech/voice AI frameworks like ElevenLabs, Whisper, and RVC is also beneficial.

Key Responsibilities

Design and manage scalable ML infrastructure on AWS using EKS, EC2, RDS, S3, and IAM-based access control.
Build and maintain Kubernetes deployments for LLM and TTS inference using Helm, ArgoCD, and Prometheus/Grafana monitoring.
Implement and optimize model serving pipelines using vLLM, SGLang, TensorRT, or similar frameworks for high-throughput inference.
Develop CI/CD and MLOps automation for data versioning, model validation, and deployment (using GitHub Actions, Jenkins, or AWS CodePipeline).
Integrate OpenWebUI, Gradio, or similar UIs for user-facing model demos and internal evaluation tools.
Collaborate with ML researchers to productionize models, including TTS (e.g., ElevenLabs API), ASR (Whisper), and LLM-based chat systems.
Ensure observability, cost optimization, and reliability of cloud resources across multiple environments.
Contribute to internal tools for dataset curation, model monitoring, and retraining pipelines.
Maintain infrastructure-as-code using Terraform and Helm charts for reproducibility and governance.
Support real-time multimodal workloads (voice, text, vision) across inference clusters.

Required Qualifications

4+ years of experience in MLOps, DevOps, or Cloud Infrastructure Engineering for ML systems.
Strong proficiency in Kubernetes, Helm, and container orchestration.
Experience deploying ML models via vLLM, SGLang, TensorRT, or Ray Serve.
Proficiency with AWS services (EKS, EC2, S3, RDS, CloudWatch, IAM).
Solid experience with Python, Docker, Git, and CI/CD pipelines.
Strong understanding of model lifecycle management, data pipelines, and observability tools (Grafana, Prometheus, Loki).
Excellent collaboration skills with ML researchers and software engineers.

Professional Experience – Preferred

Extensive experience with vLLM, Kubernetes, ElevenLabs, Whisper, Gradio/OpenWebUI, or custom TTS/ASR model hosting.
Familiarity with multi-GPU scheduling, NCCL optimization, and HPC cluster integration.
Knowledge of security, cost management, and network policy in multi-tenant Kubernetes clusters and Cloudflare systems.
Prior work in LLM deployment, fine-tuning pipelines, or foundation model research.
Exposure to data governance and responsible AI operations in research or enterprise settings.