Senior Applied ML Engineer job in Dubai

Position

We are developing a highly scalable media intelligence platform that processes, analyzes, and structures large volumes of multimedia content across text, image, video, and audio. As a Senior Applied ML Engineer, you will architect and build the core backend systems that power media ingestion, processing workflows, metadata generation, AI-based analysis, semantic search, and retrieval across large media libraries. This role requires deep backend engineering expertise, strong system design capability, and practical experience integrating AI/ML systems into production workflows. You will work on complex media-processing pipelines, video/audio analysis, OCR, speech-to-text, embedding generation, vector search, multimodal model integrations, and high-throughput asynchronous workloads. You will collaborate closely with engineering leadership to define backend architecture, improve reliability and scalability, and guide other engineers in delivering secure, observable, and high-performance systems.

Key Responsibilities

Backend Architecture & System Ownership

Architect, build, and operate scalable backend services for a media intelligence platform, focusing on clean, maintainable, and production-ready systems.
Own critical backend components end to end, from system design and API contracts through implementation, deployment, monitoring, and iteration.
Drive architectural decisions across APIs, processing pipelines, distributed compute, storage, search, observability, cloud infrastructure, and model-serving workflows.
Design data models and storage patterns for media assets, generated metadata, embeddings, processing jobs, model outputs, search indexes, and audit trails.
Design high-throughput media ingestion and processing pipelines for large volumes of video, audio, image, and text content.
Build distributed, event-driven workflows for media processing using queues and pub/sub systems such as SQS, Kafka, Pub/Sub, or equivalent technologies.
Implement reliable asynchronous processing patterns, including retries, idempotency, dead-letter queues, backpressure handling, and fault-tolerant job execution.

AI/ML Integration & Model Workflows

Lead the development and optimization of metadata extraction, content analysis, scene detection, transcription, embedding generation, and multimodal AI inference workflows.
Integrate and optimize AI/ML services within backend workflows, including model APIs, embedding pipelines, OCR, speech-to-text, scene analysis, multimodal inference, batching, caching, and fallback strategies.
Collaborate with ML engineers, data scientists, or external model providers to benchmark models, compare quality/latency trade-offs, and safely roll out model upgrades.

Model Serving & Performance Optimization

Optimize AI/ML inference workflows for latency, throughput, reliability, and cost across both real-time and batch-processing paths.
Work with model-serving systems such as vLLM, Triton, TGI, SageMaker, Vertex AI, or custom inference services to improve batching, concurrency, warmup behavior, timeout handling, autoscaling, and GPU utilization.
Evaluate and apply practical model optimization techniques such as quantization, model distillation, batching, caching, prompt optimization, and routing to smaller or cheaper models where appropriate.
Design and maintain vector search and indexing systems using technologies such as Pinecone, Weaviate, Qdrant, Elastic Vectors, FAISS, pgvector, or similar tools.
Build retrieval workflows that support semantic search, similarity matching, duplicate detection, media discovery, and structured metadata search.
Monitor model and system performance in production, including API latency, queue depth, processing time, model error rates, GPU utilization, confidence distributions, drift signals, and cost per processed item.

Education & Experience

Bachelor's degree in Computer Science, Engineering, or equivalent practical experience.
5-7+ years of backend engineering experience, ideally building scalable distributed systems, media platforms, data pipelines, or high-throughput backend services.
Prior experience owning major backend modules end to end, including architecture, implementation, deployment, monitoring, and production operations.
3+ years of experience integrating AI/ML inference systems into backend workflows, including model APIs, embedding pipelines, OCR, speech-to-text, scene detection, or multimodal model outputs.
Hands-on experience creating AI-powered processing pipelines for image, video, audio, or text analysis.
Practical experience with production model optimization, especially for image, video, embedding, or multimodal models.
Prior experience with vector search, semantic search, media retrieval, or similarity-matching systems is strongly preferred.
Experience mentoring engineers, leading technical discussions, and influencing architectural decisions across backend, infrastructure, and AI/ML workflows.

Technical Skills

Strong expertise in Python and/or Node.js with a deep understanding of building scalable RESTful APIs and backend architectures.
Experience with the Hugging Face Transformers ecosystem and deep learning frameworks such as PyTorch and TensorFlow.
Strong experience with SQL/NoSQL databases, schema design, and data modeling.
Exposure to distributed systems, microservices, asynchronous processing, and event-driven patterns with SQS, Pub/Sub, Kafka, or other queueing/pub-sub systems is preferred.
Experience deploying production systems on AWS, GCP, or similar cloud platforms.
Knowledge of infrastructure patterns (compute, storage, networking, observability).

AI/ML Integration

Experience orchestrating embedding generation, scene detection, OCR, speech-to-text, image classification, video analysis, and multimodal model integrations.
Experience optimizing inference workflows for latency, throughput, reliability, and cost.
Experience configuring inference settings for scalable, optimized serving, including tuning sampling parameters, managing output length and format, and configuring reasoning-related behaviors.
Familiarity with practical model optimization techniques such as batching, caching, quantization, model distillation, prompt optimization, fallback routing, and use of smaller models where appropriate.
Experience working with model-serving systems such as vLLM, Triton, TGI, SageMaker, Vertex AI, or custom inference services is preferred.
Experience working with LLM and multimodal evaluation and benchmarking frameworks, as well as domain-specific benchmarks.

System Design & Architecture

Understanding of distributed systems, scaling patterns, and performance engineering is preferred.
Ability to design modular, maintainable, and efficient architectures.
Experience with API versioning, modularization, and designing long-running workflows.
Understanding of performance bottlenecks and low-latency backend patterns.

Location
Dubai, United Arab Emirates
