AI Research Lead - Multimodal & Video

Dubai, United Arab Emirates. Posted: 04 Nov 2025

Financial

  • Estimate: $150k - $200k*
  • Zero income tax location

Accessibility

  • Fully Remote
  • Apply from abroad
  • Visa Provided

Requirements

  • Experience: Senior
  • English: Professional

Position

Join the company and shape the future of digital finance. We are seeking a Multimodal & Video Lead with a strong technical background in image, video, and 3D generation and in multimodal foundation models. In this role, you will set the technical direction and build multimodal foundation models for image, video, and 3D generation, editing, animation, and more.

You will have the opportunity to lead high-impact projects, drive the technical roadmap for multimodal AI initiatives, and collaborate across the company with world-class engineers and researchers. This position emphasizes innovation in applied research and offers the chance to contribute to the global AI community.

Responsibilities:

  • Lead the research, design, and development of advanced image, video, and 3D generation models.
  • Drive innovative projects spanning text, image, audio, and video applications.
  • Define the technical roadmap for multimodal AI initiatives, aligning with business objectives.
  • Provide leadership and mentorship to AI researchers and engineers.
  • Manage the entire lifecycle of multimodal model development, from dataset curation to deployment.
  • Oversee multi-node GPU model training, ensuring scalability and efficiency.
  • Integrate AI solutions into production systems in collaboration with cross-functional teams.
  • Contribute to the AI research community through publications and open-source contributions.
  • Establish best practices for coding, model evaluation, and experimentation.
  • Communicate technical insights effectively to executive leadership and stakeholders.

Minimum Qualifications:

  • PhD, MS, or equivalent experience.
  • Hands-on experience in building Image/Video/3D generation and multimodal foundation models from scratch.
  • 5+ years of experience managing or leading research and engineering teams.
  • Excellent English communication and interpersonal skills.
  • Proficiency in modern deep learning and diffusion frameworks and libraries.

Preferred Qualifications:

  • Demonstrated expertise in computer vision and video generation foundation models.
  • Strong history of innovation in multimodal and video technologies.
  • Experience with VP-level presentations and reporting.
  • Publications in leading AI conferences (CVPR, ICCV, ECCV, ICML, ICLR, NeurIPS, etc.).

If you have a passion for advancing AI technologies and wish to work with a talented global team, the company is the place for you.
