AI Video Research Engineer Intern

Dubai, United Arab Emirates | Posted: 17 Feb 2026

Financial

  • Estimate: $12k - $24k*
  • Zero income tax location

Accessibility

  • Fully Remote
  • Apply from abroad
  • Visa Provided

Requirements

  • Experience: Entry Level
  • English: Professional

Position

Join the company and be part of shaping the future of digital finance. We are seeking highly motivated MSc or PhD interns to work on video generation and multimodal video foundation models. Interns will engage in various components of the foundation model lifecycle and are encouraged to propose creative, research-driven ideas that advance the state of the art. In this role, you will contribute to the development and improvement of open-source video foundation models, analyze their limitations, and design scalable solutions. This is a research-focused internship with opportunities to publish at top-tier computer vision and machine learning conferences and to work with petabyte-scale video datasets and distributed clusters of thousands of GPUs.

Responsibilities:

  • Research and improve open-source video and multimodal video generation foundation models.
  • Focus on areas such as pre-training, supervised fine-tuning, post-training, inference, architecture design, or evaluation.
  • Benchmark models against the current state-of-the-art, identify bottlenecks, and propose novel improvements.
  • Work with large-scale video datasets and distributed training systems.
  • Collaborate with researchers and engineers on projects with clear research and publication potential.

Minimum Qualifications:

  • MSc or PhD candidate in Computer Science, Machine Learning, Computer Vision, or a related technical field.
  • Research experience in image generation, video generation, or multimodal learning.
  • Awareness of open-source video foundation models and their limitations.
  • Proficiency with PyTorch and modern deep learning workflows.
  • Strong analytical thinking, creativity, and collaboration skills.
  • Prior first-author publications on related topics at CVPR, ICCV, ECCV, NeurIPS, or ICLR.

Preferred Qualifications:

  • Demonstrated related work, such as a research codebase or benchmarks released on GitHub or similar platforms.
  • Experience with large-scale or distributed training.
  • Hands-on experience with diffusion-based, transformer-based, or hybrid video generation models.

Language Requirements:

  • Excellent English communication skills are required.

Are you ready to be part of the future?
