Senior Research Engineer - Multimodal & Video Foundation Model

Dubai, United Arab Emirates. Posted: 19 Aug 2025

Financial

  • Estimate: $90k - $120k*
  • Zero income tax location

Accessibility

  • Fully Remote
  • Apply from abroad
  • Visa Provided

Requirements

  • Experience: Senior
  • English: Professional

Position

About the Job:
Join Tether and shape the future of digital finance. At Tether, we’re pioneering a global financial revolution with cutting-edge solutions that empower businesses—from exchanges and wallets to payment processors and ATMs—to seamlessly integrate reserve-backed tokens across blockchains. By harnessing blockchain technology, Tether enables secure, instant, and affordable digital token transactions.

As a member of the AI model team, you will drive innovation in architecture development for cutting-edge models, including small, large, and multi-modal systems. Your mission is to explore and implement novel techniques and algorithms aimed at significant advancements in areas like data curation and resolving pre-training bottlenecks to enhance model performance.

Responsibilities:

  • Pioneer multimodal and video-centric research, contributing to usable prototypes and scalable systems.
  • Design and implement novel AI architectures for multimodal language models integrating text, visual, and audio modalities.
  • Engineer scalable training and inference pipelines for large-scale multimodal datasets across distributed GPU systems.
  • Optimize systems for efficient data processing and pipeline throughput.
  • Build modular tools for preprocessing and managing multimodal data assets (images, video, text).
  • Collaborate with research and engineering teams to translate model innovations into production solutions.
  • Prototype generative AI applications showcasing new capabilities of multimodal foundation models.
  • Develop benchmarking tools to evaluate model performance across diverse multimodal tasks.

Qualifications:

  • Bachelor’s degree in Computer Science, Computer Engineering, or a related technical field, or equivalent practical experience.
  • Expertise in Python and PyTorch, with experience in the full development pipeline.
  • Experience working with large-scale text data; experience with audio, video, or image data is a bonus.
  • Direct hands-on experience in developing or benchmarking LLMs, Vision Language Models, Audio Language Models, or generative video models.

Preferred Skills:

  • PhD in relevant fields such as Computer Vision, Machine Learning, or NLP.
  • Demonstrated expertise in video generation foundation models and/or multimodal research.
  • First-author publications at leading AI conferences (e.g., CVPR, ICCV, NeurIPS).

Language Requirement:

  • Excellent English communication skills are required.