AI/ML Data Engineer – Unstructured Data & LLM Integration

Dubai, United Arab Emirates | Posted: 03 Dec 2025

Financial

  • Estimate: $80k - $120k*
  • Zero income tax location

Accessibility

  • Office Only
  • Visa Provided

Requirements

  • Experience: Senior
  • English: Professional

Position

About the Job:
The company is seeking an AI/ML Data Engineer with over 7 years of experience, focused on unstructured data and LLM integration. The ideal candidate will have deep expertise in building intelligent data pipelines for unstructured content and integrating them into modern machine learning ecosystems. Responsibilities include building scalable data processing pipelines for unstructured documents (such as PDFs and emails) with PySpark and Python, implementing document cleansing, classification, and enrichment techniques, and developing workflows for LLM-based pipelines.
The candidate will collaborate closely with AI architects and data engineers to design end-to-end AI solutions, applying Agile methodologies and CI/CD best practices to continuously improve the organization's AI capabilities.

Key Responsibilities:

  • Build robust data processing pipelines for unstructured documents using PySpark and Python.
  • Implement document cleansing, classification, and enrichment techniques.
  • Develop and integrate data workflows supporting LLM-based pipelines.
  • Engineer vector embedding and document chunking strategies for semantic search and question-answering systems.
  • Collaborate with cross-functional teams to communicate data readiness and integration strategies.

Required Skills:

  • At least 5 years of commercial experience, including at least 2 years in a relevant role.
  • Strong proficiency in PySpark and Python, with experience in ML/AI libraries (e.g., Transformers, LangChain).
  • Proven expertise in processing unstructured data and document intelligence (OCR, NLP).
  • Familiarity with vector databases and embedding models for RAG pipelines.
  • Understanding of the LLM lifecycle, including fine-tuning and prompt engineering.
  • Excellent communication skills for interfacing with technical and business stakeholders.

Language Requirements:
Proficiency in English is required.

Location:
Dubai, United Arab Emirates

Work Conditions:
On-site, Full-time.