PySpark - Data Engineer

Dubai, United Arab Emirates | Posted: 27 Nov 2025

Financial

  • Estimate: $60k - $90k*
  • Zero income tax location

Accessibility

  • Hybrid
  • Apply from abroad
  • Visa Provided

Requirements

  • Experience: Unspecified
  • English: Professional

Position

This role involves designing, developing, and maintaining scalable ETL pipelines with PySpark on the Cloudera Data Platform (CDP), with a focus on data integrity and accuracy. The candidate will implement data ingestion from various sources into the data lake or warehouse and use PySpark to transform and cleanse data for analytical and business needs. The role also requires optimizing the performance of PySpark code and CDP components, conducting data quality checks, and automating workflows with orchestration tools such as Apache Oozie or Airflow. The candidate will collaborate with data engineers, analysts, product managers, and other stakeholders to support data-driven initiatives, and will maintain comprehensive documentation of engineering processes.

Responsibilities:

  • Design, develop, and maintain ETL pipelines using PySpark on CDP.
  • Implement and manage data ingestion from various sources to data lakes/warehouses.
  • Process, cleanse, and transform data into formats for analytical use.
  • Conduct performance tuning of PySpark code and optimize resource usage.
  • Implement data quality checks and validation routines.
  • Automate data workflows using orchestration tools.
  • Monitor, troubleshoot, and maintain pipeline performance on CDP.
  • Collaborate with team members to understand data requirements.
  • Maintain documentation of processes and configurations.

Technical Skills:

  • Advanced proficiency in PySpark, including RDDs and optimization techniques.
  • Strong experience with Cloudera Data Platform components like Hive and HDFS.
  • Knowledge of data warehousing concepts and SQL-based tools.
  • Familiarity with Hadoop, Kafka, and distributed computing tools.
  • Experience with orchestration frameworks such as Apache Oozie or Airflow.
  • Strong scripting skills on Linux.

Work Conditions: Hybrid full-time position in Dubai, UAE.

