
PySpark Data Engineer

Dubai, United Arab Emirates · Posted: 27 Nov 2025

Financial

Estimate: $80k - $120k*
Zero income tax location

Accessibility

Hybrid
Apply from abroad
Visa Provided

Requirements

Experience: Unspecified
English: Professional

Position

As a PySpark Data Engineer, you will design, develop, and maintain highly scalable, optimized ETL pipelines using PySpark on the Cloudera Data Platform, ensuring data integrity and accuracy.

Your main tasks will include:

Data Pipeline Development: Create and manage ETL pipelines to support analytical needs and business requirements.
Data Ingestion: Implement data ingestion processes from various sources (e.g., relational databases, APIs) into the data lake or data warehouse.
Data Transformation and Processing: Process and transform large datasets using PySpark.
Performance Optimization: Conduct performance tuning of PySpark code and Cloudera components to optimize resource utilization.
Data Quality and Validation: Ensure data accuracy and reliability through monitoring and validation routines.
Automation and Orchestration: Automate workflows using tools like Apache Oozie or Airflow.
Monitoring and Maintenance: Monitor pipeline performance and perform routine maintenance on the Cloudera Data Platform.
Collaboration: Work closely with other data engineers, analysts, and stakeholders to support data-driven initiatives.
Documentation: Maintain thorough documentation of data engineering processes.

Technical Skills:

Advanced proficiency in PySpark, with a solid understanding of RDDs, DataFrames, and optimization techniques.
Strong experience with Cloudera Data Platform components, including Cloudera Manager, Hive, Impala, HDFS, and HBase.
Knowledge of data warehousing concepts and ETL best practices, with experience in SQL-based tools.
Familiarity with big data technologies like Hadoop and Kafka.
Experience with orchestration frameworks such as Apache Oozie or Airflow.
Strong Linux shell scripting skills.

Desired Skills and Experience: