PySpark Data Engineer

Location: Dubai, United Arab Emirates
Posted: 27 Nov 2025

Financial

  • Estimate: $80k - $120k*
  • Zero income tax location

Accessibility

  • Hybrid
  • Apply from abroad
  • Visa Provided

Requirements

  • Experience: Unspecified
  • English: Professional

Position

As a PySpark Data Engineer, you will design, develop, and maintain scalable, optimized ETL pipelines using PySpark on the Cloudera Data Platform while ensuring data integrity and accuracy.

Your main tasks will include:

  • Data Pipeline Development: Create and manage ETL pipelines to support analytical needs and business requirements.
  • Data Ingestion: Implement data ingestion processes from various sources (e.g., relational databases, APIs) to the data lake or data warehouse.
  • Data Transformation and Processing: Process and transform large datasets using PySpark.
  • Performance Optimization: Conduct performance tuning of PySpark code and Cloudera components to optimize resource utilization.
  • Data Quality and Validation: Ensure data accuracy and reliability through monitoring and validation routines.
  • Automation and Orchestration: Automate workflows using tools like Apache Oozie or Airflow.
  • Monitoring and Maintenance: Monitor pipeline performance and perform routine maintenance on the Cloudera Data Platform.
  • Collaboration: Work closely with data engineers, analysts, and stakeholders to support data-driven initiatives.
  • Documentation: Maintain thorough documentation of data engineering processes.

Technical Skills:

  • Advanced proficiency in PySpark, with a solid understanding of RDDs, DataFrames, and optimization techniques.
  • Strong experience with Cloudera Data Platform components, including Cloudera Manager, Hive, Impala, HDFS, and HBase.
  • Knowledge of data warehousing concepts and ETL best practices, with experience in SQL-based tools.
  • Familiarity with big data technologies like Hadoop and Kafka.
  • Experience with orchestration frameworks such as Apache Oozie or Airflow.
  • Strong shell scripting skills on Linux.

Desired Skills and Experience:

  • Experience with AWS Native Data Services (optional).

Work Conditions:

  • Hybrid work environment.
  • Full-time position.

Language Requirements:

  • English: professional proficiency.