Data Engineer (PySpark)

Dubai, United Arab Emirates · Posted: 03 Mar 2025

Financial

Estimate: $80k - $120k*
Zero income tax location

Accessibility

Hybrid
Apply from abroad
Visa Provided

Requirements

Experience: Senior

Position

As a Data Engineer (PySpark) at Virtusa, you will design, develop, and maintain highly scalable, optimized ETL pipelines using PySpark on the Cloudera Data Platform (CDP). The role involves ensuring data integrity and accuracy, implementing data ingestion processes, and automating data workflows with orchestration tools.

Responsibilities

Data Pipeline Development: Design, develop, and maintain ETL pipelines ensuring data integrity and accuracy.
Data Ingestion: Manage the ingestion of data from various sources (relational databases, APIs, file systems) into the data lake or data warehouse on CDP.
Data Transformation and Processing: Use PySpark to process, cleanse, and transform large datasets.
Performance Optimization: Tune the performance of PySpark code and CDP components.
Data Quality and Validation: Implement data quality checks and validation routines.
Automation and Orchestration: Automate data workflows using tools such as Apache Oozie or Apache Airflow.
Monitoring and Maintenance: Monitor pipeline performance, troubleshoot issues, and perform routine maintenance.
Collaboration: Work with other data engineers, analysts, and stakeholders to understand data requirements.
Documentation: Maintain thorough documentation of data engineering processes and code.

Qualifications

Education: Bachelor’s or Master’s degree in Computer Science, Data Engineering, Information Systems, or a related field.
Experience: 3+ years as a Data Engineer with a strong focus on PySpark and the Cloudera Data Platform.

Technical Skills

PySpark: Advanced proficiency, including working with RDDs, DataFrames, and optimization techniques.
Cloudera Data Platform: Experience with Cloudera components such as Cloudera Manager, Hive, and Impala.
Data Warehousing: Knowledge of ETL best practices and SQL-based tools.
Big Data Technologies: Familiarity with Hadoop, Kafka, and distributed computing tools.
Orchestration and Scheduling: Experience with Apache Oozie, Apache Airflow, or similar frameworks.
Scripting and Automation: Strong scripting skills in a Linux environment.

Soft Skills