Data Engineer (PySpark)

Dubai, United Arab Emirates | Posted: 03 Mar 2025

Financial

  • Estimate: $80k - $120k*
  • Zero income tax location

Accessibility

  • Hybrid
  • Apply from abroad
  • Visa provided

Requirements

  • Experience: Senior

Position

As a Data Engineer (PySpark) at Virtusa, you will design, develop, and maintain scalable, optimized ETL pipelines using PySpark on the Cloudera Data Platform (CDP). You will also ensure data integrity and accuracy, implement data ingestion processes, and automate data workflows with orchestration tools.

Responsibilities

  • Data Pipeline Development: Design, develop, and maintain ETL pipelines ensuring data integrity and accuracy.
  • Data Ingestion: Manage data ingestion processes from various sources (relational databases, APIs, file systems) to the data lake or data warehouse on CDP.
  • Data Transformation and Processing: Use PySpark to process, cleanse, and transform large datasets.
  • Performance Optimization: Conduct performance tuning of PySpark code and CDP components.
  • Data Quality and Validation: Implement data quality checks and validation routines.
  • Automation and Orchestration: Automate data workflows using tools like Apache Oozie or Airflow.
  • Monitoring and Maintenance: Monitor pipeline performance, troubleshoot issues, and perform routine maintenance.
  • Collaboration: Work with other data engineers, analysts, and stakeholders to understand data requirements.
  • Documentation: Maintain thorough documentation of data engineering processes and code.

Qualifications

  • Education: Bachelor’s or Master’s degree in Computer Science, Data Engineering, Information Systems, or a related field.
  • Experience: 3+ years as a Data Engineer with a strong focus on PySpark and Cloudera Data Platform.

Technical Skills

  • PySpark: Advanced proficiency, including working with RDDs, DataFrames, and optimization techniques.
  • Cloudera Data Platform: Experience with Cloudera components such as Cloudera Manager, Hive, and Impala.
  • Data Warehousing: Knowledge of ETL best practices and SQL-based tools.
  • Big Data Technologies: Familiarity with Hadoop, Kafka, and distributed computing tools.
  • Orchestration and Scheduling: Experience with Apache Oozie, Airflow, or similar frameworks.
  • Scripting and Automation: Strong scripting skills in a Linux environment.

Soft Skills

  • Strong analytical and problem-solving abilities.
  • Excellent verbal and written communication skills.
  • Ability to work independently and in a team environment.
  • Attention to detail and commitment to data quality.