Data Engineer (PySpark)

Virtusa, Dubai, United Arab Emirates. Posted: 03 Mar 2025

Financial

  • Estimate: $80k - $120k*
  • Zero income tax location

Accessibility

  • Hybrid
  • Apply from abroad
  • Visa Provided

Requirements

  • Experience: Senior
  • English: Professional

Position

As a Data Engineer (PySpark) at Virtusa, you will design, develop, and maintain scalable, optimized ETL pipelines using PySpark on the Cloudera Data Platform (CDP). The role also involves ensuring data integrity and accuracy, implementing data ingestion processes, and automating data workflows with orchestration tools.

Responsibilities

  • Data Pipeline Development: Design, develop, and maintain ETL pipelines that ensure data integrity and accuracy.
  • Data Ingestion: Manage data ingestion processes from various sources (relational databases, APIs, file systems) to the data lake or data warehouse on CDP.
  • Data Transformation and Processing: Use PySpark to process, cleanse, and transform large datasets.
  • Performance Optimization: Conduct performance tuning of PySpark code and CDP components.
  • Data Quality and Validation: Implement data quality checks and validation routines.
  • Automation and Orchestration: Automate data workflows using tools like Apache Oozie or Airflow.
  • Monitoring and Maintenance: Monitor pipeline performance, troubleshoot issues, and perform routine maintenance.
  • Collaboration: Work with other data engineers, analysts, and stakeholders to understand data requirements.
  • Documentation: Maintain thorough documentation of data engineering processes and code.

Qualifications

  • Education: Bachelor’s or Master’s degree in Computer Science, Data Engineering, Information Systems, or a related field.
  • Experience: 3+ years as a Data Engineer with a strong focus on PySpark and Cloudera Data Platform.

Technical Skills

  • PySpark: Advanced proficiency, including working with RDDs, DataFrames, and optimization techniques.
  • Cloudera Data Platform: Experience with Cloudera components such as Cloudera Manager, Hive, Impala.
  • Data Warehousing: Knowledge of ETL best practices and SQL-based tools.
  • Big Data Technologies: Familiarity with Hadoop, Kafka, and distributed computing tools.
  • Orchestration and Scheduling: Experience with Apache Oozie, Airflow, or similar frameworks.
  • Scripting and Automation: Strong scripting skills in a Linux environment.

Soft Skills

  • Strong analytical and problem-solving abilities.
  • Excellent verbal and written communication skills.
  • Ability to work independently and in a team environment.
  • Attention to detail and commitment to data quality.

About Virtusa

Virtusa is a global provider of digital strategy, digital engineering, and IT services and solutions. We combine logic, creativity, and curiosity to build, solve, and create innovative solutions for our clients' most pressing business challenges. Our services span three areas: consult & design, engineer & automate, and analyze & optimize, and we deliver them across a range of industries.

Benefits at Virtusa

  • Opportunities for continuous learning and career advancement
  • Flexible work arrangements to accommodate different needs
  • Competitive compensation packages and recognition programs