Summary
Results-driven Data Engineer with over two years of professional experience, nearly two of them specializing in big data technologies. Proven expertise in architecting and implementing scalable data pipelines, managing large-scale data migrations, and optimizing distributed systems with Apache Spark, Iceberg, and Kafka on AWS and Azure. Adept at ensuring data quality, integrity, and performance in complex data lakehouse and warehouse environments. Backed by a strong software engineering foundation, with hands-on experience building robust backend services, APIs, and real-time data platforms. Skilled in Python, Java, and C++, with a deep understanding of system design patterns and data-driven application development.
Experience
Data Engineer
- Migrated hundreds of full-load and historical tables, totaling terabytes of data, from an Oracle data warehouse to a data lakehouse using a robust in-house PySpark framework orchestrated with Airflow.
- Developed the PySpark framework used to migrate hundreds of tables containing billions of records from the legacy data warehouse to the lakehouse environment.
- Created comprehensive ETL documentation for numerous tables to ensure streamlined data processing and future scalability.
- Conducted data profiling and implemented data quality checks to ensure integrity and consistency throughout the migration process.
- Tools: PySpark, SQL, Excel, Oracle DB, Airflow, Apache Iceberg, MinIO, and Hive
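The data-quality checks described above can be sketched as a simple source/target reconciliation: comparing row counts and an order-independent checksum after each table migration. This is an illustrative, self-contained simplification (the real framework ran inside PySpark; the `reconcile` helper and its field names are hypothetical):

```python
from dataclasses import dataclass

@dataclass
class TableStats:
    """Lightweight reconciliation stats for one table."""
    row_count: int
    checksum: int  # order-independent sum over a hashed key column

def profile(rows, key_col):
    """Compute stats from a list of row dicts (stand-in for a DataFrame)."""
    return TableStats(
        row_count=len(rows),
        checksum=sum(hash(str(r[key_col])) & 0xFFFFFFFF for r in rows),
    )

def reconcile(source_rows, target_rows, key_col):
    """True when source and target agree on count and checksum."""
    return profile(source_rows, key_col) == profile(target_rows, key_col)
```

Because the checksum is a sum, it tolerates row reordering between the Oracle source and the Iceberg target while still catching dropped or duplicated rows.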
Software Engineer
- Developed backend multi-dashboard APIs for forecasting and presenting passenger traffic, efficiently processing data volumes reaching hundreds of terabytes.
- Engineered a backend service that predicts nationality from a person's name.
- Tools: FastAPI, PostgreSQL
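The prediction logic behind a name-nationality service can be sketched as below. This is purely illustrative: the suffix table and labels are hypothetical, and a production service would sit behind a FastAPI endpoint and use a trained model rather than hand-written rules:

```python
# Hypothetical suffix-to-nationality hints; a real service would use a trained model.
SUFFIX_HINTS = {
    "ov": "Bulgarian",
    "ez": "Spanish",
    "sen": "Danish",
}

def predict_nationality(name: str) -> str:
    """Guess a nationality from the surname's suffix; 'unknown' if no rule matches."""
    surname = name.strip().split()[-1].lower()
    # Check longer suffixes first so e.g. "sen" wins over a shorter overlap.
    for suffix, nationality in sorted(SUFFIX_HINTS.items(), key=lambda kv: -len(kv[0])):
        if surname.endswith(suffix):
            return nationality
    return "unknown"
```

Wrapping this function in a FastAPI route would reduce the endpoint to a thin layer over the prediction call, which keeps the model logic independently testable.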
Data Engineer
- Designed and implemented an AI-powered SQL query platform (PrismSQL).
- Worked on multiple R&D projects, enhancing the DigiXT product.
- Engineered a new data-loading service using Spring Boot for integration with Iceberg tables.
- Optimized Spark workflows and maintained Iceberg tables to keep large-scale data operations efficient.
- Tools/Technologies: Spark, Iceberg, Kafka, Airflow, MinIO, Trino, NiFi, Superset, PostgreSQL, MySQL, Azure (ADLS Gen2), and AWS (S3, Glue).
Certifications
- AWS Certified: Solutions Architect - Associate
- Microsoft Certified: Azure Data Engineer Associate
- Databricks Certified: Associate Developer for Apache Spark
- CCNA: Switching, Routing, and Wireless Essentials
Technical Skills
- Programming: Python, Java, C/C++, SQL, PHP, JavaScript
- Big Data Tools: Apache Spark, Superset, Kafka, Airflow, NiFi, Trino, Apache Iceberg
- Cloud Platforms: AWS (Redshift, Lambda, Glue, EC2, S3, IAM, EMR, etc.), Azure (Data Factory, Synapse, Data Lake, Microsoft Purview, Azure Databricks, etc.)
- AI & ML: RAG, Milvus (vector database), indexing
- Cybersecurity: Network Security, Data Integrity, System Troubleshooting
- Data Operations: ETL/ELT, Stream Processing, Distributed Computing, Data Warehousing, Data Modeling, Data Profiling, and Data Quality
- Version Control: Git/GitHub