Presight is looking for an astute, proficient, and qualified Senior Data Engineer to assess, analyze, and work with data concepts, use cases, and complex new data sources, providing business insights to customers and supporting the implementation and integration of these data sources into the Presight AI platform.
Responsibilities
Key responsibilities include:
- Solve challenging problems using Python coding skills.
- Design, build, and launch new data extraction, transformation, and loading processes in production.
- Perform web crawling, data cleaning, data annotation, data ingestion, and data processing.
- Read and collate complex data sets.
- Create and maintain data pipelines.
- Maintain a continuous focus on process improvement to drive efficiency and productivity within the team.
- Utilize Python, SQL, Elasticsearch, and shell scripting to build the infrastructure required for optimal data extraction, transformation, and loading.
- Provide insights into key business performance metrics by building analytical tools that utilize the data pipeline.
- Support the wider business with their data needs on an ad hoc basis.
- Comply with QHSE, Business Continuity, Information Security, Privacy, Risk, Compliance Management, and Organizational Governance policies, and with related risk assessments.
Requirements
- Bachelor's degree in Computer Engineering, Computer Science, or Electrical Engineering and Computer Sciences.
- 6+ years of programming experience, with solid coding skills in Python, Shell, and Java.
- Experience with web crawling and data cleaning.
- In-depth knowledge of the design and implementation of Spark jobs to execute, schedule, monitor, and control processes.
- Experienced in Spark SQL and PostgreSQL query languages.
- Skilled in writing complex queries with joins for processing large datasets.
- Proficient in using containerization technologies such as Docker.
- Experienced with orchestration tools like Kubernetes.
- Adept at implementing testing and monitoring systems for data pipelines to ensure high availability and reliability.
- Experience with tools like Apache Kafka and Apache Flink for real-time data processing.
- Skilled in using data orchestration tools like Apache Airflow and Apache NiFi.
- Strong understanding of Elasticsearch architecture, queries, and ingestion techniques.
- Experience with solution architecture, data ingestion, query optimization, data segregation, ETL, ELT, AWS services (EC2, S3, SQS, Lambda), Elasticsearch, Redshift, and CI/CD frameworks and workflows.
- Working knowledge of data platform concepts, such as data lakes, data warehouses, ETL, big data processing, and real-time processing architecture for data platforms.
- Proficiency in PostgreSQL and programming (preferably Java or Python), with an understanding of data, entity relationships, structured and unstructured data, SQL, and NoSQL databases.
- Knowledge of best practices in optimizing columnar and distributed data processing systems and infrastructure.
- Experienced in designing and implementing dimensional modeling.
- Knowledge of machine learning and data mining techniques, including statistical modeling, text mining, and information retrieval.
- Strong analytical and problem-solving skills.
What we look for
We seek performance-driven, inquisitive minds with the agility to adapt to ambiguity. Candidates should be eager to explore opportunities to build meaningful collaborations with stakeholders and aspire to create unique customer-centric solutions with a bias for action and a passion for conquering new frontiers in the AI space.
What working at Presight offers
- Culture: An open, diverse, and inclusive environment with a global vision that encourages personal growth and focuses on groundbreaking, industry-first innovations.
- Career: Outstanding learning, development, and growth opportunities via structured training programs and innovative, high-tech projects.
- Rewards: A competitive remuneration package with a host of perks including healthcare, education support, leave benefits, and more.