About the job:
We are seeking a highly skilled and motivated Senior Data Engineer to join our dynamic team. This role is integral to our mission of designing, building, and maintaining scalable databases, data warehouses, and data pipelines that support production analytical workloads, machine learning workloads, and internal business reporting. As a Senior Data Engineer, you will ensure the accuracy and reliability of our data infrastructure while collaborating closely with data scientists to deploy and scale machine learning models.
Responsibilities:
- Data Pipelines: Design, build, and maintain scalable databases and data pipelines for production analytical workloads and internal business reporting, ensuring accuracy and an understanding of both the technical and business context.
- ClickHouse Management: Manage ClickHouse instances to ensure high performance, cost efficiency, and scalability.
- CDC Management: Oversee change data capture feeds from production databases, ensuring smooth ingestion into downstream pipelines.
- Data Consistency: Develop and optimize processes to monitor data consistency and accuracy.
- Schema Migrations: Handle schema migrations from source databases feeding data through CDC, reflecting them as needed in existing analytical pipelines, and implement processes to ensure efficient schema migrations in the data warehouse.
- Technical Support and Communication: Provide clear documentation, tools, and support to help the engineering team build and use analytical pipelines.
- Collaboration: Work closely with data scientists to deploy and scale machine learning models.
- Performance Tuning: Optimize application and query performance using profiling tools.
- Problem Solving: Identify and address root causes of issues, considering the broader context.
Qualifications:
- Bachelor's degree in Computer Science, Software Engineering, or a related field, or equivalent experience.
- Expertise in relational databases like PostgreSQL, with a focus on performance optimization.
- Experience with ClickHouse or other OLAP databases.
- 3+ years of data engineering experience.
- Ability to work in a fast-paced startup environment with constantly evolving requirements.
- Strong skills in data modeling, warehousing, and building ETL pipelines.
- Knowledge of batch and streaming data architectures, including streaming platforms such as Kafka and Redpanda.
- Proficiency in Python or other programming languages such as Go, Java, or Scala.
- Familiarity with big data tools and frameworks like Spark.
- Experience integrating machine learning models into data pipelines.