Description
You will build and maintain data pipelines at scale.
Responsibilities
- Design and build data pipelines using big data frameworks.
- Develop producers and consumers, and configure topics, for streaming and messaging systems.
- Implement Change Data Capture (CDC) processes from relational databases.
- Model data and optimize performance within data lakehouse architectures.
- Automate deployments and manage infrastructure using CI/CD practices.
Required Skills
- 5+ years of experience designing and building data pipelines with Apache Spark or Databricks.
- Hands-on experience with Apache Kafka or equivalent messaging systems.
- Proficiency in Python, Scala, or Java, with strong SQL knowledge.
- Deep understanding of relational databases (SQL Server, Oracle) and CDC mechanisms.
- Experience with data services on cloud platforms (Azure or AWS).
- Knowledge of data lakehouse architectures and Delta Lake.
- Familiarity with Git and CI/CD pipelines for automation.