Description
You will build scalable data pipelines and support the migration of data platforms to a Common Data Platform (CDP).
Responsibilities
- Design and build ETL/ELT pipelines using PySpark and SQL.
- Manage and optimize data within dimension and fact tables across Bronze, Silver, and Gold layers.
- Lead migrations from legacy platforms to CDP using Talend or Informatica.
- Maintain entity-relationship (E-R) models, catalog data assets, and replicate existing ETL transformations.
- Set up and maintain CI/CD pipelines using GitHub Actions.
Required Skills
- 5+ years of experience in data engineering.
- Expertise in PySpark and SQL for data processing.
- Strong experience with ETL tools and data warehousing concepts.
- Hands-on experience with AWS cloud services.
- Proficiency with GitHub and CI/CD workflows.
- Experience with Medallion Architecture principles.
- Ability to support both on-premises and cloud environments.
Preferred Skills
- Experience with Terraform for Infrastructure as Code.
- Experience with Airflow for workflow orchestration.