You will design, build, and maintain scalable data pipelines to collect, process, and store data from multiple sources.
Responsibilities
Develop and manage ETL/ELT processes that transform data according to schema definitions, support slicing and dicing for analysis, and expose the results for downstream consumption.
Automate builds and deployments through CI/CD pipelines using GitHub workflows to eliminate manual steps.
Collaborate with cross-functional teams to capture evolving data requirements and accelerate feature development.
Ensure adherence to data governance policies, privacy regulations, and security protocols across data flows.
Analyze Spark query execution plans to fine-tune queries for optimal processing speed.
Required Skills
7+ years of professional experience in data engineering.
Expertise with Python and PySpark for data processing.
Strong proficiency in advanced SQL and distributed systems.
Hands-on experience with AWS and Databricks, using S3 for storage.
Experience implementing ETL/ELT patterns.
Working knowledge of Spark and Delta Lake.
Familiarity with integrating SFTP for secure data transfers.
Proven problem-solving skills in large-scale distributed environments.