Description
You will design, develop, and maintain ETL pipelines and manage AWS-based data solutions.
Responsibilities
- Design and develop ETL pipelines using PySpark, Python, and SQL.
- Implement and manage data solutions on AWS (S3, Glue, EMR, Redshift, Lambda).
- Collaborate with data scientists and analysts to deliver reliable datasets.
- Optimize data workflows for performance, scalability, and cost efficiency.
- Monitor, troubleshoot, and improve existing data pipelines and infrastructure.
Required Skills
- 10+ years of experience as a Data Engineer.
- Strong proficiency in Python and PySpark for data processing.
- Advanced knowledge of SQL for querying and data modeling.
- Hands-on experience with AWS services (S3, Glue, EMR, Redshift, Lambda).
- Experience with workflow orchestration tools (Airflow, Step Functions).
- Solid understanding of distributed computing and data lake/warehouse concepts.
- Experience with CI/CD pipelines (GitHub Actions, CodePipeline, Jenkins).
- Proficiency with Docker and Kubernetes.