You will design and implement data processing workflows within an AWS environment.
Responsibilities
- Build and maintain ETL pipelines using AWS Glue and PySpark.
- Manage data warehousing tasks within Amazon Redshift.
- Develop Hive jobs and orchestrate workflows using Oozie.
- Provision and manage cloud infrastructure through CloudFormation.
- Optimize data storage and retrieval using DynamoDB and EMR.
Required Skills
- 5+ years of experience in data engineering.
- Expert-level proficiency in Python and PySpark.
- Hands-on experience with AWS Glue ETL.
- Experience managing Amazon Redshift clusters.
- Proficiency with Hive and with Oozie workflow orchestration.
- Experience with AWS CloudFormation for infrastructure as code.
- Working knowledge of Amazon EMR and DynamoDB.
- Bachelor's degree or equivalent experience.