You will build and maintain data pipelines and storage solutions within an AWS environment.
Responsibilities
- Extract data from multiple sources and load it into data lakes and Amazon Redshift.
- Develop ETL processes using AWS Glue and Spark/PySpark.
- Design and implement data models using S3 and DynamoDB.
- Apply DevOps principles by maintaining CI/CD pipelines in GitLab.
- Perform requirements analysis, data analysis, and integration testing.
Required Skills
- 3+ years of experience in Data Engineering or software development.
- 2+ years of hands-on development with AWS Cloud solutions.
- 2+ years of experience with Spark or PySpark.
- 2+ years of experience developing AWS Glue ETL jobs.
- 2+ years of experience with AWS storage models including S3 and DynamoDB.
- Experience with Amazon EMR, Apache Hudi, and Amazon Athena.
- Experience with AWS services including Lambda, SNS/SQS, EventBridge, and Lake Formation.
- Working knowledge of on-prem ETL tooling such as Ab Initio or Informatica.
- Experience implementing CI/CD pipelines using GitLab.
Preferred Skills
- Experience with Amazon Elasticsearch Service (now Amazon OpenSearch Service), RDS, or PostgreSQL.
- Experience with Java services.