Description

You will build and maintain data pipelines and storage solutions within an AWS environment.

Responsibilities

  • Extract data from multiple sources and load it into data lakes and AWS Redshift.
  • Develop ETL processes using AWS Glue and Spark/PySpark.
  • Design and implement data models using S3 and DynamoDB.
  • Apply DevOps principles by maintaining CI/CD pipelines in GitLab.
  • Perform requirements analysis, data analysis, and integration testing.

Required Skills

  • 3+ years of experience in data engineering or software development.
  • 2+ years of hands-on development with AWS Cloud solutions.
  • 2+ years of experience with Spark or PySpark.
  • 2+ years of experience developing AWS Glue ETL jobs.
  • 2+ years of experience with AWS storage models including S3 and DynamoDB.
  • Experience with AWS EMR, Hudi, and Athena.
  • Experience with AWS services including Lambda, SNS/SQS, EventBridge, and Lake Formation.
  • Working knowledge of on-prem ETL tooling such as Ab Initio or Informatica.
  • Experience implementing CI/CD pipelines using GitLab.

Preferred Skills

  • Experience with AWS Elasticsearch, RDS, or PostgreSQL.
  • Experience with Java services.

Education

Any graduate