Description

You will build and migrate complex ETL pipelines to support elastic data systems.

Responsibilities

  • Extract and combine data from various heterogeneous sources.
  • Build and migrate complex ETL pipelines to S3, Redshift, or EMR.
  • Manipulate, process, and extract value from large datasets.
  • Own the full project life cycle including analysis, design, development, testing, and release.

Required Skills

  • 5+ years of hands-on experience with AWS services including S3, Glue ETL, Glue Data Catalog, Athena, and EMR.
  • Proficiency with PySpark and Redshift Spectrum.
  • 2-3 years of experience with Hadoop ecosystem components such as HDFS, Hive, Spark, Sqoop, MapReduce, and YARN.
  • Strong experience with ETL tools like Talend and AWS Glue.
  • Deep SQL coding experience.
  • Hands-on experience with functional programming in Python.
  • Experience with databases including MS SQL Server, Oracle, and MySQL.
  • Proven ability in data modeling and data warehousing.
  • Experience with AWS data warehousing platforms such as Redshift, Athena, and Aurora.

Preferred Skills

  • Experience working with Life Sciences or Pharma data.

Education

Bachelor's degree in any discipline.