You will build and migrate complex ETL pipelines to support elastic data systems.
Responsibilities
- Extract and combine data from various heterogeneous sources.
- Build and migrate complex ETL pipelines to S3, Redshift, or EMR.
- Manipulate, process, and extract value from large datasets.
- Own the full project life cycle, including analysis, design, development, testing, and release.
Required Skills
- 5+ years of hands-on experience with AWS services including S3, Glue ETL, Glue Data Catalog, Athena, and EMR.
- Proficiency with PySpark and Redshift Spectrum.
- 2-3 years of experience with Hadoop ecosystem components such as HDFS, Hive, Spark, Sqoop, MapReduce, and YARN.
- Strong experience with ETL tools like Talend and AWS Glue.
- Deep SQL coding experience.
- Hands-on experience with functional programming in Python.
- Experience with databases including MS SQL Server, Oracle, and MySQL.
- Proven ability in data modeling and data warehousing.
- Experience with AWS data warehousing and analytics services such as Redshift, Athena, and Aurora.
Preferred Skills
- Experience working with Life Sciences or Pharma data.