Description
You will build and manage scalable data pipelines for batch processing of terabyte-scale datasets.
Responsibilities
- Develop and manage scalable data pipelines for batch processing of large datasets.
- Implement cloud infrastructure using Terraform and AWS services including S3, EMR, SNS, SQS, and Redshift.
- Build and maintain CI/CD and DataOps pipelines using Python.
- Use Spark or PySpark for big data processing.
- Support infrastructure with Kubernetes and contribute to Java-based systems.
Required Skills
- 5+ years of experience in data engineering.
- Strong proficiency in Python and Spark or PySpark.
- Hands-on experience with AWS, specifically S3 and EMR.
- Proficiency with Terraform for infrastructure management.
- Experience with DevOps or DataOps, including CI/CD pipeline development.
- Experience with Java and large-scale batch data processing.
- Bachelor's degree or equivalent graduate-level education.
- Ability to work onsite 5 days a week in Wilmington, DE.
Preferred Skills
- Experience with Snowflake, SQL, or Redshift.
- Familiarity with SNS, SQS, and Kubernetes.