You will build and manage large-scale data processing pipelines using distributed computing frameworks. This role is based in New York City, NY, USA.
Responsibilities
- Develop data processing applications using Python or Scala.
- Implement and maintain data workflows using Spark and PySpark.
- Manage data ingestion with Sqoop and querying with Hive and Impala.
- Orchestrate data pipelines using Oozie.
Required Skills
- 5+ years of professional experience in big data development.
- Proficiency in Python or Scala programming.
- Hands-on experience with Spark and PySpark.
- Experience with Hive and Impala.
- Experience using Sqoop for data transfer.
- Experience with Oozie for workflow management.
- Graduate degree in any field.