Description

Based in New York City, NY, USA, you will build and manage large-scale data processing pipelines using distributed computing frameworks.

Responsibilities

  • Develop data processing applications using Python or Scala.
  • Implement and maintain data workflows using Spark and PySpark.
  • Manage data ingestion with Sqoop and data querying with Hive and Impala.
  • Orchestrate data pipelines using Oozie.

Required Skills

  • 5+ years of professional experience in big data development.
  • Proficiency in Python or Scala programming.
  • Hands-on experience with Spark and PySpark.
  • Experience with Hive and Impala.
  • Experience using Sqoop for data transfer.
  • Experience with Oozie for workflow management.
  • Bachelor's degree in any discipline.

Education

Bachelor's degree in any discipline