Description

You will build and manage data pipelines and Hadoop-based infrastructure.

Responsibilities

  • Install and configure Hadoop clusters alongside tools including Hive, Pig, Sqoop, HBase, and ZooKeeper.
  • Develop ETL processes that use Sqoop to load data into HDFS and export results back to an RDBMS.
  • Use Pig to perform data transformations, event joins, and pre-aggregations before storing data in HDFS.
  • Write MapReduce programs to cleanse data from heterogeneous sources so it can be loaded into Hive schemas for analysis.

Required Skills

  • 5+ years of professional experience.
  • Hadoop ecosystem expertise: Hive, Pig, Sqoop, HBase, and ZooKeeper.
  • Proficiency in ETL development and HDFS management.
  • Experience working with RDBMS and heterogeneous data sources.
  • MapReduce programming for data cleansing and processing.

Education

Master's degree in Computer Science, IT, or a related field.