You will build and manage data pipelines and Hadoop-based infrastructure.
Responsibilities
- Install and configure Hadoop clusters along with ecosystem tools such as Hive, Pig, Sqoop, HBase, and ZooKeeper.
- Develop ETL processes that use Sqoop to import data into HDFS and export results back to relational databases (RDBMS).
- Use Pig to perform data transformations, event joins, and pre-aggregations before storing the results in HDFS.
- Write MapReduce programs to cleanse data from heterogeneous sources and prepare it for loading into Hive schemas for analysis.
Required Skills
- 5+ years of professional experience.
- Hadoop ecosystem expertise: Hive, Pig, Sqoop, HBase, and ZooKeeper.
- Proficiency in ETL development and HDFS management.
- Experience working with RDBMS and heterogeneous data sources.
- MapReduce programming for data cleansing and processing.
- Master's degree in Computer Science, IT, or a related field.