You will manage the installation, configuration, and performance optimization of large-scale Hadoop clusters and big data ecosystems.
Responsibilities
Install, configure, and maintain Hadoop clusters and ecosystem components including HDFS, YARN, MapReduce, Hive, Pig, Spark, HBase, Kafka, and ZooKeeper.
Monitor system performance, identify bottlenecks, and implement optimizations for stability and efficiency.
Perform capacity planning, resource allocation, and scaling to meet growing data demands.
Implement security measures such as user access control, data encryption, and vulnerability management using Kerberos, Ranger, or Sentry.
Develop scripts and automation tools for routine administrative tasks and manage backup and recovery strategies.
Troubleshoot complex infrastructure issues and participate in an on-call rotation for production systems.
Required Skills
5+ years of experience in Hadoop administration managing large-scale clusters (Cloudera, Hortonworks, or Apache distributions).
Proficiency with Linux operating systems including CentOS, Red Hat, and Ubuntu.
Strong scripting skills in Python, Shell (Bash), or Perl for task automation.
Deep knowledge of HDFS architecture, YARN resource management, and Spark in-memory processing.
Experience with distributed coordination via ZooKeeper and streaming via Kafka.
Hands-on experience with monitoring tools like Nagios, Ganglia, Cloudera Manager, or Ambari.
Experience with log management systems such as Splunk or the ELK stack.
Familiarity with version control systems like Git.
Understanding of networking concepts relevant to distributed systems.
Preferred Skills
Cloudera Certified Administrator for Apache Hadoop (CCAH), Hortonworks Certified Apache Hadoop Administrator (HCA), or MapR Certified Hadoop Administrator (MCHA).