You will manage and optimize large-scale Hadoop clusters and the surrounding big data ecosystem.
Responsibilities
Install, configure, and maintain Hadoop clusters and ecosystem components including HDFS, YARN, MapReduce, Hive, Pig, Spark, HBase, Kafka, and ZooKeeper.
Monitor system performance, identify bottlenecks, and implement optimizations to ensure stability and efficiency.
Perform capacity planning, resource allocation, and scaling to meet increasing data demands.
Implement security measures such as user access control, data encryption, and vulnerability management using Kerberos, Ranger, or Sentry.
Develop scripts and automation tools for routine administrative tasks and manage backup and recovery strategies.
Troubleshoot complex infrastructure issues and participate in an on-call rotation for production systems.
Required Skills
5+ years of experience in Hadoop administration using Cloudera, Hortonworks, or Apache distributions.
Deep expertise in HDFS architecture, YARN resource management, and MapReduce frameworks.
Hands-on experience with Hive, Spark, HBase, Kafka, and ZooKeeper.
Strong proficiency in Linux operating systems such as CentOS, Red Hat, or Ubuntu.
Proficiency in scripting languages including Shell (Bash), Python, or Perl for task automation.
Experience with security implementations including Kerberos, Ranger, and Sentry.
Competency with monitoring tools like Nagios, Ganglia, Cloudera Manager, or Ambari.
Experience with log management systems such as Splunk or the ELK stack.
Solid understanding of networking concepts relevant to distributed systems and Git for version control.
Preferred Skills
Experience with Azure cloud services, Azure Data Factory, Databricks, or Snowflake.
Knowledge of Docker, Kubernetes, or NoSQL databases like MongoDB and Cassandra.