Description
You will manage production operations and infrastructure for a large-scale Hadoop environment supporting over 100 tenant applications.
Responsibilities
- Lead incident management, observability, and root cause analysis for the Strategic Data Platform.
- Configure alerting mechanisms and observability tools to proactively identify performance issues.
- Act as a technical bridge between L1, L2, and L3 support teams.
- Onboard new applications onto existing CD pipelines using Celestial, Tower, and XLR.
- Automate repetitive DevOps tasks using Shell and Python scripting.
Required Skills
- 5+ years of experience in Ansible automation and DevOps engineering.
- Hands-on expertise with Hadoop architecture and ecosystem components including HDFS, YARN, and MapReduce.
- Experience with cluster management tools such as Ambari, Cloudera Manager, or Pepperdata.
- Proficiency in Linux system administration, networking, and troubleshooting.
- Strong knowledge of XLR, Git, and Artifactory.
- 2+ years of experience using CI/CD pipelines to deliver Infrastructure as Code.
- Experience with CI/CD toolchains and continuous deployment automation.
- Advanced Shell and Python programming skills.
- Strong Unix/Linux operating system knowledge.
Preferred Skills
- Experience with product release cycles and release management.
- Background in the financial services industry.
- Bachelor’s degree in Computer Science or a related field.