Description
You will deliver 24x7x365 remote support for customer Compute and Network environments in Bangalore. You own incident resolution, focusing on root cause analysis, prevention, and ticket elimination.
Responsibilities
- Execute operating system and runtime patching and hardening for Red Hat Linux, Kubernetes, AKS, and ESXi environments.
- Apply patching and hardening to networking equipment, including Palo Alto firewalls, F5 load balancers, Arista switches, and web application firewalls.
- Manage Business Continuity and Disaster Recovery activities, including quarterly site-to-site failover validation and tabletop exercises.
- Isolate and repair faults during incidents, and produce root cause analysis summaries.
- Produce compliance documentation, covering PCI data creation, collection of requests, and delivery.
Required Skills
- 5+ years of experience in Site Reliability Engineering or Network Operations.
- Hands-on expertise with Linux, Puppet, and Terraform.
- Strong knowledge of Kubernetes, Azure Kubernetes Service (AKS), and Azure Container Registry.
- Experience with Azure services: Key Vault, Storage Account, Event Hub, Cosmos DB, Elasticsearch, and MySQL/Azure SQL.
- Proficiency in managing Palo Alto firewalls and F5 load balancers.
- Ability to troubleshoot complex network and compute issues in a high-availability environment.
Preferred Skills
- Experience with PagerDuty, ICM, Grafana, and Prometheus.
- Proficiency in Python and Node.js for automation.
- Familiarity with GitLab for CI/CD workflows.