Key Responsibilities:
• Manage day-to-day operations of the enterprise Kubernetes platform (100+ clusters)
• Perform cluster maintenance, upgrades, patching, and health checks
• Troubleshoot platform issues impacting application availability
• Participate in incident response, RCA, and post-incident reviews
• Support on-call rotations and provide after-hours support
• Assist application teams with Kubernetes-related issues and best practices
• Work with enterprise tools (Ingress, logging, monitoring, container registries)
• Document runbooks, procedures, and troubleshooting guides
Required Skills & Experience:
• Strong hands-on Kubernetes experience in production environments
• Deep understanding of Pods, Deployments, and StatefulSets; Services and Ingress; Namespaces, RBAC, ConfigMaps, and Secrets
• Experience troubleshooting networking, scheduling, and resource issues
• Experience with enterprise Kubernetes platforms (Tanzu, OpenShift, AKS, EKS, etc.)
• Strong Linux administration and troubleshooting skills
• Experience working in regulated production environments
• Ability to work independently and handle critical issues
Nice to Have:
• Experience with VMware Tanzu Kubernetes
• Scripting skills (Bash, Python) for automation
• Experience managing large multi-cluster environments
• Exposure to tools such as Contour / Envoy (ingress), Fluent Bit (logging), Dynatrace (monitoring), and Harbor / JFrog (container registries)
Education: Bachelor's degree