Own the reliability, performance, and operations of distributed microservice systems in production.
Responsibilities
- Design, deploy, and support distributed microservices architectures in live environments.
- Manage Kubernetes clusters including deployment, scaling, upgrades, and troubleshooting.
- Configure and maintain API gateways (Azure API Management, Kong, or IBM API Connect) for routing and observability.
- Diagnose complex production issues using Splunk or AppDynamics and perform root cause analysis.
- Implement SRE best practices, including SLI/SLO monitoring and incident response automation.
Required Skills
- 7+ years of relevant experience, primarily in operations support or administrative engineering.
- Strong expertise in microservices architecture, with hands-on production deployment experience.
- Deep knowledge of Kubernetes cluster operations, scaling, and reliability engineering.
- Proficiency with API gateway platforms such as Azure API Management, Kong, or IBM API Connect.
- Working proficiency with observability tooling including Splunk, AppDynamics, Instana, or similar.
- Ability to implement log analytics, metrics, traces, dashboards, and SLO-based monitoring.
- Experience with error budgets, incident response, postmortems, and continuous improvement.
Preferred Skills
- Administrative support experience in enterprise IT environments.
- Familiarity with additional API management or container orchestration platforms.