Description

Own the reliability, performance, and operations of distributed microservices systems in production.

Responsibilities

  • Design, deploy, and support distributed microservices architectures in live environments.
  • Manage Kubernetes clusters including deployment, scaling, upgrades, and troubleshooting.
  • Configure and maintain API gateways (Azure API Management, Kong, or IBM API Connect) for routing and observability.
  • Diagnose complex production issues using Splunk or AppDynamics and perform root cause analysis.
  • Implement SRE best practices, including SLI/SLO monitoring and incident response automation.

Required Skills

  • 7+ years of relevant experience, primarily in operations support or systems administration.
  • Strong expertise in microservices architecture, with hands-on production deployment experience.
  • Deep knowledge of Kubernetes cluster operations, scaling, and reliability engineering.
  • Proficiency with API gateway platforms such as Azure API Management, Kong, or IBM API Connect.
  • Working proficiency with observability tooling such as Splunk, AppDynamics, Instana, or similar.
  • Ability to implement log analytics, metrics, traces, dashboards, and SLO-based monitoring.
  • Experience with error budgets, incident response, postmortems, and continuous improvement.

Preferred Skills

  • Administrative support experience in enterprise IT environments.
  • Familiarity with additional API management or container orchestration platforms.

Education

Degree in any discipline