Description
You will lead the Site Reliability Engineering (SRE) function, building the team from the ground up to ensure system reliability and stability.
Responsibilities
- Build and mentor an SRE team to drive operational excellence.
- Define and manage KPIs, SLOs, and SLIs in collaboration with stakeholders.
- Own all service management elements, including production monitoring and proactive alerting.
- Automate toil and implement self-healing mechanisms to improve operational efficiency.
- Manage environment provisioning via Terraform and oversee the automated code deployment and release framework.
Required Skills
- 15+ years of experience in technical leadership or service management roles.
- Hands-on experience with Java.
- Expertise in CI/CD pipelines and automation frameworks.
- Deep understanding of SRE principles, including SLOs and SLIs.
- Experience with Terraform for environment management.
- Proficiency in production monitoring and predictive alerting.
- Ability to manage release engineering, including feature flag management and automated deployments.
- Experience working within the SAFe framework.
- Strong stakeholder management skills.
Preferred Skills
- Experience setting up new SRE teams and establishing operational processes.