The person will be responsible for as a Technical Authority (SME) for both Azure and Terraform, guide teams on SRE practices, approve production changes.
This role is platform-focused, not application-specific, and requires deep expertise in SRE principles, Azure Landing Zones (Hub-and-Spoke), Terraform, DevOps enablement, monitoring/observability, and incident management.
He should be involved in address long-term reliability and operational risks while building and mentoring SRE teams.
Design, implement, and operate Azure Hub-and-Spoke Landing Zone architectures.
Reduce operational toil through automation and platform improvements.
Own and evangelize SRE principles including availability, reliability, scalability, resilience, and operational maturity.
Define Terraform best practices, state management, drift detection, and CI/CD integration.
Build and maintain CI/CD foundations using GitHub Actions.
Design and standardize monitoring and observability across the Azure platform.
Lead and participate in major incident management following ITIL processes.
Partner with security teams to implement least-privilege access and secure-by-default architectures.
Enforce governance using Azure Policy and standardized platform controls.
Lead, mentor, and grow a high-performing SRE/platform engineering team.
Drive SRE culture across the organization and set technical direction, standards, and operational maturity goals.