You will drive operational readiness and automate production support toolchains to ensure high availability and resilient application delivery.
Responsibilities
- Automate orchestration and tooling solutions to reduce defects and ensure consistent processes for repetitive tasks.
- Establish logging standards and monitoring requirements that give support teams visibility at the business-service level.
- Build and implement recovery tooling that adheres to enterprise standards and frameworks.
- Partner with development teams to ensure operational readiness, including capacity planning, performance tuning, and proactive alerting.
- Advise software engineers on failure-resistant design patterns and participate in architectural decisions.
Required Skills
- 5+ years of professional experience in an engineering or reliability role.
- Deep understanding of the software development life cycle (SDLC) to drive requirements and align development with operations.
- Experience acting as a Subject Matter Expert (SME) for IT infrastructure.
- Proven ability to implement automation that minimizes friction for production releases.
- Ability to mentor teams and coach them on SRE functions and standards.
- Experience troubleshooting third-party upstream, network, and file transfer issues.
- Ability to facilitate the resolution of infrastructure, network, storage, and database issues to maintain accurate application data flows.
Preferred Skills
- Experience driving technical platform testing for latency reduction and efficiency.