Description

You will drive operational readiness and automate production support toolchains to ensure high availability and resilient application delivery.

Responsibilities

  • Automate orchestration and tooling solutions to reduce defects and ensure consistent processes for repetitive tasks.
  • Establish logging standards and monitoring requirements to provide business-service-level visibility for support teams.
  • Build and implement recovery tooling that adheres to enterprise standards and frameworks.
  • Partner with development teams to ensure operational readiness, including capacity planning, performance tuning, and proactive alerting.
  • Advise software engineers on failure-resistant design patterns and participate in architectural decisions.

Required Skills

  • 5+ years of professional experience in an engineering or reliability role.
  • Deep understanding of the software development life cycle (SDLC) to drive requirements and align development with operations.
  • Experience acting as a Subject Matter Expert (SME) for IT infrastructure.
  • Proven ability to implement automation that minimizes friction for production releases.
  • Ability to mentor teams and coach them on SRE functions and standards.
  • Experience troubleshooting third-party upstream, network, and file-transfer issues.
  • Ability to facilitate the resolution of infrastructure, network, storage, and database issues to maintain accurate application data flows.

Preferred Skills

  • Experience driving technical platform testing for latency reduction and efficiency.

Education

Graduate in any discipline