Description

You will manage incident response and provide advanced support for an AWS platform that hosts a high volume of applications.

Responsibilities

  • Manage events and incidents as the primary point of contact for application developers via a ticketing system.
  • Implement automation to support environment scalability and optimize operational processes for efficiency and security.
  • Conduct root cause investigations and document strategies to prevent issue recurrence.
  • Train users to self-diagnose and troubleshoot issues for expedited resolution.
  • Communicate effectively with stakeholders at various organizational levels.

Required Skills

  • 5+ years of experience in public cloud environments, specifically AWS.
  • Strong proficiency with core AWS services: S3, EC2, VPC, ECS, and EKS.
  • Hands-on experience with Infrastructure as Code using CloudFormation and Terraform.
  • Demonstrated expertise in Kubernetes container orchestration.
  • Proven ability to optimize operational processes for reliability.

Preferred Skills

  • Experience training end users in self-service troubleshooting techniques.
  • Background in high-volume application hosting environments.

Education

Any Graduate