You will manage incident response and provide advanced support for an AWS platform that hosts a high volume of applications.
Responsibilities
- Serve as the primary point of contact for application developers, managing events and incidents through a ticketing system.
- Implement automation to support environment scalability and optimize operational processes for efficiency and security.
- Conduct root cause investigations and document strategies to prevent issue recurrence.
- Train users to diagnose and troubleshoot issues themselves so that problems are resolved faster.
- Communicate effectively with stakeholders at various organizational levels.
Required Skills
- 5+ years of experience in public cloud environments, specifically AWS.
- Strong proficiency with core AWS services: S3, EC2, VPC, ECS, and EKS.
- Hands-on experience with Infrastructure as Code using CloudFormation and Terraform.
- Demonstrated expertise in Kubernetes container orchestration.
- Proven ability to optimize operational processes for reliability.
Preferred Skills
- Experience training end-users on self-service troubleshooting techniques.
- Background in high-volume application hosting environments.