You will own the reliability, performance, and security posture of critical production services.
Responsibilities
- Maintain and improve application performance, availability, and security.
- Evaluate SRE tooling to drive application performance improvements.
- Design, plan, and implement highly reliable solutions, including troubleshooting and capacity planning.
- Collaborate with Cyber Security to remediate vulnerabilities in web and middleware infrastructure using automated code reviews.
- Participate in a 24/7 on-call rotation supporting critical production services.
Required Skills
- 10+ years managing large-scale websites and application deployment processes (CI/CD).
- 5+ years improving performance for OTT and mobile applications.
- 5+ years supporting services, microservices, and n-tier systems.
- 5+ years using software automation technologies and 2+ years with Infrastructure as Code.
- 5+ years of extensive experience with Amazon Web Services and Google Cloud Platform.
- 5+ years with application and container instrumentation using APM tools (Datadog, New Relic, Sysdig, AppDynamics, Zabbix).
- 5+ years with scripting languages such as Bash, Python, Perl, Groovy.
- 5+ years with configuration management tools such as Ansible, Salt, Chef, Puppet.
- 3+ years working with code and infrastructure security, and implementing/monitoring security policies (WAF, Site Shield).
Preferred Skills