Description
You will lead efforts to ensure the reliability, scalability, and efficiency of the Intelligent Interactions platform and its core workflow engine.
Responsibilities
- Architect and implement solutions to ensure platform availability and stability within Kubernetes environments.
- Develop CI/CD pipelines, automation scripts, and infrastructure as code to streamline deployments.
- Implement monitoring, logging, and alerting tools to proactively address system issues and bottlenecks.
- Lead incident response and post-mortem processes to drive continuous improvements in reliability.
- Collaborate with engineering, QA, and product teams to optimize software for the available infrastructure.
Required Skills
- 5+ years of experience in Site Reliability Engineering, DevOps, or similar roles across at least two different companies.
- Hands-on expertise with Kubernetes and Linux systems.
- In-depth experience with AWS services.
- Proficiency with Terraform and Ansible for infrastructure management.
- Experience with Jenkins for CI/CD pipeline development.
- Strong scripting and programming skills in Python and Bash.
- Working knowledge of Java and Kafka.
Preferred Skills
- Ability to support developers in optimizing their services for the underlying infrastructure.