You will own the reliability and automation of critical systems.
Responsibilities
Architect and manage CI/CD pipelines using GitHub Actions and AWS CodePipeline, automating global infrastructure via Terraform, CloudFormation, or CDK.
Drive cost-optimization, manage auto-scaling thresholds, and execute resilience testing to ensure system durability.
Serve as a primary on-call engineer, handling incidents under ITIL practices in ServiceNow, and write Root Cause Analysis (RCA) documentation.
Implement distributed tracing, tune monitoring, and build advanced dashboards in Dynatrace and Kibana.
Define and monitor Service Level Indicators (SLIs) and Service Level Objectives (SLOs), and manage error budgets.
Required Skills
4+ years of professional experience in SRE, DevOps, or Infrastructure roles.
Hands-on experience with AWS and Azure platforms.
Mid-level proficiency in Python or similar scripting languages.
Experience with configuration management tools like Ansible.
Solid understanding of Docker and orchestration via Kubernetes or ECS.
Strong knowledge of Linux systems, networking protocols, and relational and NoSQL database architectures.
Experience implementing CI/CD pipelines and working with GitHub Actions.
Familiarity with ServiceNow for incident management.