Atlanta, GA, USA
🔹 Key Responsibilities:
✅ Architect highly available, scalable, and secure AWS-based solutions
✅ Define SRE standards, SLIs, SLOs, error budgets, and reliability frameworks
✅ Assess and improve enterprise observability maturity
✅ Design automation strategies to reduce operational overhead
✅ Evaluate and enhance CI/CD, IaC, monitoring, and remediation frameworks
✅ Lead production readiness reviews and architectural assessments
✅ Implement resilience patterns including circuit breaking, rate limiting, and graceful degradation
✅ Lead blameless postmortems and drive systemic improvements
✅ Mentor engineering and SRE teams on best practices
🔹 Required Qualifications:
✔️ Proven experience in SRE Architecture and Reliability Engineering
✔️ Deep expertise in AWS cloud infrastructure and distributed systems
✔️ Hands-on experience with Kubernetes, Docker, and serverless technologies
✔️ Strong observability experience with Dynatrace, Prometheus, Grafana, ELK/EFK, Jaeger, and OpenTelemetry
✔️ Programming/scripting skills in Python, Go, or Bash
✔️ Strong automation, analytical, and leadership skills
Bachelor's degree