Description

You will build and run large-scale, massively distributed, fault-tolerant systems in a DevSecOps environment. You will work closely with development and operations teams to ensure high availability and cost-effectiveness. You will participate in a 24/7 follow-the-sun operating model as a first responder for incident and problem management.

Responsibilities

  • Build infrastructure-as-code patterns that meet security and engineering standards, using Terraform and cloud SDKs.
  • Create auto-remediation tools and scripts, and establish end-to-end monitoring and alerting for critical system components.
  • Troubleshoot and resolve tickets in collaboration with cloud operations teams.
  • Automate and orchestrate Linux/Windows systems and containers to eliminate toil.
  • Ensure functional and performance objectives are met through rigorous monitoring of infrastructure and application uptime.

Required Skills

  • 5+ years of experience in public cloud development or administration.
  • 2+ years of experience with monitoring infrastructure and application availability.
  • Proficiency in one or more of Python, Bash, Java, Go, and JavaScript/Node.js.
  • Experience with system administration, including automation and orchestration of Linux/Windows systems.
  • Hands-on experience with containers (Docker, Kubernetes) and Infrastructure as Code (Terraform, Chef, Ansible).
  • 2+ years of experience with CI/CD tooling and practices.
  • Cross-functional knowledge of systems, storage, networking, security, and databases.
  • BS degree in Computer Science, Physics, or Mathematics, or equivalent job experience.

Education

Any Graduate