Description

You will manage and scale mission-critical hybrid infrastructure spanning multiple datacenters and cloud providers worldwide. You will own system uptime, capacity planning, and product SLAs for large-scale production environments.

Responsibilities

  • Architect and maintain scalable, highly available systems designed to handle high-volume internet traffic.
  • Participate in a 24/7 on-call rotation (rotating weekly) to resolve outages, debug production issues, and handle escalated tickets.
  • Automate routine tasks and manual processes through scripting, following DevOps practices.
  • Develop tools and platforms to improve system observability, insights, and security.

Required Skills

  • 5+ years of experience in a Cloud SRE or similar role.
  • Hands-on experience with cloud providers such as AWS, GCP, or OCI.
  • Proficiency with infrastructure-as-code and configuration management tools such as Terraform, Ansible, or Puppet.
  • Experience building containers with Docker and orchestrating them with Kubernetes.
  • Strong scripting skills in Python or Go (Golang) for task automation.
  • Experience with load balancers and reverse proxies such as HAProxy, Nginx, F5, dnsdist, or Varnish.
  • Experience with web servers such as Apache HTTP Server or Nginx.
  • Ability to design, develop, and deploy modular cloud-based systems.

Preferred Skills

  • AWS and/or GCP certifications.

Education

Bachelor's degree in any discipline.