Site Reliability Engineer

Optimhire Software Solutions Private Limited
San Jose, CA, USA

Description

You will manage and scale mission-critical global hybrid infrastructure across multiple datacenters and cloud providers. You own system uptime, capacity planning, and product SLAs for large-scale production environments.

Responsibilities

Architect and maintain scalable, highly available systems designed to handle high-volume internet traffic.
Participate in a weekly 24/7 on-call rotation to resolve outages, debug production issues, and handle escalated tickets.
Automate routine tasks and manual processes through scripting and DevOps principles.
Develop tools and platforms to improve system observability, insights, and security.

Required Skills

5+ years of experience in a Cloud SRE or similar role.
Hands-on experience with cloud providers including AWS, GCP, or OCI.
Proficiency with infrastructure-as-code and configuration management tools such as Terraform, Ansible, or Puppet.
Experience managing containers using Kubernetes and Docker.
Strong scripting skills in Python or Golang for task automation.
Experience with load balancers and reverse proxies such as HAProxy, Nginx, F5, dnsdist, or Varnish.
Experience with web servers like Apache or Nginx.
Ability to design, develop, and deploy modular cloud-based systems.

Preferred Skills

AWS and/or GCP certifications.

Key Skills

Ansible, Terraform, Python, Docker, Azure, AWS, GCP, Kubernetes

Education

Any Graduate

Posted On: Today
Experience: 5+ years
Availability: On Site
Openings: 1
Category: Site Reliability Engineer
Tenure: Contract - Corp-to-Corp Position