Senior SRE Engineer

You will own the implementation and maintenance of Site Reliability Engineering practices for our web portal.

Design and implement comprehensive SRE monitoring for the web portal on GCP.
Implement logging and tracing standards across all portal components using Cloud Logging and Cloud Trace.
Configure Apigee monitoring and track API performance for portal services.
Develop and maintain SRE automation scripts within GKE namespaces for monitoring, deployment, and troubleshooting.
Create drill-down dashboards correlating metrics, logs, and traces using GCP tools.

5+ years of experience in SRE or DevOps.
Strong proficiency with Kubernetes (GKE), including namespace management and RBAC.
Experience implementing OpenTelemetry (OTEL) and distributed tracing with W3C Trace Context headers.
Expertise with GCP observability tools: Cloud Monitoring, Google Cloud Managed Service for Prometheus (GMP), and Cloud Logging.
Proficiency in querying metrics with PromQL and building dashboards in Grafana.
Hands-on experience with JVM metrics collection, heap analysis, and garbage collection optimization for Java applications.
Experience building CI/CD pipelines and managing infrastructure using Docker, YAML, and Helm.
Familiarity with UI instrumentation for frontend monitoring and traceability.
Proficiency in Python and Linux shell scripting.

Education: Any graduate.