You will own the design, configuration, CI/CD deployment, and optimization of enterprise-wide observability tools.
Responsibilities
Perform observability current-state assessments, gap analysis, and solutioning across application, infrastructure, database, security, middleware, and network domains.
Implement Golden Signals (latency, traffic, errors, saturation) using relevant telemetry sources.
Define monitoring standards, best practices, and governance to ensure consistency and scalability.
Collaborate with application and infrastructure teams to troubleshoot performance issues and implement permanent fixes.
Instrument applications with the OpenTelemetry (OTel) framework and develop Dynatrace plug-and-play observability modules for Java and .NET applications.
Required Skills
7+ years of hands-on SRE experience with cloud technologies, tooling, and automation.
Strong hands-on automation experience with observability as code, including dashboards as code, monitoring as code, and alerts as code.
Practical experience implementing Golden Signals using telemetry sources.
Strong hands-on experience with AWS, including Control Tower, project setup, account creation, RDS, and SSO.
Solid understanding and practical experience with Docker and Kubernetes.
Proficiency with Linux commands, GitLab CI/CD setup, and Terraform state management.
Monitoring and alerting setup experience with Splunk, Prometheus, Grafana, Kibana, or ELK, with a preference for APM experience (Dynatrace).
Experience integrating and automating cloud platforms (AWS, Azure, GCP).
Extensive experience instrumenting applications with the OpenTelemetry framework.