You will design, implement, and mature enterprise-wide observability capabilities across hybrid on-premises and cloud environments.
Responsibilities
Develop and maintain the enterprise observability reference architecture covering logs, metrics, traces, events, dashboards, and alerts.
Lead the design and implementation of observability solutions for hybrid multi-cloud and on-premises environments.
Establish standards, governance, and reusable frameworks for telemetry generation, ingestion, correlation, storage, and visualization.
Architect and administer large-scale log aggregation platforms such as Splunk across on-premises and cloud deployments.
Required Skills
5+ years of hands-on experience with enterprise-scale log aggregation platforms such as Splunk, including architecture, deployment, and administration.
5+ years of experience with automated configuration management and infrastructure-as-code (IaC) tools, including Ansible and Terraform.
2+ years of experience with APM tools such as AppDynamics or Dynatrace, covering end-to-end application visibility.
Strong understanding of cloud infrastructure and cloud-native monitoring technologies on AWS, Azure, or GCP.
Familiarity with OpenTelemetry, distributed tracing, and service mesh observability.
Expertise in designing dashboards, KPIs, and alerting strategies aligned with business SLIs/SLOs.
Experience with Network Performance Monitoring tools and methodologies.
Experience collaborating with DevOps, SRE, cloud engineering, and application teams in large enterprises.