Lead the architecture and implementation of enterprise observability solutions, focusing on Splunk-based telemetry ingestion, indexing, and performance engineering.
Responsibilities
- Design and implement scalable Splunk architectures, including data ingestion pipelines, indexing strategies, and retention policies.
- Drive performance engineering initiatives by defining measurable benchmarks, optimization strategies, and key performance indicators.
- Establish comprehensive monitoring and alerting mechanisms to enable proactive incident detection and root cause analysis.
- Collaborate with engineering teams to integrate observability practices into application and platform development workflows.
- Manage Splunk administration activities to ensure platform stability, performance, and continuous improvement of monitoring coverage.
Required Skills
- 10+ years of experience in observability, performance engineering, and Splunk architecture.
- Deep expertise in designing observability solutions for distributed systems, covering metrics, logs, and traces.
- Strong knowledge of Splunk architecture, including data ingestion and indexing, and advanced proficiency in the Search Processing Language (SPL).
- Proven ability to design scalable monitoring and alerting systems for proactive issue detection in cloud and microservices environments.
- Experience correlating telemetry data to support troubleshooting and root cause analysis in complex distributed environments.
- Familiarity with application performance monitoring (APM) tools and best practices.
- Ability to define performance benchmarks and optimization strategies for enterprise applications.
Preferred Skills
- Experience in Splunk administration, including configuration, platform maintenance, and user management.
- Strong analytical and problem-solving skills, with the ability to communicate effectively with stakeholders.