Good experience building AI/ML models for predictive analysis, anomaly prediction, and classification on observability signals (logs, metrics, and traces).
Experience handling unstructured data.
Experience with Azure is a plus.
Good experience leveraging large language models (LLMs).
Key Responsibilities
Design and Implement Observability Solutions: Build robust observability pipelines that collect, aggregate, and analyze metrics, logs, and traces from various sources.
Monitor System Performance: Continuously monitor the health and performance of applications and infrastructure to ensure high availability and reliability.
Data Analysis: Analyze collected data to identify trends, spot anomalies, and gain insight into system behavior, enabling proactive issue resolution.
Log Management: Manage logs effectively, ensuring they are collected, stored, and analyzed to diagnose problems and monitor security.
Collaboration: Work closely with cross-functional teams, including developers, data engineers, and IT operations, to ensure seamless integration of observability practices across the organization.
Tool Integration: Integrate observability tools and frameworks, such as Prometheus, Grafana, and Splunk, to enhance monitoring capabilities and improve system visibility.
Incident Response: Participate in incident response activities, coordinating with teams to resolve issues swiftly and minimize downtime.