Manage and scale enterprise observability infrastructure across global data centers and cloud environments.
Responsibilities
Build, deploy, and manage enterprise log indexing and search platforms, including Splunk, Loki, and Elastic, for high-availability logging.
Provide 24x7 on-call support, including tool upgrades, performance tuning, and troubleshooting.
Lead the evaluation, design, and deployment of monitoring tools for modern containerized and cloud architectures.
Develop and maintain Kubernetes-based monitoring and logging solutions.
Drive automation and SRE practices within the observability domain.
Required Skills
6+ years of experience in observability, specifically logging (Splunk, Loki, LogScale, Elastic) and monitoring (Prometheus, Grafana, Fluent Bit, Netcool, Node Exporter).
5+ years of System Administration experience.
3+ years of hands-on experience with Kubernetes.
Proficiency in Linux and Windows operating system management and administration.
Strong understanding of multi-tier application architectures and runtime environments.
Solid grasp of LAN/WAN technologies and networking concepts.
Experience with container-based logging and monitoring solutions.
Preferred Skills
Experience with monitoring infrastructure in AWS or Azure.
Knowledge of Python and infrastructure automation using Ansible.
CKA (Certified Kubernetes Administrator) or CKAD (Certified Kubernetes Application Developer) certification.