Description

You will own the reliability, availability, scalability, and performance of mission-critical telecom platforms and services.

Responsibilities

  • Ensure high availability and performance of telecom network platforms, OSS/BSS applications, and customer-facing services.
  • Develop automation frameworks for deployment, monitoring, incident response, and capacity management.
  • Design, implement, and maintain end-to-end observability solutions (logs, metrics, traces, alerts).
  • Lead incident response, root cause analysis, and postmortems for outages and performance degradation.
  • Implement SRE best practices (error budgets, SLIs, SLOs) aligned with telecom standards.

Required Skills

  • 3–7 years of experience in SRE/DevOps/Systems Engineering roles (telecom domain experience preferred).
  • Strong knowledge of Linux/Unix system administration and of network protocols and technologies (TCP/IP, DNS, VoIP, SIP, SS7, Diameter, 5G Core).
  • Hands-on experience with cloud platforms (AWS, Azure, GCP, or private telco clouds such as OpenStack/VMware).
  • Proficiency in automation and configuration management tools (Ansible, Terraform, Chef, Puppet).
  • Experience with CI/CD tools and pipelines (Jenkins, GitLab CI/CD, ArgoCD).
  • Experience with monitoring/observability tools (Prometheus, Grafana, ELK/EFK, Splunk, OpenTelemetry, Nagios, Zabbix).
  • Proficiency in programming/scripting with Python, Go, Shell, or Java.
  • Knowledge of telecom OSS/BSS systems, API integrations, and microservices architecture.
  • Bachelor’s or Master’s degree in Computer Science, Telecommunications, Information Technology, or a related field.

Education

Any Graduate