Dallas, TX, USA
Automation: Developing software to automate manual operational tasks (e.g., deployment, configuration) to reduce errors.
Monitoring: Creating and managing tools to monitor system health, performance, and capacity (e.g., logging, metrics, dashboards).
Incident Management: Participating in on-call rotations to resolve critical production-level issues and performing post-incident reviews (postmortems) to prevent recurrence.
System Optimization: Improving system scalability, security, and performance through system design reviews and capacity planning.
Collaboration: Working with development teams to ensure software is designed for production reliability, including setting and maintaining Service Level Objectives (SLOs) and Service Level Indicators (SLIs).
Coding/Scripting: Proficiency in languages such as Python, Go, Ruby, Java, or Shell scripting.
Systems Engineering: Strong understanding of Linux/Unix operating systems, networking, and distributed systems.
Cloud Platforms: Experience with cloud providers like AWS, Google Cloud Platform (GCP), or Azure.
DevOps Tools: Knowledge of CI/CD pipelines, Infrastructure as Code (IaC) tools (e.g., Terraform, Ansible), and containerization (e.g., Kubernetes, Docker).
Problem Solving: Strong analytical skills for debugging, troubleshooting, and root cause analysis.
SREs are, in effect, software engineers focused on the reliability of production systems, balancing the need for rapid feature releases with the stability of the platform.
Any Graduate