Site Operations Engineer

You will support the reliability and scalability of systems and services within the Crypto Services SRE team.

Implement and maintain monitoring, observability, alerting, and logging systems.
Design and deploy automated processes and tooling, including Ansible playbooks and API monitoring tools.
Monitor key performance metrics to identify opportunities for optimization and efficiency.
Collaborate with cross-functional teams to troubleshoot incidents, perform root cause analysis, and prevent recurrence.
Document workflows and procedures, and validate runbooks.

5+ years of experience in operations or site reliability engineering.
Strong Linux/Unix system administration skills and OS fundamentals.
Proficiency in shell scripting, such as Bash or Zsh.
Experience with interpreted or compiled languages such as Python, Perl, C/C++, Go, or Java.
Hands-on experience with configuration management and Infrastructure as Code using Ansible, Puppet, Terraform/Terragrunt, or CloudFormation.
Practical knowledge of containerization with Docker or Podman and orchestration with Kubernetes or Apache Mesos.
Understanding of network security, TCP/IP, and encryption principles including PKI, OpenSSL, and key exchange protocols.
Familiarity with SRE principles such as monitoring, alerting, error budgets, and fault analysis.

Any Graduate
