You will own the engineering and operational health of the OpenShift Virtualization platform.
Responsibilities
Conduct capacity planning and forecasting for OpenShift Virtualization, covering compute, memory, storage, and network resources.
Analyze resource utilization trends to recommend infrastructure scaling, consolidation, or optimization.
Develop and maintain capacity models and reports to support strategic planning alongside application teams.
Develop automation solutions (scripts, playbooks) for repetitive OpenShift Virtualization tasks, including configuration changes and VM management.
Implement Site Reliability Engineering (SRE) principles, manage role-based access control (RBAC), and maintain end-to-end observability solutions (monitoring, logging, tracing).
Required Skills
6+ years of experience in related infrastructure or cloud engineering fields.
Expertise with the OpenShift Virtualization platform.
Proficiency in scripting and automation using Python and PowerShell.
Strong experience with cloud architecture and cloud infrastructure principles, specifically on Google Cloud Platform (GCP).
Proven ability in Root Cause Analysis (RCA), troubleshooting, and complex problem-solving.
Experience with Kubernetes and resource utilization management.
Familiarity with Ansible, GitHub, and change management processes.
Knowledge of security practices, including Information Security and Access Controls.
Experience monitoring VM health and performance metrics using tools like Dynatrace and Prometheus/Grafana.