San Jose, CA, USA
Must-Have Skills
✔ Linux System Administration & Troubleshooting
✔ Linux Command Line & Bash/Shell Scripting
✔ GPU Server Deployment & Cluster Bring-Up
✔ Driver Installation & System Configuration
✔ InfiniBand Networking (Switch Configuration & Subnet Management)
✔ GPU Cluster Validation & End-to-End Testing
✔ TCP/IP Networking Fundamentals (OSI, IP, ARP, ICMP, TCP, UDP)
✔ Routers, Switches & Terminal Server Configuration
✔ Rack & Stack, Server Hardware Installation & Maintenance
✔ Fiber & Copper Cabling (IP & SAN)
✔ Incident Management, Ticketing & SLA Support
📌 Key Responsibilities
🔹 Deploy, maintain, and support enterprise data center infrastructure
🔹 Install, configure, and troubleshoot Linux servers and GPU platforms
🔹 Perform InfiniBand fabric bring-up, configuration, and troubleshooting
🔹 Support GPU cluster deployments and infrastructure validation
🔹 Install and maintain server hardware, networking equipment, and storage infrastructure
🔹 Monitor system health, resolve production issues, and participate in on-call support
🔹 Collaborate with global infrastructure and engineering teams to ensure high availability and operational excellence
➕ Preferred Qualifications
✅ Bachelor's degree in Computer Science, Engineering, IT, or a related field
✅ Experience with large-scale data center operations
✅ Strong documentation, troubleshooting, and analytical skills
✅ Experience supporting enterprise production environments