Irving, TX, USA
Role Summary
Seeking a hands-on L3 Linux Administrator to own stability, availability, and performance across large-scale Linux environments. The role demands deep troubleshooting skills, strong exposure to Veritas Clustering (VCS), SAN/NAS storage, and close coordination with data center teams for hardware incidents. The ideal candidate will work independently, lead incident resolution, and improve BAU operations through automation and best practices.
________________________________________
Key Responsibilities
Linux Administration (L3)
• Administer and troubleshoot RHEL, Oracle Linux, CentOS, SUSE in production.
• Diagnose complex OS issues: kernel panics, boot/GRUB failures, filesystem corruption, resource contention (CPU/RAM/I/O/Network), SELinux/AppArmor denials.
• Patch and upgrade OS at scale; manage package repositories and kernel updates with rollback strategies.
• Implement and audit security hardening (firewalld/iptables, CIS benchmarks, PAM, sudo, SSH, auditd).
• Manage system services (systemd), cron/timers, users/groups, sudoers, and system-wide configuration.
Veritas Cluster Server (VCS/InfoScale)
• Install, configure, and administer VCS for HA/DR across multi-node clusters.
• Create/maintain service groups, resources, dependency trees; configure LLT/GAB, I/O fencing, and quorum.
• Integrate VxVM/VxFS (disk groups, volumes, file systems) with application failover.
• Conduct DR drills, failover testing, and root cause analysis for cluster events.
Storage: SAN & NAS
• Liaise with storage teams for LUN provisioning, zoning, masking; validate multipathing (DM Multipath/PowerPath).
• Build and maintain filesystems (ext4/xfs/VxFS), mount policies, fstab and autofs.
• Manage NFS/CIFS/SMB exports/mounts, permissions, quotas, and locking issues.
• Troubleshoot pathing, latency, and I/O bottlenecks using OS, HBA, and array-side telemetry.
Data Center & Hardware Coordination
• Coordinate with DC teams for racking/stacking, cabling, console access, and physical triage.
• Diagnose hardware faults (CPU, memory, NIC/HBA, disks/RAID/SSD, backplane, PSU, fans) and firmware/BIOS alignment.
• Raise and track OEM tickets (Dell/HP/IBM/Cisco), manage RMA, and oversee replacements and post-fix validation.
BAU Operations & Incident Management
• Act as L3 escalation for P1/P2 incidents; drive bridge calls and lead technical recovery.
• Perform deep-dive log analysis (journald, syslog, dmesg, audit logs, application logs).
• Create/run SOPs/runbooks, maintain KB articles, and implement problem management (RCA, corrective actions).
• Support on-call rotation and scheduled maintenance windows (change management, CAB, MOPs).
Networking (Host-Level)
• Troubleshoot TCP/IP, routing, VLANs/bonding/teaming, MTU, host firewalls, DNS/DHCP, NTP/Chrony.
• Collaborate with network teams on L2/L3 connectivity, load balancers, and firewall rules.
________________________________________
Required Experience & Skills
• 8–12+ years in enterprise Linux system administration with proven L3 ownership.
• Strong hands-on experience with VCS (Veritas Cluster Server), VxVM, VxFS, and HA/DR patterns.
• Solid SAN/NAS experience: LUNs, zoning, multipath, NFS/SMB.
• Demonstrated success working independently and leading during critical incidents.
• Advanced troubleshooting: kernel, performance, storage, and cluster-level failures.
• Scripting proficiency (Bash; Python preferred) and familiarity with Ansible.
• Familiarity with VMware/KVM and basic cloud concepts (AWS, Azure, Linux in the cloud).
• Strong documentation discipline (SOPs, MOPs, RCAs) and experience with ITIL-aligned processes.
Any Graduate