Key Responsibilities:
- Act as the first point of escalation for technical and process-related issues.
- Provide technical expertise and ensure timely resolution of critical incidents.
- Perform root cause analysis (RCA) for major incidents.
- Plan and execute changes in coordination with stakeholders.
- Handle service requests and incidents within agreed SLAs.
- Perform trend analysis and drive incident reduction initiatives.
- Troubleshoot hardware issues and coordinate with vendors.
- Contribute to service improvement and optimization programs.
- Support new server builds, OS upgrades, and migration activities.
- Automate routine tasks using shell/Perl scripting.
- Train and mentor new team members.
- Ensure adherence to security, quality, and ITIL processes.
Qualifications and Skills:
Technical Skills (Must Have):
- Strong expertise in Linux Administration (RHEL, CentOS, SUSE)
- Hands-on experience with KVM virtualization
- Expertise in LVM configuration and management
- Experience in Red Hat Satellite Server management and configuration
- Knowledge of Dell and HP server hardware and firmware upgrades
- Strong skills in advanced shell scripting
- Experience in Red Hat and Veritas cluster configuration
- Knowledge of NIS/NIS+ and DNS configuration
- Experience in patch management and OS recovery
- Hands-on experience in NFS and automount configuration
- Expertise in OS crash analysis and performance tuning
Secondary Skills (Nice to Have):
- Experience in AIX system administration and IBM Power Systems (POWER7/8/9/10)
- Knowledge of HMC, LPAR/DLPAR, PowerVM, VIOS administration
- Experience in PowerHA (HACMP) clustering and failover testing
- Experience in OS migrations using NIM and in applying firmware updates
- Knowledge of Solaris (8/9/10/11), Zones, and LDOM administration
- Familiarity with Oracle Enterprise Manager Ops Center
- Knowledge of storage technologies (SVM, VxVM, ZFS)
- Exposure to VMware environments
- Cross-domain knowledge (Storage, Backup, Database, Middleware, Network)