HPC Systems Administrator (Linux) - Provo, Utah United States - 21910

This closed position was filled.


Job #: 21910
Title: HPC Systems Administrator (Linux)
Job Location: Provo, Utah - United States
Employment Type:
Salary: contact recruiter for details
Other Compensation: Free Tuition for Candidate and Spouse as well as 50% Tuition Discount for Children of Candidate. 22 vacation days, 12 sick days, 12 calendar holidays, a pension, 401k matching, no-cost retirement savings program, and excellent medical/dental benefits.
Employer Will Recruit From: Nationwide
This role is with Brigham Young University, which is owned by the LDS Church. All candidates MUST AGREE to follow a very strict Honor Code (see https://policy.byu.edu/view/index.php?p=26). Additionally, the university will not pursue non-LDS candidates. As a religious institution, it has that option, and its VP feels it is important to take that approach. The university is specifically seeking graduates of BYU Provo, BYU Idaho, BYU Hawaii, and BYU Pathways.
Relocation Paid?: Yes


The High-Performance Computing (HPC) Systems Administrator is an integral part of the operations of Research Computing. You will be responsible for architecting, installing, configuring, and maintaining the department's infrastructure in cooperation with fellow systems administrators and other staff members. This infrastructure includes HPC clusters, high-performance storage, Ethernet and InfiniBand networks, OS image deployment, batch job scheduling, infrastructure servers, and other ancillary services. You will have significant latitude to make technical decisions and guide the direction of Research Computing.

This position supports Research Computing in its mission to provide reliable, state-of-the-art HPC resources to researchers. Current resources include about 24,000 processor cores and petabytes of storage. All Research Computing systems use Linux.

To do this, you must be skilled in many IT-related fields, especially in the administration of Linux systems, and already possess many of the skills listed below. High-Performance Computing is a field that requires the combination of many specialties and skills. You need to be willing, able, and proactive about acquiring any listed skills that you may currently lack.


To succeed in this position, you will need to be dedicated and be able to work well with others. You must be very proactive and pay great attention to detail. You will work with the director, the other system administrators, and user support staff to accomplish the mission of the department. The system administrators are responsible for the following:

  • Design, implementation, monitoring, and maintenance of:
    • HPC clusters
    • Networks: Ethernet and InfiniBand
    • Centralized storage and backups
    • Various infrastructure services that support Research Computing (virtualization, web, databases, DNS, etc.)
    • Availability of services
  • Evaluate the acquisition of hardware and software solutions
  • Hire and manage a hardware technician
  • Assist user support staff as needed, especially to track down potential system issues
  • Automate routine tasks
  • Investigate and propose new methods to improve lab operations

This position has on-call responsibilities. However, after-hours outages have been rare and maintenance is typically performed during business hours.


Skills and Experience

Minimum qualifications: Bachelor's degree, or four years of combined education and experience.

Required skills and experience:

  • Excellent Linux or Unix skills
  • Capability and desire to learn new skills
  • Good verbal and written communication skills
  • Systems programming skills (e.g. Python, Perl, bash)

Desired skills and experience:

  • Linux/Unix systems administration
  • Compiled languages (e.g. C, C++, Fortran)
  • Advanced Unix/Linux shell scripting (e.g. bash, tcsh)
  • Scripting languages (e.g. Perl, Python)
  • Administration of parallel file systems or enterprise-class storage (e.g. Lustre, SAN, NAS)
  • Installation, configuration, monitoring, and maintenance of Ethernet and InfiniBand networks
  • Various server types (e.g. web, DNS, database, mail servers)
  • Virtualization
  • Hardware monitoring (IPMI, SNMP, etc.)
  • Batch job scheduling systems (e.g. Slurm, Moab/Torque, LSF)
  • Backup systems (e.g. Bacula, TSM)
  • MySQL administration