F.A.Q. UAHPC Node Provisioning 

1. How do I contribute existing nodes to the UAHPC cluster? 

If you are new to UA and have existing nodes you would like to contribute to the cluster, please contact D. Jay Cervino (dcervino@ua.edu).  

2. How do I purchase new nodes? 

Contact OIT to purchase new nodes for UAHPC. OIT is happy to meet with you to discuss your research needs and offer suggestions for node purchases.  

Once you have had an initial discussion with OIT: 

  1. Engage Dell for a node quote. OIT can provide you with Dell representative contact information and is happy to attend any meetings with you and Dell while you obtain the quote. 
  2. OIT will review the node specs and quote from Dell for completeness and compatibility with UAHPC infrastructure.  
  3. Once you receive OIT sign-off, work with the grant support personnel in your department to purchase the nodes using the quote provided by Dell.
  4. Note that OIT cannot purchase nodes on researchers' behalf using grant funds. 
  5. Unless your node will be individually hosted at the Gordon Palmer data center, UAHPC nodes are shipped to DC BLOX. OIT will provide shipping information as needed.   

3. Does UAHPC have minimum node specifications?

Yes. OIT requires the following minimum node specifications for use in UAHPC: 

  • At least 24 CPU cores per rack unit 
  • At least 4GB RAM per CPU core 
  • At least 256GB of local disk (preferably a mirrored pair of SSDs if small, or at least a 4-disk array of SAS/SATA drives if large) 
  • A 10GBASE-T network card or onboard port (not SFP-style, unless a 10GBASE-T SFP+ module is also included) 
  • An x16 PCIe slot for an InfiniBand card 
  • A Mellanox ConnectX-6 InfiniBand card 
  • An HDR InfiniBand cable (usually one splitter cable per two compute nodes) 
  • An enterprise iDRAC or equivalent means of remote administration 
  • At least 3 years of warranty 
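
As a quick sanity check once a node arrives, most of the minimums above can be verified from a shell on the node. The commands below are standard Linux utilities (ibstat comes with the InfiniBand tools); exact output formats vary by distribution:

  # Rough spec check on a delivered node; output fields vary by distribution.
  lscpu | grep -E '^(CPU\(s\)|Core\(s\) per socket|Socket\(s\))'   # core counts
  free -g | awk '/Mem:/ {print $2 " GB RAM installed"}'            # total RAM
  lsblk -d -o NAME,SIZE,ROTA,TYPE                                  # local disks
  ip -br link                                                      # network ports
  ibstat 2>/dev/null | grep -E 'CA type|Rate'                      # InfiniBand card, if drivers are loaded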

4. What other considerations should I think about when purchasing nodes?

Do you often need substantial amounts of memory per CPU core?  

OIT considers 4GB per core the minimum requirement, and the cluster has several compute nodes with exactly that amount, but 4GB per core is barely enough to start MATLAB, let alone work with large datasets or data structures. Some purchases have included far more RAM, on the order of 64GB per core, or 2TB total in a compute node. 
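
For reference, the per-core figure is also what you request from the scheduler at run time. A minimal sketch, assuming the cluster scheduler is Slurm; the memory value, core count, and MATLAB command are illustrative only:

  #!/bin/bash
  # Hypothetical job requesting 8 cores with 16GB of RAM per core (128GB total).
  #SBATCH --job-name=highmem-example
  #SBATCH --ntasks=1
  #SBATCH --cpus-per-task=8
  #SBATCH --mem-per-cpu=16G     # compare with the 4GB-per-core cluster minimum
  #SBATCH --time=01:00:00

  matlab -batch "myAnalysis"    # placeholder workload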

Do you plan to use software that can take advantage of a GPU?  

Even if a GPU is not in your current budget, you may want to select a server model where you can add one later. 

Do you work with software that is optimized for use on Intel CPUs as opposed to AMD?  

We generally find AMD CPUs more cost-effective, but not all software runs well on them because some Intel-developed tools (for example, Intel's compilers and math libraries) intentionally select less-optimized code paths on non-Intel CPUs. 

Do you tend to work more with software that is single-threaded or multi-threaded (limited to running on a single compute node), or with software that uses a framework like MPI to take advantage of distributed computation over multiple nodes?  

If the former, then you might want to aim for a faster CPU, or a lot of CPU cores per compute node. If the latter, then you can select a cheaper CPU. 
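
The difference shows up directly in how a job is requested. Below are two separate job-script fragments, again assuming Slurm, with placeholder program names:

  # Fragment 1: multi-threaded code confined to one node (one task, many cores).
  #SBATCH --nodes=1
  #SBATCH --ntasks=1
  #SBATCH --cpus-per-task=32
  srun ./threaded_solver        # placeholder shared-memory program

  # Fragment 2: MPI code spread over several nodes (many tasks instead).
  #SBATCH --nodes=4
  #SBATCH --ntasks-per-node=24
  srun ./mpi_solver             # placeholder MPI program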

Do you run software that needs a lot of local scratch space during computations? Or just a little space but it absolutely has to be fast?  

This would determine the number and type of disks needed, and what type of server chassis can accommodate them. 
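
In practice, jobs that need local scratch typically copy their working set onto the node's own disks at the start of the run and copy results back at the end. A rough sketch, assuming Slurm and a node-local path such as /tmp (confirm the actual local-scratch location on UAHPC with OIT); the program and file names are placeholders:

  # Stage data onto node-local disk, compute there, then copy results back.
  LOCAL=/tmp/$USER/$SLURM_JOB_ID
  mkdir -p "$LOCAL"
  cp /scratch/$USER/input.dat "$LOCAL/"
  cd "$LOCAL"
  ./my_solver input.dat > output.dat
  cp output.dat /scratch/$USER/
  rm -rf "$LOCAL"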

Do you plan to buy multiple similar nodes?  

If so, a sled chassis or blade chassis may be more efficient than traditional rackmount servers (but it may increase the upfront cost for a single node and may also place limitations on what kind of nodes it can accept). 

5. Where are the new nodes installed?

OIT will install the new nodes in the UAHPC cluster at DC BLOX Data Center in Birmingham, AL. 

6. Who manages the nodes? 

OIT manages the installation and administration of the nodes.  

Typical administration tasks include: 

  • Node installation 
  • Security patching and monitoring
  • OS patching and updates 
  • Software updates and installation
  • Cluster infrastructure refreshes (chassis, InfiniBand, Mellanox switches, misc. optics, SciNet, etc.) 
  • Provisioning of HPC-attached storage
  • Hardware monitoring with repair oversight 
  • Licensing updates for standard cluster software and packages 
  • Cluster scheduler monitoring
  • Maintaining user accounts linked to MyBama usernames, with automatic VPN access

7. Who owns the nodes? 

For the purposes of UA auditing, the node purchaser retains asset ownership. Assets are NOT transferred to OIT. 

8. How long will my nodes stay in the cluster? 

OIT runs nodes on a 5-year lifecycle. After that time, the nodes may be removed from UAHPC and provisioned into a student or test cluster for additional non-essential life in support of student success and workforce development.  


F.A.Q. Running jobs on HPC resources

1. Are there any fees for using UAHPC/CHPC/GIS? 

The university does not currently charge for the use of UAHPC or CHPC. 

OIT funds cluster infrastructure (chassis, InfiniBand, Mellanox switches, misc. optics, SciNet, etc.).

Researchers receive 50GB in /home/$USER/, 500GB in /bighome/$USER/, and 20TB in /scratch free of charge. Up to 100TB of additional HPC-attached storage can be purchased at $50/TB/year. See the Research Data Storage page for additional details.  
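
The additional-storage cost scales linearly: for example, an extra 10TB would cost 10 × $50 = $500 per year, and the 100TB maximum would cost $5,000 per year.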

2. How does job prioritization on UAHPC work? 

The node purchaser/owner has priority on their nodes, meaning the owner can preempt other users' jobs running on those nodes.  

When the owner's nodes are idle or not fully utilized, other users can run jobs on them. 

The node owner may also use other nodes in the cluster, not just the ones under their ownership. 

N.B.: OIT does not guarantee that the exact same node will be available to a researcher at the time of need. If your node is similar enough to others in the cluster, we do not usually guarantee access to that specific node, since running your job with priority on an equivalent node owned by someone else amounts to the same thing. If your node is sufficiently different, we can create a partition that guarantees access to that particular node only, but the cluster is more usable for everyone else if we avoid this.
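
In scheduler terms (assuming Slurm, which uses the "partition" terminology above), the distinction looks roughly like the sketch below; the partition names are placeholders, and the actual names on UAHPC will differ:

  # Node owner submitting to their priority partition (placeholder name "owner-lab");
  # these jobs can preempt non-owner jobs running on the owned nodes.
  sbatch --partition=owner-lab job.sh

  # Any user submitting to the general partition (placeholder name "main");
  # these jobs run on idle capacity and may be preempted by owner jobs.
  sbatch --partition=main job.sh

  # List the partitions available to you, their state, time limits, and node counts.
  sinfo -o "%P %a %l %D"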