To develop software, you must understand the problem as well as the desired solution. If you don’t understand the solution, you’re doing research. If you don’t understand the problem, you’re just mucking about with computers. Joe Leonard |
Hello and thank you for your interest in Biowulf, the High-Performance Computing (HPC) cluster at the National Institutes of Health. Biowulf, with over 90,000 cores, 70 petabytes of storage, and an array of over 900 GPUs, is one of the world’s largest computing clusters devoted to biomedical research and is available to any researcher in the NIH IRP (Intramural Research Program) who is sponsored by their IRP Principal Investigator and pays the flat $40/month usage fee. It has a wide variety of software installed supporting sequence analysis, structural biology, computational chemistry, mathematical statistics, and image processing. Biowulf is supported by a team of nearly 20 systems administrators and support scientists who ensure the system’s stability and assist users with any questions they encounter in the course of their work. The goal of this document is to provide information about the use of Biowulf to researchers who already have experience with another HPC cluster of some form. Potential Biowulf users who don’t have any particular cluster experience may find the virtual orientations for Biowulf (see https://hpc.nih.gov/training/) a gentler introduction to cluster computing. New users with specific questions are also welcome to contact the HPC staff directly via email at staff@hpc.nih.gov.
HPC clusters generally come in two varieties. Some, like the Department of Energy clusters at Sandia National Lab or Oak Ridge National Lab, are built to run a single large program as fast as possible. These clusters are optimized for processor and network speed and are called Capability clusters. Others, like Biowulf, are designed to have as many CPU cores and as much memory as is feasible. Their overall goal is to provide as many compute cycles as possible to as many users as possible per unit of time. These are known as Capacity clusters. To use an automotive metaphor, a Capability cluster is like a Ferrari or Maserati (or maybe a VW microbus with a Porsche turbocharged engine) that can get a few people someplace as fast as possible. A Capacity cluster, on the other hand, is more like an 18-wheel tractor trailer. It can still do 75 MPH on the highway, but its raison d’être is to move a large quantity of freight from point A to point B efficiently. The practical upshot of Biowulf being a Capacity cluster is that it is shared between over 3,000 users, with over 500 of them actively running jobs on the compute nodes simultaneously.
Biowulf is a Linux-based, heterogeneous HPC cluster hosted by the NIH Center for Information Technology (CIT) that is available to researchers in the NIH IRP. It is composed of, among other things, a login (or head) node, biowulf.nih.gov, and thousands of compute nodes. Almost all the actual computing occurs on the compute nodes. The head node is primarily there as a gateway to the compute nodes, not as a compute resource itself. Access to the compute nodes is mediated by the SLURM resource manager/batch scheduler. Users cannot connect to compute nodes outside of the batch system unless the batch system has already placed one of their jobs on that particular node. Most batch jobs are allowed to run for up to 10 days by default. In our SLURM configuration, there are several partitions, or queues, configured, each with a different purpose.
One partition that deserves special mention is unlimited: jobs in this partition may run longer than the default 10-day limit, but they must be submitted with sbatch, and there is no guarantee that the cluster will stay up to let your job run to completion. Scheduled maintenance on Biowulf can still cause jobs in unlimited to terminate early. The batchlim command shows the full list of partitions and their limits.

On Biowulf, there are several filesystems available to users that fulfill different roles in the Biowulf ecosystem. It is important to note that PII/PHI data is NOT allowed anywhere on Biowulf without consulting with the HPC staff and making special arrangements.
It is also important to note that, for maximum performance, no single directory or subdirectory should contain more than 5000 files. If you need to work with many thousands of files, your jobs will obtain the best performance from a nested directory structure with no more than 5000 files and directories per subdirectory. Operating on a single directory containing millions of files is one of the best ways to unnecessarily increase your job’s run-time.
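As a sketch of the nested layout described above (the function name, the bucket naming scheme, and the paths are all illustrative, not a Biowulf utility), a small shell function can distribute a flat directory of files into numbered subdirectories of at most 5000 entries each:

```shell
# bucket_files: move the regular files in $1 into numbered
# subdirectories of $2, at most $3 files per subdirectory
# (defaulting to the 5000-entry guideline).
bucket_files() {
    local src=$1 dest=$2 per_dir=${3:-5000}
    local i=0 bucket=0 f
    mkdir -p "$dest/bucket_$bucket"
    for f in "$src"/*; do
        [ -f "$f" ] || continue
        if [ "$i" -ge "$per_dir" ]; then
            # current bucket is full; start a new one
            bucket=$((bucket + 1))
            i=0
            mkdir -p "$dest/bucket_$bucket"
        fi
        mv "$f" "$dest/bucket_$bucket/"
        i=$((i + 1))
    done
}
```

For example, `bucket_files /data/$USER/flat /data/$USER/nested` would leave no subdirectory of `nested` holding more than 5000 files.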
The /home filesystem is intended to hold configuration files and short scripts for Biowulf users. It is not particularly fast to access, but it is designed to be highly available. It is strictly limited to 16 gigabytes per user. This amount cannot be increased. /home is backed up to tape and there are frequent snapshots taken on an hourly, daily, and weekly basis.
The /data filesystem, on the other hand, is designed to offer at least 100 gigabytes to each user, which can be increased upon providing a justification to the Biowulf staff. It is made up of high-performance network filesystems shared between the head node and the compute nodes, so it is not as fast as local disk. /data is not backed up; users are responsible for backing up their data and executables to other systems that they have access to through their individual ICs. It does, however, have snapshots taken on a daily and weekly basis. It is assumed that most of the data and programs a user or research group is actively working with will be stored in /data. It is NOT archival or permanent storage for research data. As a rule of thumb, data should be migrated from /data to an archival solution provided by the IC after the publication of results based on that data.
/lscratch is scratch space available on the local SSD of each compute node. As such, the amount available varies from node to node, but it should be at least 400 gigabytes. By default, no /lscratch space is allocated to a SLURM job; it must be requested in the batch submission with a gres option. /lscratch space is allocated on a per-job basis: each job’s /lscratch subdirectory has a quota, so an allocation is guaranteed its requested space, and it cannot be shared between users or jobs. In addition, the /lscratch subdirectory for a job is automatically deleted when the job completes in order to make the space available for the next job. It is the user’s responsibility to stage files to and from /lscratch at appropriate times during the execution of their job.
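As a sketch of that staging pattern (the file names, sizes, and resource amounts here are hypothetical, and this fragment only runs under SLURM on the cluster), a batch script that requests 50 gigabytes of /lscratch, stages data in, and copies results back out before the per-job directory is deleted might look like:

```shell
#!/bin/bash
#SBATCH --gres=lscratch:50
#SBATCH --cpus-per-task=4
#SBATCH --mem=8g

# stage input from /data into the per-job scratch directory
cp /data/$USER/input.bam /lscratch/$SLURM_JOBID/
cd /lscratch/$SLURM_JOBID

# ... run the actual analysis here ...

# copy results back to /data before the job ends and
# /lscratch/$SLURM_JOBID is automatically removed
cp results.txt /data/$USER/
```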
/scratch is a large filesystem that is only mounted on the head nodes. Users do not have a directory on it by default, but they may create subdirectories with arbitrary names under /scratch. Its primary purpose is to allow sharing large data sets between Biowulf users. Users are allowed to use up to 10 terabytes of storage on /scratch, but all files are deleted after 10 days, or sooner if the filesystem reaches 90% of capacity at any point. /scratch is NOT mounted on the compute nodes, and users should not attempt to run jobs out of it.
/tmp is the directory traditionally used in Linux to hold temporary or scratch files created in the course of a program’s run. Users running codes that need more than a nominal amount of /tmp space should request that /lscratch space be allocated for their jobs and set the $TMPDIR environment variable to the appropriate subdirectory of /lscratch. Most of the free disk space on the compute nodes is allocated to /lscratch, so users who use /tmp instead of /lscratch and fill /tmp can crash compute nodes, potentially causing all the jobs running on that node to be lost. If this happens repeatedly, your access to Biowulf may be suspended until you work with one of the support scientists to move your scratch space needs to another directory. /tmp and /lscratch have comparable performance.
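The $TMPDIR redirect can be sketched as follows. Inside a job with /lscratch allocated, you would `export TMPDIR=/lscratch/$SLURM_JOBID`; the snippet below demonstrates the same mechanism locally with a stand-in directory, since well-behaved tools such as mktemp honor $TMPDIR:

```shell
# Inside a Biowulf job with /lscratch allocated, you would run:
#   export TMPDIR=/lscratch/$SLURM_JOBID
# Here a local temporary directory stands in for the per-job
# /lscratch subdirectory to show the effect.
demo=$(mktemp -d)          # stand-in for /lscratch/$SLURM_JOBID
export TMPDIR=$demo
scratch_file=$(mktemp)     # created under $TMPDIR, not /tmp
echo "$scratch_file"
```

Programs that honor $TMPDIR will now write their scratch files to node-local SSD instead of the much smaller /tmp.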
There are more applications installed on Biowulf than can
be easily listed in a short orientation document such as this one. They are
added to user environments and accessed through the module
command. Basic documentation for most of the installed applications can
be found through the search box of https://hpc.nih.gov. If you want to browse what is
installed in a more organic manner, you can look at the module files in
/usr/local/lmod/modulefiles, use the module spider command,
or examine the actual install directories in /usr/local/apps. If the
program you want to use isn’t installed, and you aren’t certain about
how to install it, contact the HPC staff and we will be glad to take a
look at your options for the application.
To access information about your Biowulf account, including what
percentage of disk quota is in use and the status of submitted and
completed jobs, the HPC dashboard is available at https://hpc.nih.gov/dashboard. It will show which /data
directories you have access to, as well as the resources being consumed
by running and completed jobs. It is also the easiest way to reenable
your HPC account (if it is locked because it hasn’t been used in 60
days), or to request more space for one of your /data directories. Using
the dashboard is also an effective way to determine why a job failed or
to help tune the resource requests in a SLURM submission to ensure a job
will be able to run to completion without excessive over-consumption of
resources. If you prefer to work in a command-line environment, the job
accounting data is available on Biowulf as well, via the
dashboard_cli command. Running the command with no options
will print help for using it.
Accessing Biowulf is possible via several methods, but it is important to note that access is only possible through the NIH network, either by being connected to the network on campus, or by using the VPN from off campus. Access is not possible from the “NIH Guest” WiFi SSID, even though it is on the NIH campus.
The easiest way to access Biowulf is through the ssh protocol. From
the Terminal on a Mac, or from either the Command Prompt or PowerShell
on a Windows system, simply type ssh
$username@biowulf.nih.gov. After entering your NIH
login password (which will not echo), a session will begin on the Biowulf
head node that lasts until you log out. From this session, you can
create and submit batch jobs or start interactive sessions on the
compute nodes. Strictly speaking, including your username may not be
required if your local account’s username is the same as your NIH AD
username, but on most Windows systems on campus the string ‘NIH\’ will
be automatically prepended to your username, which will break the login
process. For Windows users, we actually recommend the PuTTY
suite of programs, which offers a better interface to the ssh protocol
than the raw ssh client provided with Windows. If it is not installed on
your system, open a support ticket with your IC’s local desktop support
organization.
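A small entry in ~/.ssh/config (the username below is a placeholder for your own NIH AD username) sidesteps both the ‘NIH\’ prefix problem and typing the full hostname:

```
# ~/.ssh/config -- 'yourNIHusername' is a placeholder
Host biowulf
    HostName biowulf.nih.gov
    User yourNIHusername
```

With this in place, `ssh biowulf` is all that is needed to log in.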
For more complex access needs, such as GUI-based applications, we provide Open On-Demand, which offers a full graphical desktop session in the convenience of your local web browser. To use Open On-Demand, navigate to https://hpcondemand.nih.gov and follow the prompts to start a session. Please note that you will need your PIV card to log into Open On-Demand. For more assistance with Open On-Demand, please send your questions or issues to staff@hpc.nih.gov.
There are also multiple protocols possible for file
transfers to and from Biowulf. The most important thing to remember about
these methods is that most of them need to communicate with helix.nih.gov
and not biowulf.nih.gov. scp, sftp, and
rsync all work to transfer files as expected, with
scp most useful for one or two files at a time,
sftp for a few files, and rsync working
efficiently on moderately-sized directory trees. Please note that
attempting to connect to biowulf.nih.gov with scp or
sftp will fail, and attempting to use rsync
with it will fail after about 5 minutes of file transfer.
For large-scale file transfers (more than about 10 gigabytes), we recommend the use of Globus, as it is built to be very robust with regards to interrupted or slow connections. It can also be used to share data with collaborators who are not at NIH or who don’t have Biowulf accounts. There is more detail available about Globus at https://hpc.nih.gov/docs/globus. Finally, we offer hpcdrive, which is a mechanism to mount filesystems from Biowulf to local user workstations. This is most appropriate for quickly transferring small files or directories to a local workstation from Biowulf, or vice versa. It is also a useful mechanism to allow click and read access to HTML formatted reports generated by applications on Biowulf. More information about hpcdrive can be found at https://hpc.nih.gov/docs/hpcdrive.html .
There are a small number of access methods for Biowulf that we
explicitly recommend against using.
The first of these is the use of FileZilla for file transfer. There has
been a problem with malware being bundled with the Mac and Windows
downloads of FileZilla, so we strongly encourage our users to stay far
away from it for security’s sake. We also discourage users from using
WinSCP to access the shell on Biowulf or Helix. While
it is possible, the interface is very crude, and it leaves no way to use
text-based user interfaces. It’s a useful party trick if you know how to
work in ed, but beyond that, it’s a clunky way to set up
your runs. Finally, we don’t recommend the use of MobaXterm on Windows
hosts as a primary X Windows interface to Biowulf. There are a few edge
cases where it can be helpful, but we do not have the staffing to
support it as a general solution. If you are not very experienced in
working with X as your primary window manager, we recommend avoiding
it.
The HPC staff will email you if we notice problematic behavior, such as
repeatedly restarting an rsync after the Biowulf head node process killer
terminates it. If these emails are ignored or not acted upon, your
account may be locked or your ability to submit SLURM jobs may be
interrupted until we establish communications with you and you indicate
that you will not continue with the problematic behavior. It is worth
noting that we get your email address from NED when you apply for a
Biowulf account, but it is not automatically mirrored from NED after
that time. If you change your NIH work email address, please contact us
to make certain that email from your SLURM runs or the HPC staff will
still be able to find you. Finally, when using the
--mail-user option to sbatch, please make sure that your
correct email is included in the option. It should not be set to
user@nih.gov or anything other than your nih.gov email address. Please
don’t set this option untested and then submit a large swarm or job
array. If it is not set correctly, the HPC staff will get at least one
bounce message for every email you tried to send, which tends to hammer
our mail server.

freen provides information about
free resources on Biowulf, and batchlim shows the limits
placed on different batch queues. Both can be run together via the
bwulf meta-command, or individually by their short names. For more
detail on these commands, run bwulf -h.

Use /lscratch for scratch files: /lscratch is node-local scratch space
that is requested as part of an sbatch or sinteractive command and that is
automatically cleaned up at the end of a job. It is allocated on a
per-job basis as /lscratch/$SLURM_JOBID, and cannot be
shared between users or subjobs. To allocate /lscratch space for a job,
use the --gres=lscratch:## flag to your job submission,
where ## is the amount of disk space you need, in
gigabytes. Please note that you will need to manually stage any data
that you want to process in your /lscratch directory from another
location, such as your /data directory, and change to the allocated
directory before starting your processing.

Use swarm instead of job arrays: Job
arrays are a convenient way to keep several related subjobs together in
a single SLURM submission but running them requires a certain level of
shell scripting expertise. We have implemented a SLURM utility called
swarm that allows you to run larger arrays of jobs by just
entering the commands for each subjob on separate lines in a single
file. It is an easier way to set up job arrays if your shell scripting
is rusty. See the swarm man page for more details or go to
https://hpc.nih.gov/apps/swarm.html.

If you need to track down what is consuming your disk quota, the
dust command will provide a tree view of which
directories are the likely culprits.

Be gentle with the SLURM scheduler: its view of job state is only
refreshed at a fixed interval, so the output of squeue won’t be updated
more often than that. Similarly, trying to launch multiple jobs all at
once (such as with a shell loop over a globbed set of batch scripts) can
cause stability problems with the scheduler. We request that you either
use swarm to launch jobs with multiple sub-jobs or that you
submit no more than 1 job per second with sbatch. Likewise,
please do not run squeue more than once every two minutes,
and avoid running it inside the watch command to monitor
its output. Instead, consider using the
sjobs command to monitor your jobs. It provides the same
information as squeue, but it gets its data from the
dashboard server, which doesn’t place additional strain on the SLURM
scheduler.

Go easy on the head node: even a seemingly innocuous awk command or python script can
easily take up a big chunk of memory or I/O and leave other users with
unresponsive prompts. We have seen a single samtools view
be enough to cause problems at times. Therefore, we ask that you limit
your work on the Biowulf head node to light editing, code compilation,
job submission, and reading output files. This is enforced by a process
killer script that terminates any process not owned by root that
accumulates 5 minutes of CPU time. Additionally, users cannot use more
than 4 CPUs at a time on the head node. We have a second host, Helix,
that mounts the same filesystems as Biowulf and that is intended for
large-scale or interactive data transfer jobs and intensive file
manipulations. wget or other download processes can be run
there inside a screen or tmux session to make
downloading large file sets from Internet hosts easier. We do request
that downloads be limited to no more than 6 at a time in parallel.

Use sinteractive instead of salloc and srun: The
traditional way to request an interactive allocation of CPU and memory
on a SLURM cluster is to use salloc followed by
srun to start an interactive shell on one or more of the
compute nodes. In our experience, this process can be confusing for
users and is prone to errors. We have developed a simpler script, called
sinteractive, that combines the actions of these two
programs into a single command and which simplifies the management of
interactive sessions. Using salloc and srun to
obtain interactive sessions is not supported on Biowulf; we strongly
urge you to use sinteractive instead. Please note that
there is a hard limit of 2 interactive sessions per user on Biowulf, no
matter how the sessions are obtained.

Use our sbatch and salloc wrappers: it is easy to construct a resource
request with the sbatch command which the scheduler will accept, and
happily refuse to ever run. To address these issues and to simplify our
logging processes, we have put wrappers around sbatch and
salloc. Please don’t try to run the raw binaries instead.
If the wrappers are interfering with a workflow tool you are using,
please let us know and we will work with you to adapt the wrappers.

Don’t use sudo or su:
We’ve all seen application installation instructions that tell you to
run sudo to install a requisite library or the final
application to a system directory. Please don’t attempt to do this on
Biowulf. We don’t grant general user access to these commands, and if
you try to use them, it will trigger alarms that our security team will
have to investigate. We understand that you just want the latest version
of the latest tool installed, and you think that you are helping by
trying to do the installation yourself rather than putting in a ticket
for it. Thank you for the thought, but if the installation instructions
involve a sudo or su call, and you don’t know
how to work around it, please just put in a ticket and we will help deal
with it. Similarly, users don’t have write access to /usr/local. Any
attempts to write software to this filesystem will fail.

When developing a swarm, start with small jobs and work your way up to
the full-size problem you want to run. When a swarm fails, please stop
and consider what went wrong before diving in, making a quick fix, and
resubmitting it. If the problem was in a single line of the swarmfile or
in a common run script that all lines reference, please don’t resubmit
the swarm to test your fix. Test the fix with a single sbatch
submission, and only then, if the fix works, resubmit your swarm.
Only run debugging jobs in swarm if the error or problem is
coming from the swarm program directly, such as problems
with bundling or resource allocation. Even then, please simplify the
swarmfile to the fewest number of tasks or the simplest possible job
script that allows you to reproduce the problem. Also, while it may be
tempting to use swarm for all your job submissions, please
don’t use it to submit single-subjob runs. Yes, it may work, but it only
makes it harder to debug things when they go wrong.

The easiest way to get more information about the Biowulf cluster is to check out our website at https://hpc.nih.gov. While we don’t have documentation for every question, our goal is to put as much information up on that site as is possible. If your question isn’t answered there, or if you know that it is so specific as to be a one-off kind of question, please feel free to contact the group’s staff at staff@hpc.nih.gov. We have a rotation schedule to make sure that someone is always covering that email address; reaching out to individual team members with problems runs the risk of a message sitting unseen in their mailbox if they are out of the office or otherwise unavailable. Mailing the staff email address also helps us route the ticket to the staff member with the most relevant experience to help you. Please also do not initiate contact for a problem with us in Teams. Our workflow requires that a ticket be generated for each contact so that we can track issues with Biowulf, and dropping an initial request in Teams means that we will need to create a ticket manually. We are not averse to Teams consultations to see particular problems “in the wild,” but these are best scheduled after initial intake and triage for a given problem. Finally, we do have an open Virtual Walk-in consult via Teams every month. Information about this event is mailed to all Biowulf users in advance.
Resource                      URL/Email
Main Website                  https://hpc.nih.gov
Support Email                 staff@hpc.nih.gov
HPC Training Information      https://hpc.nih.gov/training/
SSH Login Host                biowulf.nih.gov
Open On-Demand                https://hpcondemand.nih.gov
File Transfer Host            helix.nih.gov
Globus Documentation          https://hpc.nih.gov/docs/globus
hpcdrive Documentation        https://hpc.nih.gov/docs/hpcdrive.html
Item                          Limit
/home quota                   16 GB (cannot be increased)
/data quota                   100 GB default (expandable on request)
/scratch quota                10 TB
/scratch retention            10 days maximum
Max job run-time              10 days (unlimited partition can be more)
Interactive sessions          max 2 per user
Head node CPU time limit      5 minutes per process before termination
Max CPUs on head node         4
Parallel downloads on helix   6 maximum