Ollama is a command-line tool that allows users to run LLMs locally.
Ollama is in an early user-testing phase; not all functionality is guaranteed to work. Contact staff@hpc.nih.gov with any questions.
Allocate an interactive session and run the program.
Sample session (user input in bold):
[user@biowulf]$ sinteractive --gres=gpu:1,lscratch:10 --constraint="gpuv100|gpuv100x|gpua100" -c 8 --mem=10g --tunnel
salloc.exe: Pending job allocation 46116226
salloc.exe: job 46116226 queued and waiting for resources
salloc.exe: job 46116226 has been allocated resources
salloc.exe: Granted job allocation 46116226
salloc.exe: Waiting for resource configuration
salloc.exe: Nodes cn3144 are ready for job

[user@cn3144 ~]$ module load ollama
[user@cn3144 ~]$ cd /data/$USER/
[user@cn3144 ~]$ ollama_start
Running ollama on localhost:xxxxx
######################################
export OLLAMA_HOST=localhost:xxxxx
######################################
[user@cn3144 ~]$ export OLLAMA_HOST=localhost:xxxxx
[user@cn3144 ~]$ ollama list
[user@cn3144 ~]$ ollama pull gemma3:1b
[user@cn3144 ~]$ ollama run gemma3:1b
[user@cn3144 ~]$ ollama_stop
Terminated
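Once OLLAMA_HOST is set, the running server can also be queried over its HTTP API from the same node, which is convenient for scripting against it. A minimal sketch, assuming the gemma3:1b model has already been pulled as above and that OLLAMA_HOST still holds the localhost:xxxxx value printed by ollama_start:

[user@cn3144 ~]$ curl http://$OLLAMA_HOST/api/generate -d '{"model": "gemma3:1b", "prompt": "Why is the sky blue?", "stream": false}'

The reply comes back as a single JSON object; this is a non-interactive alternative to the ollama run session above.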
Create a batch input file (e.g. ollama.sh). For example:
#!/bin/bash
set -e
module load ollama
cd /data/$USER
ollama_start
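The script above only launches the server; in a batch job there is no terminal to read the port that ollama_start prints, so a self-contained script typically sets OLLAMA_HOST itself and runs a one-shot prompt before shutting the server down. A sketch under those assumptions - the localhost:xxxxx placeholder, the gemma3:1b model, and the prompt and output file names are illustrative and need to be adapted to how ollama_start reports its port in your job:

#!/bin/bash
set -e
module load ollama
cd /data/$USER

# Start the Ollama server; it reports an "export OLLAMA_HOST=localhost:xxxxx" line.
ollama_start

# Point the ollama client at the server; replace xxxxx with the port reported by ollama_start.
export OLLAMA_HOST=localhost:xxxxx

# Pull the model and run a single non-interactive prompt, saving the answer to /data/$USER.
ollama pull gemma3:1b
ollama run gemma3:1b "Summarize the attached notes in three bullet points." > ollama_output.txt

# Shut the server down cleanly before the job ends.
ollama_stop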
Submit this job using the Slurm sbatch command.
sbatch [--cpus-per-task=#] [--mem=#] ollama.sh
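To run GPU-backed models in batch, the submission also needs to request a GPU, mirroring the interactive allocation above. A sketch only - the gpu partition name and the v100x GPU type are assumptions here and should be adjusted to what is available to your account:

sbatch --partition=gpu --gres=gpu:v100x:1,lscratch:10 --cpus-per-task=8 --mem=10g ollama.sh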