Ollama is a command-line tool for running large language models (LLMs) locally.
Ollama is in an early user-testing phase: not all functionality is guaranteed to work. Contact staff@hpc.nih.gov with any questions.
Hardware requirements
Quantization considerations: 4-bit quantization reduces memory use to roughly 25% of the FP16 original, so it is highly recommended.

Model Size | VRAM (FP16) | VRAM (4-bit) | GPU type |
---|---|---|---|
1–3B | 4-6GB | ~2GB | K80,P100,V100,V100x,A100 |
7–8B | 14-16GB | ~6-8GB | P100,V100,V100x,A100 |
13-14B | 26-28GB | ~12-16GB | V100x,A100 |
70B+ | 140GB+ | ~35-40GB | A100(4-bit) |
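Most default tags in the Ollama model library are already 4-bit quantized builds (typically Q4_K_M), and many models also publish explicit quantization tags. The tags below are illustrative only; check https://ollama.com/library for what actually exists:

# default tag: typically already a 4-bit (Q4_K_M) build
ollama pull gemma3:1b

# explicitly quantized tag (illustrative; verify on https://ollama.com/library)
ollama pull llama3.1:8b-instruct-q4_K_M

# full-precision variants need far more VRAM (see table above)
ollama pull llama3.1:8b-instruct-fp16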
Allocate an interactive session and run the program.
Sample session (user input in bold):
[user@biowulf]$ sinteractive --gres=gpu:1,lscratch:10 --constraint="gpuv100|gpuv100x|gpua100" -c 8 --mem=10g --tunnel
salloc.exe: Pending job allocation 46116226
salloc.exe: job 46116226 queued and waiting for resources
salloc.exe: job 46116226 has been allocated resources
salloc.exe: Granted job allocation 46116226
salloc.exe: Waiting for resource configuration
salloc.exe: Nodes cn3144 are ready for job

[user@cn3144 ~]$ module load ollama
[user@cn3144 ~]$ cd /data/$USER/
[user@cn3144 ~]$ ollama_start
Running ollama on localhost:xxxxx
######################################
export OLLAMA_HOST=localhost:xxxxx
######################################
[user@cn3144 ~]$ export OLLAMA_HOST=localhost:xxxxx    # or "source $SLURM_JOB_ID/ollama.sh"
[user@cn3144 ~]$ ollama list
[user@cn3144 ~]$ ollama pull gemma3:1b
[user@cn3144 ~]$ ollama run gemma3:1b
>>> what is long read sequencing    ### enter prompts interactively; /bye to exit

### run gemma3:1b with a single prompt and write the response to response.txt
[user@cn3144 ~]$ ollama run gemma3:1b "what is long read sequencing" > response.txt
[user@cn3144 ~]$ ollama_stop
Terminated
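The server that ollama_start launches on $OLLAMA_HOST also speaks Ollama's standard REST API. As a sketch (assuming the wrapper runs an unmodified ollama server), you can query it directly with curl from the same session:

# POST a generate request to the server started above; the port is the
# one printed by ollama_start ("stream": false returns a single JSON reply)
curl http://$OLLAMA_HOST/api/generate -d '{
  "model": "gemma3:1b",
  "prompt": "what is long read sequencing",
  "stream": false
}'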
Create a batch input file (e.g. ollama_job.sh). For example:
#!/bin/bash
set -e

module load ollama
cd /data/$USER
ollama_start
sleep 2
source $SLURM_JOB_ID/ollama.sh
ollama run gemma3:1b "what is long read sequencing" > response.txt
ollama_stop
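To run several prompts in one batch job, a loop can replace the single ollama run line. This is a minimal sketch; prompts.txt (one prompt per line) is a hypothetical input file:

#!/bin/bash
set -e

module load ollama
cd /data/$USER
ollama_start
sleep 2
source $SLURM_JOB_ID/ollama.sh

# hypothetical prompts.txt: one prompt per line, one response file per prompt
n=0
while IFS= read -r prompt; do
    n=$((n+1))
    ollama run gemma3:1b "$prompt" > "response_${n}.txt"
done < prompts.txt

ollama_stop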
Submit this job using the Slurm sbatch command.
sbatch --partition=gpu --gres=gpu:1,lscratch:10 --constraint="gpuv100|gpuv100x|gpua100" -c 8 --mem=10g ollama_job.sh
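You can monitor the queued job with standard Slurm commands and, once it completes, read the model's response from the output file written by the script above:

[user@biowulf]$ squeue -u $USER
[user@biowulf]$ cat /data/$USER/response.txt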