RFdiffusion: an open source method for structure generation


RoseTTAFold diffusion (RFdiffusion) is an open source method for protein structure generation, with or without conditional information (for example, a motif or a binding target). It can perform motif scaffolding, unconditional protein generation, and other tasks.

Reference:

Joseph L. Watson, David Juergens, Nathaniel R. Bennett, Brian L. Trippe, Jason Yim, Helen E. Eisenach, Woody Ahern, Andrew J. Borst, Robert J. Ragotte, Lukas F. Milles, Basile I. M. Wicky, Nikita Hanikel, Samuel J. Pellock, Alexis Courbet, William Sheffler, Jue Wang, Preetham Venkatesh, Isaac Sappington, Susana Vázquez Torres, Anna Lauko, Valentin De Bortoli, Emile Mathieu, Sergey Ovchinnikov, Regina Barzilay, David Baker
De novo design of protein structure and function with RFdiffusion
Nature, 620, pages 1089–1100 (2023).

Documentation

RFdiffusion GitHub page

Important Notes

Module Name: RFdiffusion (see the modules page for more information)
Unusual environment variables set
- RFDIFFUSION_HOME installation directory
- RFDIFFUSION_BIN executable directory
- RFDIFFUSION_SRC source code directory
- RFDIFFUSION_DATA sample data directory
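After module load RFdiffusion, a quick way to confirm that the four variables above are set and point at existing directories is a small Bash check. This is a sketch, not something the module provides; the function name check_rfd_env is hypothetical.

```shell
# Hypothetical helper (not part of the module): verify that the RFdiffusion
# module variables are set and point at existing directories.
# Prints each valid variable; returns non-zero if any check fails.
check_rfd_env() {
    local var status=0
    for var in RFDIFFUSION_HOME RFDIFFUSION_BIN RFDIFFUSION_SRC RFDIFFUSION_DATA; do
        # ${!var} is Bash indirect expansion: the value of the variable named by $var
        if [ -z "${!var:-}" ]; then
            echo "$var is not set" >&2
            status=1
        elif [ ! -d "${!var}" ]; then
            echo "$var=${!var} is not a directory" >&2
            status=1
        else
            echo "$var=${!var}"
        fi
    done
    return "$status"
}
```

Typical use would be module load RFdiffusion followed by check_rfd_env.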

Interactive job

Interactive jobs should be used for debugging, graphics, or applications that cannot be run as batch jobs.

Allocate an interactive session and run the program. Sample session:

[user@biowulf]$ sinteractive --mem=20g -c8 --gres=gpu:p100:1,lscratch:10
[user@cn3335 ~]$ module load RFdiffusion 
[+] Loading singularity  3.10.5  on cn4338
[+] Loading RFdiffusion  1.1.0
[user@cn3335 ~]$ git clone https://github.com/RosettaCommons/RFdiffusion
[user@cn3335 ~]$ cd RFdiffusion
[user@cn3335 ~]$ python-rfd ./scripts/run_inference.py -h
run_inference is powered by Hydra.

== Configuration groups ==
Compose your configuration from those groups (group=option)



== Config ==
Override anything in the config (foo.bar=value)

inference:
  input_pdb: null
  num_designs: 10
  design_startnum: 0
  ckpt_override_path: null
  symmetry: null
  recenter: true
  radius: 10.0
  model_only_neighbors: false
  output_prefix: samples/design
  write_trajectory: true
  scaffold_guided: false
  model_runner: SelfConditioning
  cautious: true
  align_motif: true
  symmetric_self_cond: true
  final_step: 1
  deterministic: false
  trb_save_ckpt_path: null
  schedule_directory_path: null
  model_directory_path: null
contigmap:
  contigs: null
  inpaint_seq: null
  provide_seq: null
  length: null
model:
  n_extra_block: 4
  n_main_block: 32
  n_ref_block: 4
  d_msa: 256
  d_msa_full: 64
  d_pair: 128
  d_templ: 64
  n_head_msa: 8
  n_head_pair: 4
  n_head_templ: 4
  d_hidden: 32
  d_hidden_templ: 32
  p_drop: 0.15
  SE3_param_full:
    num_layers: 1
    num_channels: 32
    num_degrees: 2
    n_heads: 4
    div: 4
    l0_in_features: 8
    l0_out_features: 8
    l1_in_features: 3
    l1_out_features: 2
    num_edge_features: 32
  SE3_param_topk:
    num_layers: 1
    num_channels: 32
    num_degrees: 2
    n_heads: 4
    div: 4
    l0_in_features: 64
    l0_out_features: 64
    l1_in_features: 3
    l1_out_features: 2
    num_edge_features: 64
  d_time_emb: null
  d_time_emb_proj: null
  freeze_track_motif: false
  use_motif_timestep: false
diffuser:
  T: 50
  b_0: 0.01
  b_T: 0.07
  schedule_type: linear
  so3_type: igso3
  crd_scale: 0.25
  partial_T: null
  so3_schedule_type: linear
  min_b: 1.5
  max_b: 2.5
  min_sigma: 0.02
  max_sigma: 1.5
denoiser:
  noise_scale_ca: 1
  final_noise_scale_ca: 1
  ca_noise_schedule_type: constant
  noise_scale_frame: 1
  final_noise_scale_frame: 1
  frame_noise_schedule_type: constant
ppi:
  hotspot_res: null
potentials:
  guiding_potentials: null
  guide_scale: 10
  guide_decay: constant
  olig_inter_all: null
  olig_intra_all: null
  olig_custom_contact: null
  substrate: null
contig_settings:
  ref_idx: null
  hal_idx: null
  idx_rf: null
  inpaint_seq_tensor: null
preprocess:
  sidechain_input: false
  motif_sidechain_input: true
  d_t1d: 22
  d_t2d: 44
  prob_self_cond: 0.0
  str_self_cond: false
  predict_previous: false
logging:
  inputs: false
scaffoldguided:
  scaffoldguided: false
  target_pdb: false
  target_path: null
  scaffold_list: null
  scaffold_dir: null
  sampled_insertion: 0
  sampled_N: 0
  sampled_C: 0
  ss_mask: 0
  systematic: false
  target_ss: null
  target_adj: null
  mask_loops: true
  contig_crop: null


Powered by Hydra (https://hydra.cc)
Use --hydra-help to view Hydra specific help
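The help text states that any value in the config can be overridden on the command line as group.key=value. Below is a minimal sketch of collecting such overrides in a Bash array and printing the resulting command without running it. The keys come from the config listing above. The output prefix outputs/test and num_designs=2 are illustrative values only. The contig override is quoted because unquoted square brackets are glob characters in the shell, as in the commands on this page.

```shell
# Hydra overrides (group.key=value), kept in an array so each one stays a
# single argument. Values here are illustrative, not recommendations.
overrides=(
    inference.input_pdb=./sample.pdb
    inference.output_prefix=outputs/test
    inference.num_designs=2
    'contigmap.contigs=[10-40/a394-408/10-40]'
)

# Dry run: printf %q shell-quotes each argument so the printed line can be
# pasted back into a shell unchanged.
printf '%q ' python-rfd ./scripts/run_inference.py "${overrides[@]}"
echo

# To actually run it on a GPU node, use instead:
#   python-rfd ./scripts/run_inference.py "${overrides[@]}"
```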

Download pretrained models:

[user@cn3335 ~]$ bash scripts/download_models.sh models
...
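The sample run below reads its checkpoint from models/Base_ckpt.pt (see the "Reading checkpoint from" line in the log). A hedged sketch of a check that the download left that file in place before starting a run follows. The function name check_rfd_models is hypothetical, and only Base_ckpt.pt is checked; other checkpoints fetched by the script are not.

```shell
# Hypothetical helper: confirm that download_models.sh populated the model
# directory with a non-empty Base_ckpt.pt, then list the directory contents.
check_rfd_models() {
    local model_dir=${1:-models}
    if [ ! -d "$model_dir" ]; then
        echo "model directory '$model_dir' not found" >&2
        return 1
    fi
    if [ ! -s "$model_dir/Base_ckpt.pt" ]; then
        echo "missing or empty: $model_dir/Base_ckpt.pt" >&2
        return 1
    fi
    # Show what was downloaded, with human-readable sizes
    ls -lh "$model_dir"
}
```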

Copy the sample data into the current directory:

[user@cn3335 ~]$ cp $RFDIFFUSION_DATA/* .

If needed, edit the configuration file:

config/inference/base.yaml
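Before editing, it can help to look up the current value of a setting. The following is a plain-text sketch, not a YAML parser. It assumes the simple layout shown in the help output above: section names start at column 0 and keys are indented beneath them. Deeper nesting (such as SE3_param_full under model) is not handled reliably. The function name yaml_get is hypothetical.

```shell
# Hypothetical helper: print the value of "key" inside top-level "section"
# of a simple two-level YAML file such as config/inference/base.yaml.
yaml_get() {
    local section=$1 key=$2 file=$3
    awk -v section="$section" -v key="$key" '
        /^[^[:space:]#][^:]*:/ {            # a new top-level section begins
            in_section = ($0 ~ "^" section ":")
            next
        }
        in_section && $1 == key ":" {       # indented key inside that section
            sub(/^[^:]*:[[:space:]]*/, "")  # strip the "key: " prefix
            print
            exit
        }
    ' "$file"
}
```

For example, yaml_get inference num_designs config/inference/base.yaml should print 10 if the file still matches the defaults listed in the help output.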

Run RFdiffusion on the sample data. Settings are taken from the configuration file, and the arguments on the command line override individual values:

[user@cn3335 ~]$ python-rfd ./scripts/run_inference.py inference.output_prefix=./ inference.input_pdb=./sample.pdb 'contigmap.contigs=[10-40/a394-408/10-40]' +schedule_directory_path=./ & 
[2023-06-05 10:23:41,241][__main__][INFO] - Found GPU with device_name Tesla K80. Will run RFdiffusion on Tesla K80
Reading models from /vf/users/user/RFdiffusion/RFdiffusion/rfdiffusion/inference/../../models
[2023-06-05 10:23:41,242][rfdiffusion.inference.model_runners][INFO] - Reading checkpoint from /vf/users/user/RFdiffusion/RFdiffusion/rfdiffusion/inference/../../models/Base_ckpt.pt
This is inf_conf.ckpt_path
/vf/users/user/RFdiffusion/RFdiffusion/rfdiffusion/inference/../../models/Base_ckpt.pt
[user@cn3335 ~]$ nvidia-smi
Tue Apr 23 16:34:35 2024
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.129.03             Driver Version: 535.129.03   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA A100-SXM4-80GB          On  | 00000000:46:00.0 Off |                    0 |
| N/A   32C    P0              66W / 400W |    886MiB / 81920MiB |      0%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+
|   1  NVIDIA A100-SXM4-80GB          On  | 00000000:C7:00.0 Off |                    0 |
| N/A   32C    P0              61W / 400W |      8MiB / 81920MiB |      0%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|    0   N/A  N/A   3988441      C   /opt/conda/envs/SE3nv/bin/python            872MiB |
+---------------------------------------------------------------------------------------+

Assembling -model, -diffuser and -preprocess configs from checkpoint
USING MODEL CONFIG: self._conf[model][n_extra_block] = 4
USING MODEL CONFIG: self._conf[model][n_main_block] = 32
USING MODEL CONFIG: self._conf[model][n_ref_block] = 4
USING MODEL CONFIG: self._conf[model][d_msa] = 256
USING MODEL CONFIG: self._conf[model][d_msa_full] = 64
USING MODEL CONFIG: self._conf[model][d_pair] = 128
USING MODEL CONFIG: self._conf[model][d_templ] = 64
USING MODEL CONFIG: self._conf[model][n_head_msa] = 8
USING MODEL CONFIG: self._conf[model][n_head_pair] = 4
USING MODEL CONFIG: self._conf[model][n_head_templ] = 4
USING MODEL CONFIG: self._conf[model][d_hidden] = 32
USING MODEL CONFIG: self._conf[model][d_hidden_templ] = 32
USING MODEL CONFIG: self._conf[model][p_drop] = 0.15
USING MODEL CONFIG: self._conf[model][SE3_param_full] = {'num_layers': 1, 'num_channels': 32, 'num_degrees': 2, 'n_heads': 4, 'div': 4, 'l0_in_features': 8, 'l0_out_features': 8, 'l1_in_features': 3, 'l1_out_features': 2, 'num_edge_features': 32}
USING MODEL CONFIG: self._conf[model][SE3_param_topk] = {'num_layers': 1, 'num_channels': 32, 'num_degrees': 2, 'n_heads': 4, 'div': 4, 'l0_in_features': 64, 'l0_out_features': 64, 'l1_in_features': 3, 'l1_out_features': 2, 'num_edge_features': 64}
USING MODEL CONFIG: self._conf[model][freeze_track_motif] = False
USING MODEL CONFIG: self._conf[model][use_motif_timestep] = True
USING MODEL CONFIG: self._conf[diffuser][T] = 50
USING MODEL CONFIG: self._conf[diffuser][b_0] = 0.01
USING MODEL CONFIG: self._conf[diffuser][b_T] = 0.07
USING MODEL CONFIG: self._conf[diffuser][schedule_type] = linear
USING MODEL CONFIG: self._conf[diffuser][so3_type] = igso3
USING MODEL CONFIG: self._conf[diffuser][crd_scale] = 0.25
USING MODEL CONFIG: self._conf[diffuser][so3_schedule_type] = linear
USING MODEL CONFIG: self._conf[diffuser][min_b] = 1.5
USING MODEL CONFIG: self._conf[diffuser][max_b] = 2.5
USING MODEL CONFIG: self._conf[diffuser][min_sigma] = 0.02
USING MODEL CONFIG: self._conf[diffuser][max_sigma] = 1.5
USING MODEL CONFIG: self._conf[preprocess][sidechain_input] = False
USING MODEL CONFIG: self._conf[preprocess][motif_sidechain_input] = True
USING MODEL CONFIG: self._conf[preprocess][d_t1d] = 22
USING MODEL CONFIG: self._conf[preprocess][d_t2d] = 44
USING MODEL CONFIG: self._conf[preprocess][prob_self_cond] = 0.5
USING MODEL CONFIG: self._conf[preprocess][str_self_cond] = True
USING MODEL CONFIG: self._conf[preprocess][predict_previous] = False
[2023-06-05 10:23:52,919][rfdiffusion.inference.model_runners][INFO] - Loading checkpoint.
[2023-06-05 10:23:58,119][rfdiffusion.diffusion][INFO] - Using cached IGSO3.
Successful diffuser __init__
[2023-06-05 10:23:58,199][__main__][INFO] - Making design ./_0
[2023-06-05 10:23:58,411][rfdiffusion.inference.model_runners][INFO] - Using contig: ['10-40/a394-408/10-40']
With this beta schedule (linear schedule, beta_0 = 0.04, beta_T = 0.28), alpha_bar_T = 0.00013696048699785024
[2023-06-05 10:23:58,462][rfdiffusion.inference.model_runners][INFO] - Sequence init: -----------------------LNETHFSDDIEQQAD-----------------------------------
[2023-06-05 10:24:04,076][rfdiffusion.inference.utils][INFO] - Sampled motif RMSD: 0.21
[2023-06-05 10:24:04,094][rfdiffusion.inference.model_runners][INFO] - Timestep 50, input to next step: -----------------------LNETHFSDDIEQQAD-----------------------------------
[2023-06-05 10:24:05,277][rfdiffusion.inference.utils][INFO] - Sampled motif RMSD: 0.19
[2023-06-05 10:24:05,281][rfdiffusion.inference.model_runners][INFO] - Timestep 49, input to next step: -----------------------LNETHFSDDIEQQAD-----------------------------------
[2023-06-05 10:24:06,866][rfdiffusion.inference.utils][INFO] - Sampled motif RMSD: 0.15
[2023-06-05 10:24:06,870][rfdiffusion.inference.model_runners][INFO] - Timestep 48, input to next step: -----------------------LNETHFSDDIEQQAD-----------------------------------
[2023-06-05 10:24:08,469][rfdiffusion.inference.utils][INFO] - Sampled motif RMSD: 0.14
[2023-06-05 10:24:08,473][rfdiffusion.inference.model_runners][INFO] - Timestep 47, input to next step: -----------------------LNETHFSDDIEQQAD-----------------------------------
[2023-06-05 10:24:10,060][rfdiffusion.inference.utils][INFO] - Sampled motif RMSD: 0.12
[2023-06-05 10:24:10,064][rfdiffusion.inference.model_runners][INFO] - Timestep 46, input to next step: -----------------------LNETHFSDDIEQQAD-----------------------------------
...
[2023-06-05 10:25:03,386][rfdiffusion.inference.utils][INFO] - Sampled motif RMSD: 0.18
[2023-06-05 10:25:03,390][rfdiffusion.inference.model_runners][INFO] - Timestep 2, input to next step: -----------------------LNETHFSDDIEQQAD-----------------------------------
[2023-06-05 10:25:05,602][__main__][INFO] - Finished design in 1.12 minutes
[2023-06-05 10:25:05,602][__main__][INFO] - Making design ./_1
[2023-06-05 10:25:05,664][rfdiffusion.inference.model_runners][INFO] - Using contig: ['10-40/a394-408/10-40']
With this beta schedule (linear schedule, beta_0 = 0.04, beta_T = 0.28), alpha_bar_T = 0.00013696048699785024
[2023-06-05 10:25:05,694][rfdiffusion.inference.model_runners][INFO] - Sequence init: ----------------------LNETHFSDDIEQQAD--------------------
[2023-06-05 10:25:06,601][rfdiffusion.inference.utils][INFO] - Sampled motif RMSD: 0.22
[2023-06-05 10:25:06,604][rfdiffusion.inference.model_runners][INFO] - Timestep 50, input to next step: ----------------------LNETHFSDDIEQQAD--------------------
...
[2023-06-05 10:25:51,862][__main__][INFO] - Finished design in 0.77 minutes
[2023-06-05 10:25:51,863][__main__][INFO] - Making design ./_2
[2023-06-05 10:25:51,925][rfdiffusion.inference.model_runners][INFO] - Using contig: ['10-40/a394-408/10-40']
With this beta schedule (linear schedule, beta_0 = 0.04, beta_T = 0.28), alpha_bar_T = 0.00013696048699785024
[2023-06-05 10:25:51,955][rfdiffusion.inference.model_runners][INFO] - Sequence init: -------------------------------LNETHFSDDIEQQAD-------------
[2023-06-05 10:25:52,886][rfdiffusion.inference.utils][INFO] - Sampled motif RMSD: 0.20
[2023-06-05 10:25:52,889][rfdiffusion.inference.model_runners][INFO] - Timestep 50, input to next step: -------------------------------LNETHFSDDIEQQAD-------------
...
[2023-06-05 10:32:34,898][rfdiffusion.inference.utils][INFO] - Sampled motif RMSD: 0.13
[2023-06-05 10:32:34,902][rfdiffusion.inference.model_runners][INFO] - Timestep 2, input to next step: -------------------LNETHFSDDIEQQAD------------------------
[2023-06-05 10:32:36,721][__main__][INFO] - Finished design in 0.79 minutes
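The contig 10-40/a394-408/10-40 used above asks for 10 to 40 new residues, then residues 394 to 408 of chain a from the input PDB (the 15-residue motif LNETHFSDDIEQQAD visible in the "Sequence init" lines), then another 10 to 40 new residues. This is why the number of dashes around the motif differs between designs. The sketch below computes the total length range a single-chain contig implies. It assumes the segment syntax described in the RFdiffusion README; chain breaks (/0 followed by a space) and other extended syntax are not handled. The function name contig_length_range is hypothetical.

```shell
# Hypothetical helper: print "min max" total length for a single-chain contig.
# Assumed segment syntax:
#   10-40     new residues, length sampled from 10..40
#   a394-408  fixed residues 394..408 of chain a from the input PDB
#   100       exactly 100 new residues
contig_length_range() {
    local contig=$1 seg lo hi min=0 max=0
    local IFS=/   # split the contig on "/" only
    for seg in $contig; do
        if [[ $seg =~ ^[A-Za-z]+([0-9]+)-([0-9]+)$ ]]; then
            # Fixed motif: contributes exactly (end - start + 1) residues
            lo=${BASH_REMATCH[1]}; hi=${BASH_REMATCH[2]}
            min=$(( min + hi - lo + 1 )); max=$(( max + hi - lo + 1 ))
        elif [[ $seg =~ ^([0-9]+)-([0-9]+)$ ]]; then
            # Sampled new segment: contributes between lo and hi residues
            min=$(( min + ${BASH_REMATCH[1]} )); max=$(( max + ${BASH_REMATCH[2]} ))
        elif [[ $seg =~ ^[0-9]+$ ]]; then
            # Fixed-length new segment
            min=$(( min + seg )); max=$(( max + seg ))
        else
            echo "unrecognized segment: $seg" >&2
            return 1
        fi
    done
    echo "$min $max"
}
```

For the contig above, contig_length_range '10-40/a394-408/10-40' prints 35 95: the 15 motif residues plus 20 to 80 sampled residues.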

End the interactive session:

[user@cn3335 ~]$ exit
salloc.exe: Relinquishing job allocation 46116226
[user@biowulf ~]$

Batch job

Most jobs should be run as batch jobs.

Create a batch input file, e.g. rfdiffusion.sh:

#!/bin/bash

#SBATCH --job-name rfdiffusion
#SBATCH --cpus-per-task=8
#SBATCH --mem=20g
#SBATCH --gres=gpu:p100:1,lscratch:20
#SBATCH -p gpu

module load RFdiffusion
git clone https://github.com/RosettaCommons/RFdiffusion
cd RFdiffusion
bash scripts/download_models.sh models
cp $RFDIFFUSION_DATA/* .
python-rfd ./scripts/run_inference.py inference.output_prefix=./ inference.input_pdb=./sample.pdb 'contigmap.contigs=[10-40/a394-408/10-40]' +schedule_directory_path=./
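The batch script clones the repository on every run. If it is resubmitted from the same directory, git clone fails because RFdiffusion already exists. The script does not use set -e, so it carries on after the failed command. A hedged sketch of a guard that reuses an existing checkout follows. The function name ensure_rfd_repo is hypothetical.

```shell
# Hypothetical guard: clone RFdiffusion only if no checkout exists yet.
ensure_rfd_repo() {
    local dir=${1:-RFdiffusion}
    if [ -d "$dir/.git" ]; then
        echo "using existing checkout in $dir"
    else
        git clone https://github.com/RosettaCommons/RFdiffusion "$dir"
    fi
}

# Possible use in the batch script, replacing the clone and cd lines:
#   ensure_rfd_repo RFdiffusion && cd RFdiffusion || exit 1
```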

Submit this job using the Slurm sbatch command.

sbatch rfdiffusion.sh
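To produce more designs than one job comfortably generates, one option is to split the work across several jobs. Each job gets its own slice of design indices through inference.design_startnum and inference.num_designs, both listed in the config above. This sketch assumes that design_startnum sets the index of the first design, so outputs named like ./_0, ./_1 do not collide between jobs. It also assumes the jobs share one existing checkout, since concurrent clone and download steps from the batch script above would conflict. The function job_overrides and the variable RFD_JOB are hypothetical.

```shell
# Hypothetical helper: Hydra overrides for job number $1 (0-based) when each
# job produces $2 designs, so job k covers indices k*$2 .. k*$2 + $2 - 1.
job_overrides() {
    local job=$1 per_job=$2
    echo "inference.design_startnum=$(( job * per_job )) inference.num_designs=$per_job"
}

# Example: four jobs of 10 designs each cover design indices 0-39.
for job in 0 1 2 3; do
    echo "job $job: $(job_overrides "$job" 10)"
done

# One possible way to pass the job number (assumes rfdiffusion.sh is edited
# to append $(job_overrides "$RFD_JOB" 10) to its python-rfd command):
#   sbatch --export=ALL,RFD_JOB=0 rfdiffusion.sh
```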