How to design binders on Colab (the real recipe)

Step-by-step: how to take the bindsight discover output (target structures + epitopes) and produce real, designed binder PDBs on free or paid Google Colab. No GPU on your laptop required.

What you need before starting

A finished bindsight discover run (e.g. runs/luad_v01/).
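A Google account (the free Colab tier works; Colab Pro+ is about $50/month and adds A100 access, subject to availability).
About 15 minutes (T4) or 5 minutes (A100) for a first small batch of designs. Larger batches take longer; see Cost expectations.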

If you haven't run discover yet:

bindsight discover examples/demo/config.yaml --out runs/demo
# or
bindsight demo

Step 1 — Pick a target

Open the epitopes.parquet from your run:

import pandas as pd
e = pd.read_parquet("runs/demo/epitopes/epitopes.parquet")
print(e[["symbol", "uniprot_id", "structure_path"]].head())

Pick one row. You'll need its uniprot_id, the structure_path (mmCIF file), and a list of "hotspot" residues to design against. If your run had SURFACE-Bind populated, hotspots are in the residues column. If not, pick them by inspecting the structure in PyMOL or NGL. Solvent-exposed surface residues tend to make good hotspots, and the RFdiffusion documentation recommends sites that offer the binder at least about three hydrophobic residues to interact with.

For HER2 (P04626), well-known hotspots lie in the extracellular domain (ECD), around the subdomain II and IV interfaces: residues 244–267 (subdomain II) or 575–613 (subdomain IV).
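As a rough sketch of how a residue list or range like the ones above becomes the hotspot string used in Step 3, the helper below prefixes each residue number with the chain ID. The name `hotspot_string` is illustrative and not part of bindsight, and it assumes the notebook wants comma-separated chain-plus-number tokens such as `A244`.

```python
def hotspot_string(chain, residues):
    """Join residue numbers into a hotspot string such as 'A244,A245'.

    Duplicates are dropped and residues are sorted in ascending order.
    """
    numbers = sorted({int(r) for r in residues})
    if not numbers:
        raise ValueError("at least one hotspot residue is required")
    return ",".join(f"{chain}{n}" for n in numbers)


# First four residues of the HER2 subdomain II range quoted above.
print(hotspot_string("A", range(244, 248)))  # A244,A245,A246,A247
```

If the residues column of epitopes.parquet is populated, the same helper could be given that row's value, assuming it holds a list of integer residue numbers.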

Step 2 — Open the canonical Colab notebook

Use the ColabDesign / dl_binder_design notebook as your starting point. It runs RFdiffusion + ProteinMPNN end-to-end and is community-maintained: ColabDesign comes from Sergey Ovchinnikov's group, which also co-develops ColabFold (the notebook front end that pairs AlphaFold with an MMseqs2 MSA server).

Alternative: the BindCraft Colab, which runs the whole design loop in one shot but needs an A100 (≥32 GB of GPU memory). Use it if you have Colab Pro+.

Step 3 — Configure the notebook with your target

In the ColabDesign diffusion notebook:

Upload your target structure. Drag your target's mmCIF (its path is in the structure_path column of runs/demo/epitopes/epitopes.parquet) into the Colab file browser. Note its filename (e.g. AF-P04626-F1-model_v4.cif).

Set the inputs cell:

pdb = "AF-P04626-F1-model_v4.cif"        # target structure
target_chain = "A"
binder_length = 80                        # 50–150 typical
hotspot_residues = "A244,A245,A246,A247"  # hotspots from step 1
num_designs = 5                           # 5 on T4, 50 on A100

Run all cells. The pipeline will:
Install RFdiffusion (~3 min, cached after first install)
Install ProteinMPNN (~30 s)
Run RFdiffusion (~30 s/design on A100; ~2 min/design on T4)
Run ProteinMPNN to design sequences (~5 s/design)
Output: PDB files in outputs/ with the binders modeled
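The per-step timings in the list above can be combined into a back-of-the-envelope wall-time estimate. This is only a sketch: it uses the approximate figures quoted here, ignores queueing and the Boltz-2 validation in Step 4, and the names `estimate_minutes` and `first_run` are illustrative rather than anything the notebook provides.

```python
# Approximate timings quoted in the list above, in seconds.
INSTALL_S = 3 * 60 + 30                    # RFdiffusion + ProteinMPNN installs
RFDIFFUSION_S = {"A100": 30, "T4": 120}    # RFdiffusion time per design
MPNN_S = 5                                 # ProteinMPNN time per design


def estimate_minutes(gpu, num_designs, first_run=True):
    """Rough wall time in minutes for one notebook run on the given GPU."""
    per_design = RFDIFFUSION_S[gpu] + MPNN_S
    total = num_designs * per_design + (INSTALL_S if first_run else 0)
    return total / 60


for gpu, n in [("T4", 5), ("A100", 50)]:
    print(f"{gpu}, {n} designs: ~{estimate_minutes(gpu, n):.0f} min")
# T4, 5 designs: ~14 min
# A100, 50 designs: ~33 min
```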

Step 4 — Validate with Boltz-2

Add a new cell after ProteinMPNN finishes:

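!pip install -q "boltz==2.*" biopython

from pathlib import Path
import yaml
from Bio.PDB import PDBParser
from Bio.SeqUtils import seq1

# Assumption: chain IDs of the target and the binder in the designed PDBs.
# Open one file from outputs/ to confirm, and swap these if needed.
TARGET_CHAIN, BINDER_CHAIN = "A", "B"

def chain_sequence(pdb_path, chain_id):
    """Read one chain's one-letter sequence from the PDB coordinates."""
    model = PDBParser(QUIET=True).get_structure("design", str(pdb_path))[0]
    return seq1("".join(res.get_resname() for res in model[chain_id]))

# Build one Boltz-2 input YAML per design
in_dir = Path("boltz_in")
in_dir.mkdir(exist_ok=True)
for pdb in sorted(Path("outputs").glob("*.pdb")):
    cfg = {
        "version": 1,
        "sequences": [
            {"protein": {"id": "T", "sequence": chain_sequence(pdb, TARGET_CHAIN)}},
            {"protein": {"id": "B", "sequence": chain_sequence(pdb, BINDER_CHAIN)}},
        ],
    }
    (in_dir / f"{pdb.stem}.yaml").write_text(yaml.safe_dump(cfg))

# Boltz accepts a directory of YAML files, so the model loads only once
!boltz predict boltz_in --use_msa_server --out_dir boltz_out

This gives you confidence scores per design, including an ipTM for the target-binder interface. Sort by ipTM to rank. Boltz-2's affinity prediction is meant for small-molecule ligands, so it is not requested for protein binders here.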
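One way to collect the interface scores for ranking is to read Boltz's per-prediction confidence files. The sketch below assumes Boltz writes `confidence_*.json` files containing an `iptm` key somewhere under `boltz_out/`, with one sub-directory per design. Check one output file to confirm the layout and key names for your Boltz version. The helper name `rank_by_iptm` is illustrative.

```python
import json
from pathlib import Path


def rank_by_iptm(out_dir="boltz_out"):
    """Return (design_name, iptm) pairs, best first, from confidence JSONs."""
    root = Path(out_dir)
    if not root.is_dir():
        return []
    scores = []
    for path in root.rglob("confidence_*.json"):
        data = json.loads(path.read_text())
        if "iptm" in data:  # assumed key name; skip files without it
            scores.append((path.parent.name, float(data["iptm"])))
    return sorted(scores, key=lambda item: item[1], reverse=True)


for name, iptm in rank_by_iptm():
    print(f"{name}\t{iptm:.3f}")
```

If several samples are predicted per design, the same design name appears more than once; keeping the best or the mean per name is a reasonable follow-up.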

Step 5 — Bring the results back

Tarball the outputs/ and boltz_out/ directories on Colab:

!tar -czf binders.tar.gz outputs boltz_out
from google.colab import files
files.download("binders.tar.gz")

Drop the results tarball into your local runs/demo/design/ (creating the directory if needed). bindsight validate then materialises validate/validated.parquet from it and bindsight rank ranks the binders automatically.

You can also inspect the binders manually:

mkdir -p runs/demo/design && tar -xzf binders.tar.gz -C runs/demo/design/
ls runs/demo/design/outputs/      # designed binder PDBs

Open them in PyMOL, NGL, or ChimeraX.
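Before extracting, it can be worth confirming that the download actually contains both directories. A minimal sketch using only the standard library follows; the helper names are illustrative, and the expected directory names are the ones passed to tar in the Colab cell above.

```python
import tarfile
from pathlib import Path


def top_level_dirs(archive_path):
    """Return the set of top-level entry names inside a .tar.gz archive."""
    with tarfile.open(archive_path, "r:gz") as tar:
        names = {n.removeprefix("./").split("/", 1)[0] for n in tar.getnames()}
    return names - {"", "."}


def missing_dirs(archive_path, expected=("outputs", "boltz_out")):
    """Return the expected top-level directories absent from the archive."""
    return set(expected) - top_level_dirs(archive_path)


archive = Path("binders.tar.gz")
if archive.exists():
    print("missing:", sorted(missing_dirs(archive)) or "nothing")
```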

Cost expectations

Tier	GPU	Designs you can run	Approx wall time	Cost
Colab free	T4 (16 GB)	5–10	30–60 min	$0
Colab Pro	T4 / V100	20–50	1–2 hr	$10/mo
Colab Pro+	A100 (40 GB)	50–200	30 min	$50/mo
Modal	A100 (40 GB)	50–200	20 min	~$3

The bindsight design --backend modal --dry-run command gives you a precise estimate for your specific config:

bindsight design runs/demo --backend modal --designer rfdiff_mpnn \
    --trajectories 50 --dry-run
# → Cost estimate panel shows GPU-hours and USD

Troubleshooting

Symptom	Likely cause	Fix
RFdiffusion install fails on Colab	weights download from IPD's server timed out	re-run the install cell; sometimes you need to wait for a less-loaded time of day
OOM on T4	binder too long or target too big	reduce `binder_length` to ≤100, or upgrade to A100
Boltz-2 install fails	torch version mismatch	restart the Colab runtime, then re-run the install cell in the fresh runtime
All designs look the same	RFdiffusion converged on one minimum	increase `num_designs` and add `noise_scale=0.3`
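To check whether a batch really has collapsed onto near-duplicates, one rough proxy is the mean pairwise identity of the designed binder sequences. The sketch below assumes all binders share one length (as they do for a fixed binder_length) and compares sequences position by position. The function names are illustrative, and the three sequences are toy data. Sequence identity is only a proxy: comparing backbones by RMSD would be a more direct test of structural collapse.

```python
from itertools import combinations


def identity(a, b):
    """Fraction of identical positions between two equal-length sequences."""
    if not a or len(a) != len(b):
        raise ValueError("sequences must be non-empty and of equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def mean_pairwise_identity(seqs):
    """Average identity over all unordered pairs of sequences."""
    pairs = list(combinations(seqs, 2))
    if not pairs:
        raise ValueError("need at least two sequences")
    return sum(identity(a, b) for a, b in pairs) / len(pairs)


toy = ["MKVLA", "MKVLG", "AKTLG"]  # toy sequences, not real designs
print(round(mean_pairwise_identity(toy), 3))  # 0.6
```

Values close to 1.0 across a batch would suggest the designs are nearly identical at the sequence level.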

Why we don't auto-launch Colab from the CLI

Google's Colab API doesn't let third-party apps spin up free-tier notebooks without OAuth (and rate-limits the OAuth flow heavily). Modal is the right backend for "no clicks, runs on a schedule" workflows; Colab is the right backend for "free GPU and I'm willing to click two buttons." bindsight supports both — pick per command via --backend.

The one-command versions

bindsight design writes this notebook for you (and the headless backends run it for you end-to-end):

# Colab: writes a ready-to-run notebook per target (the manual recipe above)
bindsight design runs/demo --backend colab --designer rfdiff_mpnn

# Headless GPU: runs RFdiffusion → ProteinMPNN → Boltz-2 and pulls results back
bindsight design   runs/demo --backend modal        --designer rfdiff_mpnn
bindsight design   runs/demo --backend local_docker --designer rfdiff_mpnn  # your GPU
bindsight validate runs/demo
bindsight rank     runs/demo

The Colab notebook itself is just a thin wrapper over the same executor (bindsight.runners.job_exec) the headless backends run, so every path runs the same pipeline and produces results in the same format.