Where bindsight fits: positioning & landscape

A short, honest map of where bindsight sits in the protein-design / target-discovery ecosystem: who it's for, the gap it fills, and what's on the roadmap. If you're evaluating bindsight for your group, start here.

The gap bindsight fills

Two mature ecosystems sit side by side and barely talk to each other:

Genomics / transcriptomics (DESeq2, edgeR, Seurat, scanpy, TCGA, recount3) stops at "here are the interesting genes."
De novo protein / binder design (RFdiffusion, ProteinMPNN, BindCraft, BoltzGen, AlphaFold, Boltz-2) starts from "given a target structure…"

Every open binder-design tool we surveyed (BindCraft, BinderFlow, PXDesign, Latent-X) begins with a known target. The path from expression data to a designed, ranked, provenance-tracked binder candidate is built ad hoc for each project and is rarely reproducible.

bindsight ships that bridge as one open, reproducible tool, and records provenance all the way back to the patient cohort.

Who it's for

Audience	What bindsight gives you
Translational researchers	A free, reproducible "data → designed binder" path on a laptop, with free-GPU offload.
Clinical / cancer biologists	An audit trail from any binder candidate back to the cohort it came from.
Method developers	A held-out evaluation harness (rediscovery of known antigens) to benchmark new designers/validators behind a stable plugin interface.
Early-discovery teams	An open, extensible comparator you can plug proprietary designers into — no fork required.
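
The "rediscovery of known antigens" evaluation mentioned in the table above can be illustrated with a recall-at-k metric: hide a set of known antigens, rank all candidate targets, and check how many of the known ones land in the top k. This is a minimal sketch under stated assumptions, not bindsight's harness. The function name `recall_at_k` and the example ranking are hypothetical, and the ranking is made up purely for illustration.

```python
from typing import Iterable, Sequence


def recall_at_k(ranked_genes: Sequence[str], known_antigens: Iterable[str], k: int) -> float:
    """Fraction of held-out known antigens that appear in the top k of a ranking."""
    known = set(known_antigens)
    if not known:
        raise ValueError("known_antigens must not be empty")
    if k < 0:
        raise ValueError("k must be non-negative")
    top_k = set(ranked_genes[:k])
    return len(known & top_k) / len(known)


# Made-up ranking for illustration only; this is not a bindsight result.
# GENE_A, GENE_B, GENE_C are placeholder names.
ranking = ["ERBB2", "GENE_A", "EGFR", "GENE_B", "MSLN", "GENE_C", "CLDN6"]
known = ["ERBB2", "EGFR", "MSLN", "CLDN6"]

print(recall_at_k(ranking, known, k=3))  # 0.5 (ERBB2 and EGFR are in the top 3)
print(recall_at_k(ranking, known, k=7))  # 1.0 (all four are in the top 7)
```

A metric of this shape lets two designers or validators be compared on the same held-out set without changing the rest of the pipeline.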

How it relates to neighboring tools

bindsight is an orchestration and provenance layer, not a new model. It stands on, and credits, the best open tools at each step (see Acknowledgments). Its contribution is the connective tissue: the surfaceome/targetable-site filter (SURFACE-Bind), the multi-objective ranking, the cost-aware GPU offload, the failure taxonomy, and the PROV-O + RO-Crate provenance that makes a run citable and reproducible.
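
To make "multi-objective ranking" concrete: one common technique is Pareto (non-dominated) filtering, where a candidate is kept only if no other candidate is at least as good on every objective and strictly better on one. This page does not say which method bindsight uses, so treat the following as a generic sketch. The objective names and scores are hypothetical.

```python
from typing import Dict, List, Tuple

Scores = Tuple[float, ...]


def dominates(a: Scores, b: Scores) -> bool:
    """True if a is at least as good as b everywhere and strictly better somewhere.

    Assumes every objective is higher-is-better.
    """
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(candidates: Dict[str, Scores]) -> List[str]:
    """Return the sorted names of candidates that no other candidate dominates."""
    front = []
    for name, scores in candidates.items():
        dominated = any(
            dominates(other, scores)
            for other_name, other in candidates.items()
            if other_name != name
        )
        if not dominated:
            front.append(name)
    return sorted(front)


# Hypothetical objectives: (expression specificity, surface accessibility).
candidates = {
    "target_a": (0.9, 0.4),
    "target_b": (0.6, 0.8),
    "target_c": (0.5, 0.3),  # worse than target_a on both objectives
    "target_d": (0.7, 0.7),
}

print(pareto_front(candidates))  # ['target_a', 'target_b', 'target_d']
```

Pareto filtering avoids picking weights for each objective up front, at the price of returning a set of trade-offs instead of a single ordered list.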

Aspect	Typical binder-design tool	bindsight
Input	Target structure	RNA-seq counts
Provenance	PDB + maybe a log	PROV-O JSON-LD + RO-Crate, audit trail to cohort
Hardware	HPC assumed	CPU laptop + free Colab/Modal/Kaggle offload
Cost-awareness	None	`--dry-run` estimates GPU $ before running
Negative results	Discarded	Catalogued (`failure_taxonomy.parquet`)
Citability	Code dump	DOI per release, JSON-Schema-validated outputs
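
As an illustration of the "audit trail to cohort" row, here is a minimal sketch of a PROV-O record serialized as JSON-LD, plus a helper that walks `prov:wasDerivedFrom` links from a binder back to its cohort. The PROV terms (`prov:Entity`, `prov:Activity`, `prov:used`, `prov:wasGeneratedBy`, `prov:wasDerivedFrom`) come from the W3C PROV-O vocabulary. Everything else is an assumption: the `ex:` identifiers, the graph layout, and the `trace_back` helper are hypothetical and do not reflect bindsight's actual output schema.

```python
import json

PROV = "http://www.w3.org/ns/prov#"

# Hypothetical identifiers under an example.org namespace.
doc = {
    "@context": {"prov": PROV, "ex": "http://example.org/run/"},
    "@graph": [
        {"@id": "ex:cohort", "@type": "prov:Entity"},
        {
            "@id": "ex:target",
            "@type": "prov:Entity",
            "prov:wasDerivedFrom": {"@id": "ex:cohort"},
        },
        {
            "@id": "ex:design_run",
            "@type": "prov:Activity",
            "prov:used": {"@id": "ex:target"},
        },
        {
            "@id": "ex:binder",
            "@type": "prov:Entity",
            "prov:wasGeneratedBy": {"@id": "ex:design_run"},
            "prov:wasDerivedFrom": {"@id": "ex:target"},
        },
    ],
}


def trace_back(graph, start):
    """Follow prov:wasDerivedFrom links from `start` until an entity has none."""
    by_id = {node["@id"]: node for node in graph}
    chain = [start]
    seen = {start}
    while "prov:wasDerivedFrom" in by_id[chain[-1]]:
        parent = by_id[chain[-1]]["prov:wasDerivedFrom"]["@id"]
        if parent in seen:
            raise ValueError("cycle in derivation chain")
        chain.append(parent)
        seen.add(parent)
    return chain


print(json.dumps(doc, indent=2))
print(trace_back(doc["@graph"], "ex:binder"))
# ['ex:binder', 'ex:target', 'ex:cohort']
```

Packaging such a record into an RO-Crate is not shown here.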

Roadmap

v0.2.0 (now) — discovery half end-to-end on CPU; design + validation proven on a free GPU (bindsight's first real ERBB2 binders — see the designer benchmark); multi-page web UI live.
v0.3.0 — live (async, non-blocking) Modal/Colab job submission; BindCraft + BoltzGen plugins fully wired; scRNA-seq input.
v1.0.0 — JOSS submission + validation paper (blinded rediscovery of HER2/EGFR/MSLN/CLDN6).

Get involved

bindsight is AGPL-3.0-licensed and built in the open. If you run target discovery or binder design and want to compare notes, or you'd like to try it on your own cohort, open an issue or contact the author (@mikhaeelatefrizk, ORCID 0009-0006-1069-9558). Feedback from real workflows directly shapes the roadmap.