Track Awesome Computational Biology Updates Daily
Awesome list of computational biology.
😺 inoue0426/awesome-computational-biology · ⭐ 113 · 🏷️ Miscellaneous
Feb 08, 2026
Pathway
- Reactome — Expert-curated, peer-reviewed pathway database with detailed reaction mechanisms.
- BioCyc — Collection of pathway/genome databases across thousands of organisms.
- SIGNOR — Database of causal signaling interactions and pathways.
- MSigDB (Molecular Signatures Database) — Curated gene sets derived from pathways and biological processes.
Protein
- Protein Data Bank (PDB) — 3D structures of proteins, nucleic acids, and complexes.
- RCSB Protein Data Bank — Repository for structural data of biological molecules.
Disease
- DrugBank — Database of drugs and targets (University of Alberta).
Interaction / Drug-Gene Interaction
- Comparative Toxicogenomics Database — Chemical-gene interactions, chemical-disease and gene-disease associations, chemical-phenotype associations.
- SNAP — Dataset of drug-gene interactions.
- Therapeutics Data Commons — Datasets for drug-target, response, drug-drug interaction, etc.
Interaction / Drug (Cell Line) Response
- Genomics of Drug Sensitivity in Cancer (GDSC) — Drug sensitivity for ~1000 human cancer cell lines and hundreds of compounds.
- Cancer Cell Line Encyclopedia — Database of ~1000 cancer cell lines.
- CellMiner Cross Database (CellMinerCDB) — Integrates multiple cancer cell line databases.
Interaction / Chemical-Protein Interaction
- BindingDB — Compounds and target database.
- PDBBind — Binding affinity data for biomolecular complexes.
- CrossDocked2020 — Large-scale dataset for structure-based virtual screening.
Interaction / Protein-Protein Interaction
- BioGRID — Protein, genetic, and chemical interactions.
- HIPPIE — Human protein-protein interaction database.
Interaction / Knowledge Graph
- DRKG (⭐670) — Large-scale biological knowledge graph for drug discovery.
- Hetionet (⭐340) — Heterogeneous network integrating genes, diseases, drugs, pathways, and more.
- OpenBioLink (⭐157) — Benchmark datasets for biological knowledge graph completion.
- PrimeKG (⭐697) — Multi-modal precision medicine knowledge graph integrating clinical, genetic, and drug data.
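Graphs like these are commonly distributed as (head, relation, tail) triples over typed entities. As a rough sketch of how such triples can be indexed for neighbor lookups, the example below uses invented toy triples and a hypothetical `Type::ID` naming scheme; none of the entities or relation names are taken from the graphs listed above.

```python
from collections import defaultdict

# Toy triples invented for illustration only; not real data from any listed graph.
TRIPLES = [
    ("Compound::C1", "binds", "Gene::G1"),
    ("Gene::G1", "associates", "Disease::D1"),
    ("Compound::C1", "treats", "Disease::D1"),
    ("Gene::G1", "participates", "Pathway::P1"),
]


def build_index(triples):
    """Map each head entity to its outgoing (relation, tail) edges."""
    index = defaultdict(list)
    for head, relation, tail in triples:
        index[head].append((relation, tail))
    return index


def neighbors(index, node, relation=None):
    """Return tails reachable from node, optionally filtered by relation."""
    return [
        tail
        for rel, tail in index.get(node, [])
        if relation is None or rel == relation
    ]


def node_type(node):
    """Extract the entity type from a hypothetical 'Type::ID' identifier."""
    return node.split("::", 1)[0]


index = build_index(TRIPLES)
print(neighbors(index, "Compound::C1"))
print(neighbors(index, "Compound::C1", relation="treats"))
```

A plain dictionary index like this is enough for exploring a small subgraph; the full graphs are large enough that a dataframe or a graph library is usually the more practical choice.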
API
- PubMed E-utilities (esearch/efetch) — APIs for searching and retrieving biomedical literature from PubMed.
- NCBI E-utilities — Unified APIs for accessing NCBI databases (Gene, GEO, SRA, PubChem, etc.).
- UniProt REST API — Programmatic access to protein sequence and functional annotation data.
- Ensembl REST API — API for genomic annotations, variants, genes, and comparative genomics.
- KEGG REST API — API for accessing KEGG pathways, compounds, genes, and reactions.
- ChEMBL Web Services — REST API for bioactive molecules, targets, and bioassays.
- Open Targets Platform API — API for target–disease associations integrating genetics, genomics, and drug data.
- ClinicalTrials.gov API — API for querying clinical trial metadata and results.
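As a minimal sketch of what a call to the first entry looks like, the helpers below build ESearch and EFetch request URLs for PubMed using only the Python standard library. The helper names and default parameter values are assumptions made for illustration; the code builds URLs only and sends no request.

```python
from urllib.parse import urlencode

# Base URL of the NCBI E-utilities service.
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def esearch_url(term, db="pubmed", retmax=20):
    """Build an ESearch URL that returns matching record IDs as JSON."""
    params = {"db": db, "term": term, "retmax": retmax, "retmode": "json"}
    return f"{EUTILS_BASE}/esearch.fcgi?{urlencode(params)}"


def efetch_url(ids, db="pubmed", rettype="abstract", retmode="text"):
    """Build an EFetch URL that retrieves the records for the given IDs."""
    params = {
        "db": db,
        "id": ",".join(str(i) for i in ids),
        "rettype": rettype,
        "retmode": retmode,
    }
    return f"{EUTILS_BASE}/efetch.fcgi?{urlencode(params)}"


# Hypothetical query term, chosen only to show the URL shape.
print(esearch_url("drug response", retmax=5))
```

The resulting URL can be passed to `urllib.request.urlopen` or any HTTP client. NCBI limits unauthenticated clients to about three requests per second, so batch IDs into a single EFetch call where possible.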
Drug Target Interaction
- DTINet (⭐185) — Network-based framework integrating heterogeneous biological data for DTI prediction.
- DeepDTA (⭐291) — Deep learning model using CNNs on protein sequences and drug SMILES.
- GraphDTA (⭐292) — Graph neural network–based DTI prediction using molecular graphs.
- MolTrans (⭐224) — Transformer-based DTI model leveraging molecular substructures.
- DrugBAN (⭐140) — Bilinear attention network for interpretable DTI prediction.
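Sequence-based models of this kind take protein sequences and drug SMILES strings as input, which first have to be turned into fixed-length integer arrays. The sketch below shows one simple way to do that with character-level label encoding and zero padding. The reduced SMILES alphabet, the maximum lengths, and the function names are assumptions for illustration and are not taken from any of the repositories above.

```python
# The 20 standard amino acids, one letter each.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# Hypothetical, deliberately small SMILES alphabet for illustration.
SMILES_CHARS = "CNOcno()=#123"


def build_vocab(symbols):
    """Assign each symbol an integer ID starting at 1; 0 is reserved for padding."""
    return {symbol: i + 1 for i, symbol in enumerate(symbols)}


def encode(sequence, vocab, max_len):
    """Label-encode a string, truncating or zero-padding to max_len.

    Characters missing from the vocabulary also map to 0 in this sketch.
    """
    ids = [vocab.get(ch, 0) for ch in sequence[:max_len]]
    return ids + [0] * (max_len - len(ids))


protein_vocab = build_vocab(AMINO_ACIDS)
smiles_vocab = build_vocab(SMILES_CHARS)

print(encode("CCO", smiles_vocab, max_len=5))    # ethanol
print(encode("ACDE", protein_vocab, max_len=3))  # truncated to 3 residues
```

Character-level encoding splits multi-character atoms such as `Cl` and `Br` into separate symbols, so a real pipeline would need a proper SMILES tokenizer or a larger vocabulary.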
Pre-trained Embedding
- Evolutionary Scale Modeling (ESM) (⭐4k) — Pretrained protein language models for sequence embeddings.
Jan 07, 2026
Preprocessing Tools
- ChatSpatial (⭐11) — MCP server for spatial transcriptomics analysis via natural language.
Foundation Models
- scFoundation (⭐390) — Large-scale foundation model for single-cell gene expression, enabling multiple downstream tasks.
- scGPT (⭐1.5k) — Transformer-based foundation model pretrained on millions of single-cell profiles.
- BulkFormer (⭐41) — Foundation model for bulk RNA-seq data; learns general transcriptomic representations.
Jan 03, 2026
Preprocessing Tools
- FlashDeconv (⭐10) — High-performance spatial transcriptomics deconvolution (~1M spots in ~3 min).
- Squidpy — Python library for spatial single-cell analysis.
Nov 17, 2024
Drug Response Prediction
- MOFGCN (⭐6) — GCN + heterogeneous network.
- DeepDSC — Autoencoder + fully connected NN.
- DGDRP (⭐0) — Multi-view embedding neural network.
- DeepAEG (⭐3) — GNN embedding + attention mechanism.
Sep 01, 2024
Compound
- ZINC ligand discovery database — Free database of commercially-available compounds for virtual screening.
- MoleculeNet — Benchmark datasets for molecular machine learning.
Protein
- Critical Assessment of Structure Prediction (CASP) — Assessing methods for protein structure prediction.
- Uniclust — Clustered protein sequence databases.
- CATH database — Hierarchical classification of protein domain structures.
Genome
- 10x Genomics Dataset — Collection of single-cell datasets.
- The Genotype-Tissue Expression (GTEx) — Human gene expression and regulation resource.
- Dependency Map (DepMap) — CRISPR-Cas9 screens in cancer cell lines.
- Catalogue Of Somatic Mutations In Cancer (COSMIC) — Resource on somatic mutations in cancers.
- MGnify — Resource for metagenomic and metatranscriptomic data.
- JASPAR — Database of transcription factor binding profiles.
Clinical Trial
- ClinicalTrials.gov — Registry of privately and publicly funded clinical studies.
- ICD10 — International Classification of Diseases, 10th revision.
- EU Drug Regulating Authorities Clinical Trials DB (EudraCT) — European clinical trial database.
- MIMIC-IV — Freely accessible critical care database.
Aug 11, 2024
Compound
- Therapeutic Target Database — Drug-target, target-disease, and drug-disease datasets.
Aug 10, 2024
LLM for Biology
- scPRINT (⭐138) — Pretrained on 50M cells for scRNA-seq denoising & zero imputation.
Jul 17, 2024
Interaction / Knowledge Graph
- Drug Mechanism Database (DrugMechDB) (⭐67) — Mechanisms of action from drug to disease.
Drug Response Prediction
- drGAT (⭐1) — Attention-based model for drug response prediction with gene explainability.
LLM for Biology
- GeneGPT (⭐420) — LLM augmented with NCBI Web APIs for answering genomics questions.
- GenePT (⭐308) — Single-cell foundation model built from LLM embeddings of gene descriptions.
Mar 17, 2024
Pre-trained Embedding
- ChemBERTa-2 (⭐486) — Chemical embeddings & prediction.
LLM for Biology
- AI4Chem/ChemLLM-7B-Chat — LLM for chemical & molecular science.
- BioGPT (⭐4.5k) — LLM for biomedical text generation.
Nov 29, 2023
Compound
- Drug Repurposing Hub — Collections of drug repurposing data (drug, MoA, target, etc.).
Protein
- AlphaFold Protein Structure Database — 3D protein structure predictions.
Interaction / Protein-Protein Interaction
- STRING — PPI networks for multiple organisms.
Sep 07, 2023
Compound
- Rhea — Database of chemical reactions.
Jun 13, 2023
scRNA
- Single Cell Expression Atlas — Public database for single-cell RNA-seq data.
Pathway
- PathwayCommons — Database of pathways and interactions.
Genome
- cBioPortal — Cancer genomics database aggregating many patient datasets.
Preprocessing Tools
- Scanpy — Python library for scRNA-seq analysis.
- Seurat — R library for scRNA-seq analysis.
Apr 17, 2023
scRNA
- Gene Expression Omnibus — Public functional genomics database.
- Single Cell Portal — Public database for single-cell RNA-seq data.
Dec 29, 2022
Interaction / Drug (Cell Line) Response
- NCI60 — Drug screening data for a panel of 60 human cancer cell lines tested against many compounds.
May 18, 2022
Compound
- ChEBI — Database focused on small chemical compounds.
May 15, 2022
Compound
- PubChem — One of the largest chemical databases (compounds, genes, and proteins).
- ChEMBL — Bioactive molecules with drug-like properties.
- ChemSpider — Chemical structure database.
- KEGG COMPOUND — Collection of small molecules and biopolymers.
- LIPID MAPS — Database of lipids.
Pathway
- KEGG PATHWAY — Collection of pathway maps.
- WikiPathways — Database of biological pathways.
Mass Spectra
- MassBank — Open source databases and tools for mass spectrometry reference spectra.
- MoNA MassBank of North America — Meta-database of metabolite mass spectra, metadata, and associated compounds.
Protein
- The Human Protein Atlas — Comprehensive human protein database (cells, tissues, organs).
- UniProt — Functional information on proteins.
Genome
- Human Genome Resources at NCBI — Database for genomics, proteomics, transcriptomics, and systems biology.
- GenBank — NCBI's database of genetic sequences.
- UCSC Genome Browser — Interactive browser for genome assemblies and annotation tracks.
Disease
- KEGG DRUG — Comprehensive, approved drug information.
Preprocessing Tools
- Chemistry Development Kit (⭐568) — Open-source Java libraries for cheminformatics.
- RDKit (⭐3.3k) — Cheminformatics software & machine learning toolkit.
Drug Repurposing
- DeepPurpose (⭐1.1k) — Deep learning library for drug repurposing.
Drug Target Interaction
- NeoDTI (⭐77) — Library for drug-target interaction prediction.
Compound-Protein Interaction
- MCPINN (⭐3) — Drug discovery via compound-protein interaction and machine learning.
- TransformerCPI (⭐152) — CPI prediction using Transformer.
Feb 22, 2022
Interaction / Drug-Gene Interaction
- DGIdb — Drug-gene interactions and the druggable genome.
Interaction / Chemical-Protein Interaction
- STITCH — Chemical-protein interactions.