danielecook/Awesome-Bioinformatics

A curated list of awesome Bioinformatics libraries and software.

Awesome Bioinformatics

Bioinformatics is an interdisciplinary field that develops methods and software tools for understanding biological data. — Wikipedia

A curated list of awesome Bioinformatics software, resources, and libraries. Mostly command line based, and free or open-source. Please feel free to contribute!

Table of Contents


Package suites

Package suites gather software packages and installation tools for specific languages or platforms. Several are dedicated to bioinformatics software.

  • BioPerl (271 stars) - International association of users & developers of open source Perl tools for bioinformatics, genomics and life sciences. [ paper-2002 | web ]

  • Bioconductor - A plethora of tools for analysis and comprehension of high-throughput genomic data, including 1500+ software packages. [ paper-2004 | web ]

  • Biopython (3.2k stars) - Freely available tools for biological computing in Python, with included cookbook, packaging and thorough documentation. Part of the Open Bioinformatics Foundation. Contains the very useful Entrez package for API access to the NCBI databases. [ paper-2009 | web ]

  • Bioconda - A channel for the conda package manager specializing in bioinformatics software. Includes a repository with 3000+ ready-to-install (with conda install) bioinformatics packages. [ paper-2018 | web ]

  • BioJulia - Bioinformatics and computational biology infrastructure for the Julia programming language. [ web ]

  • Rust-Bio (1.1k stars) - Rust implementations of algorithms and data structures useful for bioinformatics. [ paper-2016 ]

  • SeqAn (289 stars) - The modern C++ library for sequence analysis.

  • (Poly)merase (315 stars) - A Go library and command line utility for engineering organisms.

  • Biocaml (113 stars) - Biocaml aims to be a high-performance user-friendly library for Bioinformatics.

Data Tools

Downloading

Compressing

Data Processing

Command Line Utilities

  • Bioinformatics One Liners (1.6k stars) - Git repo of useful single line commands.
  • BioNode (297 stars) - Modular and universal bioinformatics. Bionode provides pipeable UNIX command line tools and JavaScript APIs for bioinformatics analysis workflows. [ web ]
  • bioSyntax (204 stars) - Syntax Highlighting for Computational Biology file formats (SAM, VCF, GTF, FASTA, PDB, etc.) in vim/less/gedit/sublime. [ paper-2018 | web ]
  • CSVKit (5.1k stars) - Utilities for working with CSV/Tab-delimited files. [ web ]
  • csvtk (756 stars) - Another cross-platform, efficient, practical and pretty CSV/TSV toolkit. [ web ]
  • datamash - Data transformations and statistics. [ web ]
  • easy_qsub (26 stars) - Easily submit PBS jobs with a script template. Multiple input files supported.
  • GNU Parallel - General parallelizer that runs jobs in parallel on a single multi-core machine. Here are some example scripts using GNU Parallel. [ web ]
  • grabix - A wee tool for random access into BGZF files.
  • gsort (33 stars) - Sort genomic files according to a specified order.
  • tabix (82 stars) - Table file index. [ paper-2011 ]
  • wormtable (25 stars) - Write-once-read-many table for large datasets.
  • zindex (577 stars) - Create an index on a compressed text file.
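
Several of the tools above (grabix, tabix, zindex) build an index over a compressed file so that a record can be fetched without decompressing everything before it. The sketch below illustrates that general idea using only Python's standard library. It is not the BGZF format, nor the index layout of any of those tools; the function names `write_blocked_gzip` and `read_line` and the fixed lines-per-block scheme are assumptions made for illustration.

```python
import gzip
import io


def write_blocked_gzip(lines, lines_per_block=2):
    """Compress lines as independent gzip members and return (data, index).

    Each index entry is (byte_offset_of_block, first_line_number_in_block).
    """
    buf = io.BytesIO()
    index = []
    for start in range(0, len(lines), lines_per_block):
        index.append((buf.tell(), start))
        chunk = "".join(line + "\n" for line in lines[start:start + lines_per_block])
        buf.write(gzip.compress(chunk.encode("utf-8")))
    return buf.getvalue(), index


def read_line(data, index, line_number, lines_per_block=2):
    """Fetch one line by decompressing only the block that contains it."""
    block = line_number // lines_per_block
    offset, first_line = index[block]
    # The block ends where the next one starts, or at the end of the data.
    end = index[block + 1][0] if block + 1 < len(index) else len(data)
    text = gzip.decompress(data[offset:end]).decode("utf-8")
    return text.splitlines()[line_number - first_line]


# Hypothetical tab-delimited records, five lines in three blocks.
records = ["chr1\t100\tA", "chr1\t200\tC", "chr2\t50\tG", "chr2\t75\tT", "chr3\t10\tA"]
data, index = write_blocked_gzip(records)
print(read_line(data, index, 3))
```

Because concatenated gzip members form a valid gzip stream, the output can still be decompressed as a whole by an ordinary gzip reader, which is the same property that lets standard gzip tools read BGZF files.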

Next Generation Sequencing

Workflow Managers

  • BigDataScript (89 stars) - A cross-system scripting language for working with big data pipelines in computer systems of different sizes and capabilities. [ paper-2014 | web ]
  • Bpipe (207 stars) - A small language for defining pipeline stages and linking them together to make pipelines. [ web ]
  • Common Workflow Language (1.4k stars) - A specification for describing analysis workflows and tools that are portable and scalable across a variety of software and hardware environments, from workstations to cluster, cloud, and high performance computing (HPC) environments. [ web ]
  • Cromwell (845 stars) - A Workflow Management System geared towards scientific workflows. [ web ]
  • Galaxy - A popular open-source, web-based platform for data intensive biomedical research. Has several features, from data analysis to workflow management to visualization tools. [ paper-2018 | web ]
  • Nextflow (1.8k stars) (recommended) - A fluent DSL modelled around the UNIX pipe concept that simplifies writing parallel and scalable pipelines in a portable manner. [ paper-2018 | web ]
  • Ruffus (165 stars) - Computation pipeline library for Python, widely used in science and bioinformatics. [ paper-2010 | web ]
  • SciPipe (958 stars) - Workflow library embedded in the Go programming language, focusing on supporting complex workflow constructs, compiling to a single binary, providing powerful file naming and comprehensive audit reports for every output. [ paper-2019 | web ]
  • SeqWare (27 stars) - Hadoop Oozie-based workflow system focused on genomics data analysis in cloud environments. [ paper-2010 | web ]
  • Snakemake - A workflow management system in Python that aims to reduce the complexity of creating workflows by providing a fast and comfortable execution environment. [ paper-2018 | web ]
  • Workflow Description Language (25 stars) - Workflow standard developed by the Broad. [ web ]
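
Many workflow managers model a pipeline as a graph of steps that declare their dependencies, and then work out a valid execution order. A minimal sketch of that ordering step follows, using `graphlib` from the Python standard library (3.9+). The step names and the `PIPELINE` mapping are hypothetical, and the tools listed above add much more on top (file tracking, parallel execution, cluster submission), none of which is shown here.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical pipeline: each step maps to the set of steps it depends on.
PIPELINE = {
    "trim_reads": set(),
    "align": {"trim_reads"},
    "sort_bam": {"align"},
    "call_variants": {"sort_bam"},
    "qc_report": {"trim_reads"},
}


def execution_order(pipeline):
    """Return the steps in an order that respects every declared dependency."""
    return list(TopologicalSorter(pipeline).static_order())


for step in execution_order(PIPELINE):
    # A real engine would run a command here; this sketch only prints the plan.
    print("run", step)
```

Steps without a dependency path between them, such as `qc_report` and `align` here, may come out in either order, which is also what lets an engine run them in parallel.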

Pipelines

  • Awesome-Pipeline (5.1k stars) - A list of pipeline resources.
  • Bactopia (185 stars) - A flexible pipeline, built with Nextflow, for the complete analysis of bacterial genomes. [ web ]
  • Bacannot (54 stars) - A generic but comprehensive bacterial annotation pipeline, built with Nextflow, with nice graphical options for investigating results. [ web ]
  • bcbio-nextgen (896 stars) - Batteries included genomic analysis pipeline for variant and RNA-Seq analysis, structural variant calling, annotation, and prediction. [ web ]
  • R-Peridot (4 stars) - Customizable pipeline for differential expression analysis with an intuitive GUI. [ web ]
  • ngs-preprocess (21 stars) - A pipeline for preprocessing short and long sequencing reads, built with Nextflow. [ web ]

Sequence Processing

Sequence Processing includes tasks such as demultiplexing raw read data, and trimming low quality bases.
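
As a rough illustration of the two tasks named above, the sketch below trims low-quality bases from the 3' end of a read and bins reads by a 5' barcode. It assumes Phred+33 quality encoding, exact barcode matches, and in-memory `(sequence, quality)` tuples. The function names are hypothetical, and the logic is far simpler than what dedicated tools do (no adapter detection, mismatch tolerance, or sliding windows).

```python
def phred_scores(quality, offset=33):
    """Decode a FASTQ quality string into integer Phred scores (Phred+33 assumed)."""
    return [ord(ch) - offset for ch in quality]


def trim_3prime(seq, quality, min_q=20):
    """Drop bases from the 3' end until the last remaining base meets min_q."""
    scores = phred_scores(quality)
    end = len(seq)
    while end > 0 and scores[end - 1] < min_q:
        end -= 1
    return seq[:end], quality[:end]


def demultiplex(reads, barcodes):
    """Group reads by an exact-match 5' barcode and strip the barcode.

    barcodes maps barcode sequence to sample name; reads that match no
    barcode are collected under 'undetermined'.
    """
    bins = {name: [] for name in barcodes.values()}
    bins["undetermined"] = []
    for seq, qual in reads:
        for barcode, name in barcodes.items():
            if seq.startswith(barcode):
                bins[name].append((seq[len(barcode):], qual[len(barcode):]))
                break
        else:
            bins["undetermined"].append((seq, qual))
    return bins


# 'I' encodes Q40, '#' encodes Q2 and '!' encodes Q0 under Phred+33.
print(trim_3prime("ACGTAC", "IIII#!"))
```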

Data Analysis

The following items allow for scalable genomic analysis by introducing specialized databases.

Sequence Alignment

Pairwise

Multiple Sequence Alignment

  • POA - Partial-Order Alignment for fast alignment and consensus of multiple homologous sequences. [ paper-2002 ]

Clustering

Quantification

  • Cufflinks (266 stars) - Cufflinks assembles transcripts, estimates their abundances, and tests for differential expression and regulation in RNA-Seq samples. [ paper-2010 ]
  • RSEM (325 stars) - A software package for estimating gene and isoform expression levels from RNA-Seq data. [ paper-2011 | web ]

Variant Calling

Structural variant callers

BAM File Utilities

VCF File Utilities

GFF BED File Utilities

Variant Simulation

Variant Prediction/Annotation

  • SIFT (408 stars) - Predicts whether an amino acid substitution affects protein function. [ paper-2003 | web ]
  • SnpEff (161 stars) - Genetic variant annotation and effect prediction toolbox. [ paper-2012 | web ]
  • Ensembl VEP - The VEP determines the effect of your variants (SNPs, insertions, deletions, CNVs or structural variants) on genes, transcripts, and protein sequence, as well as regulatory regions. [ paper-2016 | web ]

Python Modules

Data

Tools

Assembly

  • SPAdes (469 stars) - SPAdes (St. Petersburg genome assembler) is an assembly toolkit containing various assembly pipelines and the de facto standard for prokaryotic genome assemblies.
  • SKESA (78 stars) - SKESA is a de novo sequence read assembler for microbial genomes. It uses conservative heuristics and is designed to create breaks at repeat regions in the genome. This leads to excellent sequence quality without significantly compromising contiguity.

Annotation

  • Prokka (592 stars) - Prokka: rapid prokaryotic genome annotation. Prokka is one of the most cited annotation command line tools for microbial genome annotations.
  • Bakta (210 stars) - Bakta is a tool for the rapid & standardized annotation of bacterial genomes & plasmids. It provides dbxref-rich and sORF-including annotations in machine-readable JSON & bioinformatics standard file formats for automatic downstream analysis.

Long-read sequencing

Long-read Assembly

  • canu (533 stars) - A single molecule sequence assembler for genomes large and small.
  • flye (537 stars) - De novo assembler for single molecule sequencing reads using repeat graphs.
  • hifiasm (283 stars) - A haplotype-resolved assembler for accurate HiFi reads.
  • wtdbg2 (453 stars) - A fuzzy Bruijn graph approach to long noisy reads assembly.

Visualization

Genome Browsers / Gene Diagrams

The following tools can be used to visualize genomic data or to construct customized visualizations of it, including sequence data from DNA-Seq, RNA-Seq, and ChIP-Seq, variants, and more.

  • Squiggle (33 stars) - Easy-to-use DNA sequence visualization tool that turns FASTA files into browser-based visualizations. [ paper-2018 | web ]
  • biodalliance (218 stars) - Embeddable genome viewer. Integrates data from a wide variety of sources, and can load data directly from popular genomics file formats including bigWig, BAM, and VCF. [ paper-2011 | web ]
  • BioJS (465 stars) - BioJS is a library of over a hundred JavaScript components enabling you to visualize and process data using current web technologies. [ paper-2014 | web ]
  • Circleator (41 stars) - Flexible circular visualization of genome-associated data with BioPerl and SVG. [ paper-2014 ]
  • DNAism (59 stars) - Horizon chart D3-based JavaScript library for DNA data. [ paper-2016 | web ]
  • IGV js (498 stars) - JavaScript-based browser. Fast, efficient, scalable visualization tool for genomics data and annotations. Handles a large variety of formats. [ paper-2019 | web ]
  • Island Plot (33 stars) - D3 JavaScript based genome viewer. Constructs SVGs. [ paper-2015 ]
  • JBrowse (434 stars) - JavaScript genome browser that is highly customizable via plugins and track customizations. [ paper-2016 | web ]
  • PHAT (17 stars) - Point and click, cross platform suite for analysing and visualizing next-generation sequencing datasets. [ paper-2018 | web ]
  • pileup.js (264 stars) - JavaScript library that can be used to generate interactive and highly customizable web-based genome browsers. [ paper-2016 ]
  • scribl (75 stars) - JavaScript library for drawing canvas-based gene diagrams. [ paper-2012 | web ]
  • Lucid Align - A modern sequence alignment viewer. [ web ]

Database Access

Resources

Becoming a Bioinformatician

Bioinformatics on GitHub

Sequencing

  • Next-Generation Sequencing Technologies - Elaine Mardis (2014) [1:34:35] - Excellent (technical) overview of next-generation and third-generation sequencing technologies, along with some applications in cancer research.
  • Annotated bibliography of *Seq assays - List of ~100 papers on various sequencing technologies and assays ranging from transcription to transposable element discovery.
  • For all you seq... (PDF) (3456x5471) - Massive infographic by Illumina illustrating how many sequencing techniques work. Techniques cover protein-protein interactions, RNA transcription, RNA-protein interactions, RNA low-level detection, RNA modifications, RNA structure, DNA rearrangements and markers, DNA low-level detection, epigenetics, and DNA-protein interactions. References included.

RNA-Seq

ChIP-Seq

YouTube Channels and Playlists

  • Current Topics in Genome Analysis 2016 - Excellent series of fourteen lectures given at NIH about current topics in genomics ranging from sequence analysis, to sequencing technologies, and even more translational topics such as genomic medicine.
  • GenomeTV - "GenomeTV is NHGRI's collection of official video resources from lectures, to news documentaries, to full video collections of meetings that tackle the research, issues and clinical applications of genomic research."
  • Leading Strand - Keynote lectures from Cold Spring Harbor Laboratory (CSHL) Meetings. More on The Leading Strand.
  • Genomics, Big Data and Medicine Seminar Series - "Our seminars are dedicated to the critical intersection of GBM, delving into 'bleeding edge' technology and approaches that will deeply shape the future."
  • Rafael Irizarry's Channel - Dr. Rafael Irizarry's lectures and academic talks on statistics for genomics.
  • NIH VideoCasting and Podcasting - "NIH VideoCast broadcasts seminars, conferences and meetings live to a world-wide audience over the Internet as a real-time streaming video." Not exclusively genomics and bioinformatics videos, but it includes many great talks on domain-specific uses of bioinformatics and genomics.

Blogs

  • ACGT - Dr. Keith Bradnam writes about his "thoughts on biology, genomics, and the ongoing threat to humanity from the bogus use of bioinformatics acronyms."
  • Opiniomics - Dr. Mick Watson writes on bioinformatics, genomes, and biology.
  • Bits of DNA - Dr. Lior Pachter writes review and commentary on computational biology.
  • it is NOT junk - Dr. Michael Eisen writes "a blog about genomes, DNA, evolution, open science, baseball and other important things"
  • #!/perl/bioinfo - The Computational and Structural Biology group at EEAD-CSIC writes, in Spanish and English, about ideas and code for plant genomics, computational and structural biology problems.

Miscellaneous

Online networking groups

License

CC0

Last Checked At: 2022-09-21T15:16:29.465Z