# Track Awesome Python Data Science Updates Daily

Probably the best curated list of data science software in Python.

😺 krzjoa/awesome-python-data-science · ⭐ 2.3K · 🏷️ Programming Languages

## May 10, 2024

Genetic Programming / Others

- PyGAD (⭐1.7k) - Genetic Algorithm in Python.

## May 06, 2024

Optimization / Others

- pymoo (⭐2k) - Multi-objective Optimization in Python.

- pycma (⭐1k) - Python implementation of CMA-ES.

## Oct 19, 2023

Machine Learning / General Purpose Machine Learning

- PyCaret (⭐8.4k) - An open-source, low-code machine learning library in Python.

Reinforcement Learning / Others

- DI-engine (⭐2.6k) - OpenDILab Decision AI Engine.

- Imitation (⭐1.1k) - Clean PyTorch implementations of imitation and reward learning algorithms.

## Oct 17, 2023

Computer Vision / Others

- PyTorch3D (⭐8.3k) - PyTorch3D is FAIR's library of reusable components for deep learning with 3D data.

- Decord (⭐1.6k) - An efficient video loader for deep learning with smart shuffling that's super easy to digest.

- MMEngine (⭐1k) - OpenMMLab Foundational Library for Training Deep Learning Models.

- LAVIS (⭐8.8k) - A One-stop Library for Language-Vision Intelligence.

Reinforcement Learning / Others

- MAgent2 (⭐183) - An engine for high performance multi-agent environments with very large numbers of agents, along with a set of reference environments.

Learning-to-Rank & Recommender Systems / Others

- LightFM (⭐4.6k) - A Python implementation of LightFM, a hybrid recommendation algorithm.

- Spotlight - Deep recommender models using PyTorch.

- Surprise (⭐6.2k) - A Python scikit for building and analyzing recommender systems.

- RecBole (⭐3.2k) - A unified, comprehensive and efficient recommendation library.

- allRank (⭐804) - allRank is a framework for training learning-to-rank neural models based on PyTorch.

- TensorFlow Recommenders (⭐1.8k) - A library for building recommender system models using TensorFlow.

- TensorFlow Ranking (⭐2.7k) - Learning to Rank in TensorFlow.

Deployment / NLP

- streamsync (⭐1.1k) - No-code in the front, Python in the back. An open-source framework for creating data apps.

- Vizro (⭐2.4k) - A toolkit for creating modular data visualization applications.

Conversion / Synthetic Data

- treelite (⭐712) - Universal model exchange and serialization format for decision tree forests.

## Sep 25, 2023

Automated Machine Learning / Others

- Auto-PyTorch (⭐2.3k) - Automatic architecture search and hyperparameter optimization for PyTorch.

Reinforcement Learning / Others

- PettingZoo (⭐2.4k) - An API standard for multi-agent reinforcement learning environments, with popular reference environments and related utilities.

## Sep 24, 2023

Reinforcement Learning / Others

- Shimmy (⭐118) - An API conversion tool for popular external reinforcement learning environments.

- EnvPool (⭐1k) - C++-based high-performance parallel environment execution engine (vectorized env) for general RL environments.

## Sep 22, 2023

Deep Learning / JAX

- JAX (⭐28k) - Composable transformations of Python+NumPy programs: differentiate, vectorize, JIT to GPU/TPU, and more.

- FLAX (⭐5.5k) - A neural network library for JAX that is designed for flexibility.

- Optax (⭐1.5k) - A gradient processing and optimization library for JAX.

Reinforcement Learning / Others

- rlpyt (⭐2.2k) - Reinforcement Learning in PyTorch.

- cleanrl (⭐4.5k) - High-quality single file implementation of Deep Reinforcement Learning algorithms with research-friendly features (PPO, DQN, C51, DDPG, TD3, SAC, PPG).

- Machin (⭐389) - A reinforcement learning library designed for PyTorch.

- SKRL (⭐406) - Modular reinforcement learning library (on PyTorch and JAX) with support for NVIDIA Isaac Gym, Isaac Orbit and Omniverse Isaac Gym.

Graph Machine Learning / Others

- PyTorch Geometric Signed Directed (⭐114) - A signed/directed graph neural network extension library for PyTorch Geometric.

- StellarGraph (⭐2.9k) - Machine Learning on Graphs.

- Graph Nets (⭐5.3k) - Build Graph Nets in Tensorflow.

- TensorFlow GNN (⭐1.3k) - A library to build Graph Neural Networks on the TensorFlow platform.

- Auto Graph Learning (⭐1.1k) - An autoML framework & toolkit for machine learning on graphs.

- PyTorch-BigGraph (⭐3.4k) - Generate embeddings from large-scale graph-structured data.

- GreatX (⭐81) - A graph reliability toolbox based on PyTorch and PyTorch Geometric (PyG).

- Jraph (⭐1.3k) - A Graph Neural Network Library in Jax.

## Sep 21, 2023

Machine Learning / General Purpose Machine Learning

- Shogun (⭐3k) - Machine learning toolbox.

Machine Learning / Gradient Boosting

- NGBoost (⭐1.6k) - Natural Gradient Boosting for Probabilistic Prediction.

- TensorFlow Decision Forests (⭐651) - A collection of state-of-the-art algorithms for the training, serving and interpretation of Decision Forest models in Keras.

Deep Learning / Others

- transformers (⭐125k) - State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.

Automated Machine Learning / Others

- AutoKeras (⭐9.1k) - AutoML library for deep learning.

Natural Language Processing / Others

- KerasNLP (⭐701) - Modular Natural Language Processing workflows with Keras.

Computer Vision / Others

- KerasCV (⭐950) - Industry-strength Computer Vision workflows with Keras.

Feature Engineering / General

- OpenFE (⭐670) - Automated feature generation with expert-level performance.

## Sep 20, 2023

Graph Machine Learning / Others

- dgl (⭐13k) - Python package built to ease deep learning on graph, on top of existing DL frameworks.

## Sep 18, 2023

Reinforcement Learning / Others

- Gymnasium (⭐5.8k) - An API standard for single-agent reinforcement learning environments, with popular reference environments and related utilities (formerly Gym (⭐34k)).

- Stable Baselines3 (⭐8k) - A set of improved implementations of reinforcement learning algorithms based on OpenAI Baselines.

- Tianshou (⭐7.4k) - An elegant PyTorch deep reinforcement learning library.

- Acme (⭐3.4k) - A library of reinforcement learning components and agents.

- Catalyst-RL (⭐46) - PyTorch framework for RL research.

- d3rlpy (⭐1.2k) - An offline deep reinforcement learning library.

Probabilistic Graphical Models / Others

- pyAgrum - A GRaphical Universal Modeler.

## Aug 24, 2023

Data Manipulation / Pipelines

- Hamilton (⭐1.4k) - A microframework for dataframe generation that applies Directed Acyclic Graphs specified by a flow of lazily evaluated Python functions.

## May 26, 2023

Quantum Computing / Synthetic Data

- qiskit (⭐4.6k) - Qiskit is an open-source SDK for working with quantum computers at the level of circuits, algorithms, and application modules.

## Feb 23, 2023

Optimization / Others

- Optuna (⭐9.7k) - A hyperparameter optimization framework.

Feature Engineering / General

- dirty_cat (⭐9) - Machine learning on dirty tabular data (especially string-based variables for classification and regression).

- NitroFE (⭐106) - Moving window features.

Feature Engineering / Feature Selection

- zoofs (⭐236) - A feature selection library based on evolutionary algorithms.

## Jan 30, 2023

Data Manipulation / Data Frames

- polars (⭐26k) - A fast multi-threaded, hybrid-out-of-core DataFrame library.

## Jan 08, 2023

Deployment / NLP

- gradio (⭐29k) - Create UIs for your machine learning model in Python in 3 minutes.

## Dec 22, 2022

Data Validation / Synthetic Data

- great_expectations (⭐9.5k) - Always know what to expect from your data.

- pandera (⭐3k) - A lightweight, flexible, and expressive statistical data testing library.

- deepchecks (⭐3.4k) - Validation & testing of ML models and data during model development, deployment, and production.

- evidently (⭐4.7k) - Evaluate and monitor ML models from validation to production.

- TensorFlow Data Validation (⭐751) - Library for exploring and validating machine learning data.

## Dec 17, 2022

Deep Learning / PyTorch

- pytorch-lightning (⭐27k) - PyTorch Lightning is just organized PyTorch.

Model Explanation / Others

- dalex (⭐1.3k) - moDel Agnostic Language for Exploration and explanation.

Optimization / Others

- OR-Tools - An open-source software suite for optimization by Google; provides a unified programming interface to a half dozen solvers: SCIP, GLPK, GLOP, CP-SAT, CPLEX, and Gurobi.

Feature Engineering / General

- sk-transformer (⭐8) - A collection of various pandas & scikit-learn compatible transformers for all kinds of preprocessing and feature engineering steps.

Data Manipulation / Data Frames

- xarray (⭐3.4k) - Xarray combines the best features of NumPy and pandas for multidimensional data selection by supplementing numerical axis labels with named dimensions for more intuitive, concise, and less error-prone indexing routines.

Data Manipulation / Synthetic Data

- ydata-synthetic (⭐1.3k) - A package to generate synthetic tabular and time-series data leveraging the state-of-the-art generative models.

Experimentation / Synthetic Data

- mlflow (⭐17k) - Open source platform for the machine learning lifecycle.

- dvc (⭐13k) - Data Version Control | Git for Data & Models | ML Experiments Management.

Computations / Synthetic Data

- NumExpr (⭐2.1k) - A fast numerical expression evaluator for NumPy that comes with an integrated computing virtual machine to speed calculations up by avoiding memory allocation for intermediate results.

Quantum Computing / Synthetic Data

- cirq (⭐4.1k) - A python framework for creating, editing, and invoking Noisy Intermediate Scale Quantum (NISQ) circuits.

## Nov 16, 2022

Automated Machine Learning / Others

- AutoGluon (⭐7.1k) - AutoML for Image, Text, Tabular, Time-Series, and MultiModal Data.

Data Manipulation / Data-centric AI

- cleanlab (⭐8.7k) - The standard data-centric AI package for data quality and machine learning with messy, real-world data and labels.

- snorkel (⭐5.7k) - A system for quickly generating training data with weak supervision.

- dataprep (⭐1.9k) - Collect, clean, and visualize your data in Python with a few lines of code.

## Aug 31, 2022

Optimization / Others

- sklearn-genetic-opt (⭐273) - Hyperparameters tuning and feature selection using evolutionary algorithms.

## Aug 24, 2022

Feature Engineering / General

- Feature Engine (⭐1.7k) - Feature engineering package with sklearn-like functionality.

## Aug 10, 2022

Probabilistic Graphical Models / Others

- pomegranate (⭐3.3k) - Probabilistic and graphical models for Python.

## Jul 29, 2022

Deep Learning / PyTorch

- ChemicalX (⭐700) - A PyTorch-based deep learning library for drug pair scoring.

Time Series / Others

- darts (⭐7.3k) - A python library for easy manipulation and forecasting of time series.

- statsforecast (⭐3.6k) - Lightning fast forecasting with statistical and econometric models.

- mlforecast (⭐722) - Scalable machine learning-based time series forecasting.

- neuralforecast (⭐2.5k) - Scalable, user-friendly neural forecasting algorithms for time series.

- greykite (⭐1.8k) - A flexible, intuitive, and fast forecasting library.

- Chaos Genius (⭐699) - ML-powered analytics engine for outlier/anomaly detection and root cause analysis.

Experimentation / Synthetic Data

- envd (⭐1.9k) - 🏕️ machine learning development environment for data science and AI/ML engineering teams.

## Jan 12, 2022

Machine Learning / General Purpose Machine Learning

- sklearn-expertsys (⭐484) - Highly interpretable classifiers for scikit learn.

## Dec 03, 2021

Time Series / Others

- sktime (⭐7.4k) - A unified framework for machine learning with time series.

- tslearn (⭐2.8k) - Machine learning toolkit dedicated to time-series data.

- tick (⭐468) - Module for statistical learning, with a particular emphasis on time-dependent modeling.

- Prophet (⭐18k) - Automatic Forecasting Procedure.

- PyFlux (⭐2.1k) - Open source time series library for Python.

- bayesloop (⭐136) - Probabilistic programming framework that facilitates objective model selection for time-varying parameter models.

- luminol (⭐1.2k) - Anomaly Detection and Correlation library.

- dateutil - Powerful extensions to the standard datetime module.

- maya (⭐3.4k) - Makes it very easy to parse a string and change timezones.

## Sep 02, 2021

Experimentation / Synthetic Data

- Neptune - A lightweight ML experiment tracking, results visualization, and management tool.

## Mar 25, 2021

Visualization / Interactive plots

- pyecharts (⭐14k) - Brings Echarts (⭐59k), a charting and visualization library, to Python as an interactive plotting library.

## Jan 01, 2021

Model Explanation / Others

- Shapley (⭐210) - A data-driven framework to quantify the value of classifiers in a machine learning ensemble.

## Oct 13, 2020

Deep Learning / TensorFlow

- Keras - A high-level neural networks API running on top of TensorFlow.

Deep Learning / Others

- Tangent (⭐2.3k) - Source-to-Source Debuggable Derivatives in Pure Python.

- autograd (⭐6.8k) - Efficiently computes derivatives of numpy code.

- Caffe (⭐34k) - A fast open framework for deep learning.

- nnabla (⭐2.7k) - Neural Network Libraries by Sony.

## Sep 25, 2020

Reinforcement Learning / Others

- TF-Agents (⭐2.7k) - A library for Reinforcement Learning in TensorFlow.

Deployment / NLP

- fastapi - A modern, fast (high-performance) web framework for building APIs with Python.

- binder - Enables sharing and executing Jupyter Notebooks.

Web Scraping / Synthetic Data

- Pattern (⭐8.7k) - High-level scraping for well-established websites such as Google, Twitter, and Wikipedia. Also has NLP, machine learning algorithms, and visualization.

## Jul 31, 2020

Visualization / Interactive plots

- Bokeh (⭐19k) - Interactive Web Plotting for Python.

- Altair - Declarative statistical visualization library for Python. Can easily do many data transformations within the code to create a graph.

- bqplot (⭐3.6k) - Plotting library for IPython/Jupyter notebooks.

Visualization / Automatic Plotting

- HoloViews (⭐2.6k) - Stop plotting your data - annotate your data and let it visualize itself.

- AutoViz (⭐1.6k) - Visualize data automatically with one line of code (ideal for machine learning).

- SweetViz (⭐2.8k) - Visualize and compare datasets, target values and associations, with one line of code.

Visualization / NLP

- pyLDAvis (⭐1.8k) - Interactive topic model visualization.

Data Manipulation / Data Frames

- pandas_profiling (⭐12k) - Create HTML profiling reports from pandas DataFrame objects.

Web Scraping / Synthetic Data

- BeautifulSoup - The easiest library to scrape static websites for beginners.

- Scrapy - Fast and extensible scraping library. You can write rules and create a customized scraper without touching the core.

- Selenium - Use the Selenium Python API to access all functionalities of Selenium WebDriver in an intuitive way, like a real user.

- twitterscraper (⭐2.4k) - Efficient library to scrape Twitter.

## Jul 25, 2020

Graph Machine Learning / Others

- pytorch_geometric_temporal (⭐2.5k) - Temporal Extension Library for PyTorch Geometric.

## Jul 23, 2020

Visualization / Map

- folium - Makes it easy to visualize data on an interactive open street map.

- geemap (⭐3.2k) - Python package for interactive mapping with Google Earth Engine (GEE).

## Jul 21, 2020

Deployment / NLP

- streamlit - Makes it easy to deploy machine learning models.

- datapane - A collection of APIs to turn scripts and notebooks into interactive reports.

## Jun 17, 2020

Machine Learning / General Purpose Machine Learning

- causalml (⭐4.8k) - Uplift modeling and causal inference with machine learning algorithms.

Data Manipulation / Data Frames

- vaex (⭐8.2k) - Out-of-Core DataFrames for Python, ML, visualize and explore big tabular data at a billion rows per second.

## May 18, 2020

Graph Machine Learning / Others

- Little Ball of Fur (⭐692) - A library for sampling graph structured data.

## Jan 25, 2020

Graph Machine Learning / Others

- Karate Club (⭐2.1k) - An unsupervised machine learning library for graph-structured data.

## Nov 20, 2019

Deep Learning / PyTorch

- Catalyst (⭐3.2k) - High-level utils for PyTorch DL & RL research.

## Nov 10, 2019

Data Manipulation / Pipelines

- dopanda (⭐469) - Hints and tips for using pandas in an analysis environment.

## Oct 29, 2019

Optimization / Others

- scikit-opt (⭐4.9k) - Heuristic Algorithms for optimization.

## Oct 28, 2019

Data Manipulation / Data Frames

- pandas-log (⭐214) - A package that allows providing feedback about basic pandas operations and finds both business logic and performance issues.

## Oct 26, 2019

Visualization / Interactive plots

- plotly - A Python library that makes interactive and publication-quality graphs.

## Oct 06, 2019

Machine Learning / General Purpose Machine Learning

- hyperlearn (⭐1.6k) - Sklearn and Statsmodels rewritten to be 50%+ faster, with 50%+ less RAM usage and GPU support.

Natural Language Processing / Others

- spaCy - Industrial-Strength Natural Language Processing.

## Sep 24, 2019

Deep Learning / TensorFlow

- tensorpack (⭐6.3k) - A Neural Net Training Interface on TensorFlow.

## Sep 23, 2019

Reinforcement Learning / Others

- Dopamine (⭐10k) - A research framework for fast prototyping of reinforcement learning algorithms.

Statistics / NLP

- weightedcalcs (⭐104) - A pandas-based utility to calculate weighted means, medians, distributions, standard deviations, and more.

Distributed Computing / Synthetic Data

- PaddlePaddle (⭐22k) - PArallel Distributed Deep LEarning.

Evaluation / Synthetic Data

- sklearn-evaluation (⭐3) - Model evaluation made easy: plots, tables, and markdown reports.

## Sep 15, 2019

Statistics / NLP

- statsmodels (⭐9.6k) - Statistical modeling and econometrics in Python.

## Sep 05, 2019

Quantum Computing / Synthetic Data

- PennyLane (⭐2.1k) - Quantum machine learning, automatic differentiation, and optimization of hybrid quantum-classical computations.

## Sep 04, 2019

Deep Learning / TensorFlow

- keras-contrib (⭐1.6k) - Keras community contributions.

- Hyperas (⭐2.2k) - Keras + Hyperopt: A straightforward wrapper for convenient hyperparameter optimization.

- Elephas (⭐1.6k) - Distributed Deep learning with Keras & Spark.

- qkeras (⭐522) - A quantization deep learning library.

Reinforcement Learning / Others

- RLlib - Scalable Reinforcement Learning.

- TensorForce (⭐3.3k) - A TensorFlow library for applied reinforcement learning.

- TRFL (⭐3.1k) - TensorFlow Reinforcement Learning.

- keras-rl (⭐5.5k) - Deep Reinforcement Learning for Keras.

- garage (⭐1.8k) - A toolkit for reproducible reinforcement learning research.

- Horizon (⭐3.5k) - A platform for Applied Reinforcement Learning.

Graph Machine Learning / Others

- Spektral (⭐2.3k) - Deep learning on graphs.

## Sep 03, 2019

Distributed Computing / Synthetic Data

- Horovod (⭐14k) - Distributed training framework for TensorFlow, Keras, PyTorch, and Apache MXNet.

- PySpark - Exposes the Spark programming model to Python.

- Veles (⭐903) - Distributed machine learning platform.

- Jubatus (⭐707) - Framework and Library for Distributed Online Machine Learning.

- DMTK (⭐2.7k) - Microsoft Distributed Machine Learning Toolkit.

- dask-ml (⭐882) - Distributed and parallel machine learning.

- Distributed (⭐1.5k) - Distributed computation in Python.

## Sep 02, 2019

Visualization / General Purposes

- chartify (⭐3.5k) - Python library that makes it easy for data scientists to create charts.

- physt (⭐127) - Improved histograms.

Visualization / Interactive plots

- animatplot (⭐405) - A python package for animating plots built on matplotlib.

## Aug 31, 2019

Machine Learning / General Purpose Machine Learning

- scikit-learn - Machine learning in Python.

- cuML (⭐3.9k) - RAPIDS Machine Learning Library.

- modAL (⭐2.1k) - Modular active learning framework for Python3.

- Sparkit-learn (⭐1.1k) - PySpark + scikit-learn = Sparkit-learn.

- MLxtend (⭐4.8k) - Extension and helper modules for Python's data analysis and machine learning libraries.

- Reproducible Experiment Platform (REP) (⭐681) - Machine Learning toolbox for Humans.

- scikit-multilearn (⭐904) - Multi-label classification for python.

- seqlearn (⭐679) - Sequence classification toolkit for Python.

- pystruct (⭐665) - Simple structured learning framework for Python.

- RuleFit (⭐394) - Implementation of the RuleFit algorithm.

- metric-learn (⭐1.4k) - Metric learning algorithms in Python.

Machine Learning / Gradient Boosting

- XGBoost (⭐26k) - Scalable, Portable, and Distributed Gradient Boosting.

- LightGBM (⭐16k) - A fast, distributed, high-performance gradient boosting framework.

- CatBoost (⭐7.8k) - An open-source gradient boosting on decision trees library.

- ThunderGBM (⭐687) - Fast GBDTs and Random Forests on GPUs.

Machine Learning / Ensemble Methods

- ML-Ensemble - High performance ensemble learning.

- Stacking (⭐216) - Simple and useful stacking library written in Python.

- stacked_generalization (⭐117) - Library for machine learning stacking generalization.

- vecstack (⭐685) - Python package for stacking (machine learning technique).

Machine Learning / Imbalanced Datasets

- imbalanced-learn (⭐6.7k) - Module to perform under-sampling and over-sampling with various techniques.

- imbalanced-algorithms (⭐230) - Python-based implementations of algorithms for learning on imbalanced data.

Machine Learning / Random Forests

- rpforest (⭐222) - A forest of random projection trees.

- sklearn-random-bits-forest (⭐9) - Wrapper of the Random Bits Forest program written by Wang et al. (2016).

- rgf_python (⭐371) - Python Wrapper of Regularized Greedy Forest.

Machine Learning / Kernel Methods

- pyFM (⭐917) - Factorization machines in python.

- fastFM (⭐1.1k) - A library for Factorization Machines.

- tffm (⭐783) - TensorFlow implementation of an arbitrary order Factorization Machine.

- scikit-rvm (⭐226) - Relevance Vector Machine implementation using the scikit-learn API.

- ThunderSVM (⭐1.5k) - A fast SVM Library on GPUs and CPUs.

Deep Learning / PyTorch

- PyTorch (⭐78k) - Tensors and Dynamic neural networks in Python with strong GPU acceleration.

- ignite (⭐4.5k) - High-level library to help with training neural networks in PyTorch.

- skorch (⭐5.6k) - A scikit-learn compatible neural network library that wraps PyTorch.

Deep Learning / TensorFlow

- TensorFlow (⭐182k) - Computation using data flow graphs for scalable machine learning by Google.

- TensorLayer (⭐7.3k) - Deep Learning and Reinforcement Learning Library for Researchers and Engineers.

- TFLearn (⭐9.6k) - Deep learning library featuring a higher-level API for TensorFlow.

- Sonnet (⭐9.7k) - TensorFlow-based neural network library.

- Polyaxon (⭐3.5k) - A platform that helps you build, manage and monitor deep learning models.

- tfdeploy (⭐352) - Deploy TensorFlow graphs for fast evaluation and export to TensorFlow-less environments running numpy.

- tensorflow-upstream (⭐678) - TensorFlow ROCm port.

- TensorFlow Fold (⭐1.8k) - Deep learning with dynamic computation graphs in TensorFlow.

- TensorLight (⭐11) - A high-level framework for TensorFlow.

- Mesh TensorFlow (⭐1.6k) - Model Parallelism Made Easier.

- Ludwig (⭐11k) - A toolbox that allows one to train and test deep learning models without the need to write code.

Deep Learning / MXNet

- MXNet (⭐21k) - Lightweight, Portable, Flexible Distributed/Mobile Deep Learning with Dynamic, Mutation-aware Dataflow Dep Scheduler.

- Gluon (⭐2.3k) - A clear, concise, simple yet powerful and efficient API for deep learning (now included in MXNet).

- Xfer (⭐250) - Transfer Learning library for Deep Neural Networks.

- MXNet (⭐29) - HIP Port of MXNet.

Automated Machine Learning / Others

- auto-sklearn (⭐7.4k) - An AutoML toolkit and a drop-in replacement for a scikit-learn estimator.

- TPOT (⭐9.5k) - AutoML tool that optimizes machine learning pipelines using genetic programming.

Natural Language Processing / Others

- torchtext (⭐3.4k) - Data loaders and abstractions for text and NLP.

- gluon-nlp (⭐2.5k) - NLP made easy.

- pyMorfologik (⭐18) - Python binding for Morfologik.

- skift (⭐234) - Scikit-learn wrappers for Python fastText.

- flair (⭐14k) - Very simple framework for state-of-the-art NLP.

Computer Audition / Others

- torchaudio (⭐2.4k) - An audio library for PyTorch.

Computer Vision / Others

- torchvision (⭐15k) - Datasets, Transforms, and Models specific to Computer Vision.

- gluon-cv (⭐5.8k) - Provides implementations of the state-of-the-art deep learning models in computer vision.

Graph Machine Learning / Others

- pytorch_geometric (⭐20k) - Geometric Deep Learning Extension Library for PyTorch.

Probabilistic Methods / Others

- pyro (⭐8.4k) - A flexible, scalable deep probabilistic programming library built on PyTorch.

- ZhuSuan - Bayesian Deep Learning.

- GPflow - Gaussian processes in TensorFlow.

- InferPy (⭐143) - Deep Probabilistic Modelling Made Easy.

- sklearn-bayes (⭐506) - Python package for Bayesian Machine Learning with scikit-learn API.

- skpro (⭐190) - Supervised domain-agnostic prediction framework for probabilistic modelling by The Alan Turing Institute.

- PyVarInf (⭐355) - Bayesian Deep Learning methods with Variational Inference for PyTorch.

- GPyTorch (⭐3.4k) - A highly efficient and modular implementation of Gaussian Processes in PyTorch.

- sklearn-crfsuite (⭐423) - A scikit-learn-inspired API for CRFsuite.

Model Explanation / Others

- Contrastive Explanation (⭐45) - Contrastive Explanation (Foil Trees).

- yellowbrick (⭐4.2k) - Visual analysis and diagnostic tools to facilitate machine learning model selection.

- scikit-plot (⭐2.4k) - An intuitive library to add plotting functionality to scikit-learn objects.

- shap (⭐22k) - A unified approach to explain the output of any machine learning model.

- Lime (⭐11k) - Explaining the predictions of any machine learning classifier.

- FairML (⭐356) - FairML is a python toolbox auditing the machine learning models for bias.

- model-analysis (⭐1.2k) - Model analysis tools for TensorFlow.

- themis-ml (⭐121) - A library that implements fairness-aware machine learning algorithms.

- treeinterpreter (⭐737) - Interpreting scikit-learn's decision tree and random forest predictions.

Genetic Programming / Others

- gplearn (⭐1.5k) - Genetic Programming in Python.

- karoo_gp (⭐156) - A Genetic Programming platform for Python with GPU support.

- sklearn-genetic (⭐315) - Genetic feature selection module for scikit-learn.

Optimization / Others

- BoTorch (⭐3k) - Bayesian optimization in PyTorch.

- hyperopt-sklearn (⭐1.5k) - Hyper-parameter optimization for sklearn.

- sklearn-deap (⭐758) - Use evolutionary algorithms instead of gridsearch in scikit-learn.

- sigopt_sklearn (⭐75) - SigOpt wrappers for scikit-learn methods.

- GPflowOpt (⭐265) - Bayesian Optimization using GPflow.

Feature Engineering / General

- skl-groups (⭐41) - A scikit-learn addon to operate on set/"group"-based features.

- Feature Forge (⭐382) - A set of tools for creating and testing machine learning features.

- few (⭐50) - A feature engineering wrapper for sklearn.

- scikit-mdr (⭐125) - A sklearn-compatible Python implementation of Multifactor Dimensionality Reduction (MDR) for feature construction.

- tsfresh (⭐8.1k) - Automatic extraction of relevant features from time series.

Feature Engineering / Feature Selection

- scikit-feature (⭐1.5k) - Feature selection repository in Python.

- boruta_py (⭐1.4k) - Implementations of the Boruta all-relevant feature selection method.

- BoostARoota (⭐211) - A fast xgboost feature selection algorithm.

- scikit-rebate (⭐398) - A scikit-learn-compatible Python implementation of ReBATE, a suite of Relief-based feature selection algorithms for Machine Learning.

Statistics / NLP

- pandas_summary (⭐493) - Extension to pandas dataframes describe function.

- Pandas Profiling (⭐12k) - Create HTML profiling reports from pandas DataFrame objects.

- Alphalens (⭐3.1k) - Performance analysis of predictive (alpha) stock factors.

Data Manipulation / Data Frames

- datatable (⭐1.8k) - Data.table for Python.

- cuDF (⭐7.3k) - GPU DataFrame Library.

- blaze (⭐3.2k) - NumPy and pandas interface to Big Data.

- pandasql (⭐1.3k) - Allows you to query pandas DataFrames using SQL syntax.

- pandas-gbq (⭐421) - pandas Google Big Query.

- pysparkling (⭐261) - A pure Python implementation of Apache Spark's RDD and DStream interfaces.

- modin (⭐9.5k) - Speed up your pandas workflows by changing a single line of code.

Data Manipulation / Pipelines

- pandas-ply (⭐198) - Functional data manipulation for pandas.

- Dplython (⭐762) - Dplyr for Python.

- sklearn-pandas (⭐2.8k) - pandas integration with sklearn.

- pyjanitor (⭐1.3k) - Clean APIs for data cleaning.

Experimentation / Synthetic Data

- Sacred (⭐4.2k) - A tool to help you configure, organize, log, and reproduce experiments.

- Ax (⭐2.3k) - Adaptive Experimentation Platform.

Computations / Synthetic Data

- Dask (⭐12k) - Parallel computing with task scheduling.

Spatial Analysis / Synthetic Data

- GeoPandas (⭐4.2k) - Python tools for geographic data.

## Aug 30, 2019

Model Explanation / Others

- Auralisation (⭐39) - Auralisation of learned features in CNN (for audio).

- CapsNet-Visualization (⭐394) - A visualization of the CapsNet layers to better understand how it works.

- lucid (⭐4.6k) - A collection of infrastructure and tools for research in neural network interpretability.

- Netron (⭐26k) - Visualizer for deep learning and machine learning models (no Python code, but visualizes models from most Python Deep Learning frameworks).

- FlashLight - Visualization Tool for your NeuralNetwork.

- tensorboard-pytorch (⭐7.8k) - Tensorboard for PyTorch (and chainer, mxnet, numpy, ...).

Data Manipulation / Data Frames

- swifter (⭐2.5k) - A package that efficiently applies any function to a pandas dataframe or series in the fastest available manner.

Data Manipulation / Pipelines

- meza (⭐411) - A Python toolkit for processing tabular data.

## Aug 27, 2019

Machine Learning / General Purpose Machine Learning

- xLearn (⭐3.1k) - High Performance, Easy-to-use, and Scalable Machine Learning Package.

- mlpack (⭐4.8k) - A scalable C++ machine learning library (Python bindings).

- dlib (⭐13k) - Toolkit for making real-world machine learning and data analysis applications in C++ (Python bindings).

- pyGAM (⭐838) - Generalized Additive Models in Python.

Machine Learning / Kernel Methods

- liquidSVM (⭐64) - An implementation of SVMs.

Automated Machine Learning / Others

- MLBox (⭐1.5k) - A powerful Automated Machine Learning python library.

Natural Language Processing / Others

- NLTK (⭐13k) - Modules, data sets, and tutorials supporting research and development in Natural Language Processing.

- CLTK (⭐820) - The Classical Language Toolkit.

- gensim - Topic Modelling for Humans.

- Phonemizer (⭐1.1k) - Simple text-to-phonemes converter for multiple languages.

Computer Audition / Others

- librosa (⭐6.7k) - Python library for audio and music analysis.

- Yaafe (⭐241) - Audio features extraction.

- aubio (⭐3.2k) - A library for audio and music analysis.

- Essentia (⭐2.7k) - Library for audio and music analysis, description, and synthesis.

- LibXtract (⭐224) - A simple, portable, lightweight library of audio feature extraction functions.

- Marsyas (⭐392) - Music Analysis, Retrieval, and Synthesis for Audio Signals.

- muda (⭐227) - A library for augmenting annotated audio data.

- madmom (⭐1.2k) - Python audio and music signal processing library.

Computer Vision / Others

- OpenCV (⭐76k) - Open Source Computer Vision Library.

- scikit-image (⭐5.9k) - Image Processing SciKit (Toolbox for SciPy).

- imgaug (⭐14k) - Image augmentation for machine learning experiments.

- imgaug_extension - Additional augmentations for imgaug.

- Augmentor (⭐5k) - Image augmentation library in Python for machine learning.

- albumentations (⭐13k) - Fast image augmentation library and easy-to-use wrapper around other libraries.

Probabilistic Graphical Models / Others

- pgmpy (⭐2.6k) - A python library for working with Probabilistic Graphical Models.

Probabilistic Methods / Others

- PyMC (⭐8.2k) - Bayesian Stochastic Modelling in Python.

- PyStan (⭐315) - Bayesian inference using the No-U-Turn sampler (Python interface).

- emcee (⭐1.4k) - The Python ensemble sampling toolkit for affine-invariant MCMC.

- hsmmlearn (⭐75) - A library for hidden semi-Markov models with explicit durations.

- pyhsmm (⭐544) - Bayesian inference in HSMMs and HMMs.

Model Explanation / Others

- Alibi (⭐2.3k) - Algorithms for monitoring and explaining machine learning models.

- anchor (⭐785) - Code for "High-Precision Model-Agnostic Explanations" paper.

- aequitas (⭐633) - Bias and Fairness Audit Toolkit.

- ELI5 (⭐2.7k) - A library for debugging/inspecting machine learning classifiers and explaining their predictions.

- L2X (⭐123) - Code for replicating the experiments in the paper *Learning to Explain: An Information-Theoretic Perspective on Model Interpretation*.

- PDPbox (⭐824) - Partial dependence plot toolbox.

- PyCEbox (⭐163) - Python Individual Conditional Expectation Plot Toolbox.

- Skater - Python Library for Model Interpretation.

- AI Explainability 360 (⭐1.5k) - Interpretability and explainability of data and machine learning models.

Genetic Programming / Others

- DEAP (⭐5.6k) - Distributed Evolutionary Algorithms in Python.

- monkeys (⭐120) - A strongly-typed genetic programming framework for Python.

Optimization / Others

- Spearmint (⭐1.5k) - Bayesian optimization.

- SMAC3 (⭐1k) - Sequential Model-based Algorithm Configuration.

- Optunity (⭐414) - A library containing various optimizers for hyperparameter tuning.

- hyperopt (⭐7.1k) - Distributed Asynchronous Hyperparameter Optimization in Python.

- SafeOpt (⭐131) - Safe Bayesian Optimization.

- scikit-optimize (⭐2.7k) - Sequential model-based optimization with a `scipy.optimize` interface.

- Solid (⭐574) - A comprehensive gradient-free optimization framework written in Python.

- PySwarms (⭐1.2k) - A research toolkit for particle swarm optimization in Python.

- Platypus (⭐541) - A Free and Open Source Python Library for Multiobjective Optimization.

- POT (⭐2.3k) - Python Optimal Transport library.

- Talos (⭐1.6k) - Hyperparameter Optimization for Keras Models.

- nlopt (⭐1.7k) - Library for nonlinear optimization (global and local, constrained or unconstrained).

Feature Engineering / General

- Featuretools (⭐7k) - Automated feature engineering.

Visualization / General Purposes

- Matplotlib (⭐19k) - Plotting with Python.

- seaborn (⭐12k) - Statistical data visualization using matplotlib.

- prettyplotlib (⭐1.7k) - Painlessly create beautiful matplotlib plots.

- python-ternary (⭐698) - Ternary plotting library for Python with matplotlib.

- missingno (⭐3.8k) - Missing data visualization module for Python.

Statistics / NLP

- scikit-posthocs (⭐323) - Pairwise Multiple Comparisons Post-hoc Tests.

Data Manipulation / Data Frames

- pandas - Powerful Python data analysis toolkit.

- Arctic (⭐3k) - High-performance datastore for time series and tick data.

- xpandas (⭐26) - Universal 1d/2d data containers with Transformers functionality for data analysis by The Alan Turing Institute.

Data Manipulation / Pipelines

- pdpipe (⭐716) - Easy pipelines for pandas DataFrames.

- SSPipe - Python pipe (|) operator with support for DataFrames, NumPy, and PyTorch.

- Dataset (⭐196) - Helps you conveniently work with random or sequential batches of your data and define data processing.

- Prodmodel (⭐57) - Build system for data science pipelines.

Evaluation / Synthetic Data

- recmetrics (⭐558) - Library of useful metrics and plots for evaluating recommender systems.

- Metrics (⭐1.6k) - Machine learning evaluation metrics.

- AI Fairness 360 (⭐2.3k) - Fairness metrics for datasets and ML models, explanations, and algorithms to mitigate bias in datasets and models.

Computations / Synthetic Data

- numpy - The fundamental package needed for scientific computing with Python.

- bottleneck (⭐1k) - Fast NumPy array functions written in C.

- CuPy (⭐7.8k) - NumPy-like API accelerated with CUDA.

- scikit-tensor (⭐401) - Python library for multilinear algebra and tensor factorizations.

- numdifftools (⭐247) - Solve automatic numerical differentiation problems in one or more variables.

- quaternion (⭐594) - Add built-in support for quaternions to numpy.

- adaptive (⭐1.1k) - Tools for adaptive and parallel sampling of mathematical functions.

Spatial Analysis / Synthetic Data

- PySal (⭐1.3k) - Python Spatial Analysis Library.

Quantum Computing / Synthetic Data

- QML (⭐193) - A Python Toolkit for Quantum Machine Learning.

Conversion / Synthetic Data

- sklearn-porter (⭐1.3k) - Transpile trained scikit-learn estimators to C, Java, JavaScript, and others.

- ONNX (⭐17k) - Open Neural Network Exchange.

- MMdnn (⭐5.8k) - A set of tools to help users inter-operate among different deep learning frameworks.

## Dec 22, 2017

Optimization / Others

- Bayesian Optimization (⭐7.5k) - A Python implementation of global optimization with gaussian processes.

Statistics / NLP

- stockstats (⭐1.2k) - Supply a wrapper `StockDataFrame` based on the `pandas.DataFrame` with inline stock statistics/indicators support.