krzjoa/awesome-python-data-science

Programming Languages  21 days ago  1.5k
Probably the best curated list of data science software in Python.

Aug 31st

Optimization

  • sklearn-genetic-opt (151 stars) - Hyperparameter tuning and feature selection using evolutionary algorithms. sklearn

Aug 24th

Feature Engineering

    General

  • Feature Engine (1k stars) - Feature engineering package with sklearn-like functionality. sklearn

Aug 10th

Probabilistic Methods

  • pomegranate (3k stars) - Probabilistic and graphical models for Python. GPU accelerated

Jul 29th

Time Series

  • darts (4.7k stars) - A Python library for easy manipulation and forecasting of time series.
  • statsforecast (1.5k stars) - Lightning fast forecasting with statistical and econometric models.
  • mlforecast (84 stars) - Scalable machine learning based time series forecasting.
  • neuralforecast (880 stars) - Scalable machine learning based time series forecasting.
  • greykite (1.6k stars) - A flexible, intuitive and fast forecasting library.
  • Chaos Genius (472 stars) - ML-powered analytics engine for outlier/anomaly detection and root cause analysis.

Experimentation

  • envd (1k stars) - 🏕️ Machine learning development environment for data science and AI/ML engineering teams.

Deep Learning

    PyTorch

  • ChemicalX (567 stars) - A PyTorch based deep learning library for drug pair scoring. PyTorch based/compatible

Jan 12th

Machine Learning

    General Purpose Machine Learning

  • sklearn-expertsys (481 stars) - Highly interpretable classifiers for scikit-learn. sklearn
  • Deepchecks (2k stars) - Validation & testing of ML models and data during model development, deployment, and production. sklearn

Dec 3rd, 2021

Time Series

  • sktime (5.7k stars) - A unified framework for machine learning with time series. sklearn
  • tslearn (2.2k stars) - Machine learning toolkit dedicated to time-series data. sklearn
  • tick (400 stars) - Module for statistical learning, with a particular emphasis on time-dependent modelling. sklearn
  • Prophet (14.9k stars) - Automatic Forecasting Procedure.
  • PyFlux (2k stars) - Open source time series library for Python.
  • bayesloop (117 stars) - Probabilistic programming framework that facilitates objective model selection for time-varying parameter models.
  • luminol (1.1k stars) - Anomaly Detection and Correlation library.
  • dateutil - Powerful extensions to the standard datetime module.
  • maya (3.4k stars) - Makes it very easy to parse a string and to change timezones.

Sep 2nd, 2021

Experimentation

  • Neptune - A lightweight ML experiment tracking, results visualization and management tool.

Mar 25th, 2021

Visualization

    Interactive plots

  • pyecharts (12.7k stars) - A Python interactive charting library ported from Echarts (52.5k stars), a charting and visualization library. pyecharts echarts

Jan 1st, 2021

Model Explanation

  • Shapley (181 stars) - A data-driven framework to quantify the value of classifiers in a machine learning ensemble.

Oct 13th, 2020

Deep Learning

    Others

  • Tangent (2.2k stars) - Source-to-Source Debuggable Derivatives in Pure Python.
  • autograd (6k stars) - Efficiently computes derivatives of numpy code.
  • Myia (454 stars) - Deep Learning framework (pre-alpha).
  • nnabla (2.6k stars) - Neural Network Libraries by Sony.
  • Caffe (32.9k stars) - A fast open framework for deep learning.
  • hipCaffe (126 stars) - The HIP port of Caffe. Possible to run on AMD GPU

Deep Learning

    TensorFlow

  • Keras - A high-level neural networks API running on top of TensorFlow. Keras compatible

Sep 25th, 2020

Web Scraping

  • Pattern (8.3k stars) - High-level scraping for well-established websites such as Google, Twitter, and Wikipedia. Also has NLP, machine learning algorithms, and visualization.

Deployment

  • binder - Enables sharing and executing Jupyter Notebooks.
  • fastapi - Modern, fast (high-performance) web framework for building APIs with Python.

Reinforcement Learning

  • TF-Agents (2.4k stars) - A library for Reinforcement Learning in TensorFlow. sklearn

Jul 31st, 2020

Web Scraping

  • BeautifulSoup - The easiest library for beginners to scrape static websites.
  • Scrapy - Fast and extensible scraping library. Can write rules and create a customized scraper without touching the core.
  • Selenium - Use the Selenium Python API to access all functionalities of Selenium WebDriver in an intuitive way, like a real user.
  • twitterscraper (2.1k stars) - Efficient library to scrape Twitter.

Data Manipulation

    Data Containers

  • pandas_profiling (9.6k stars) - Create HTML profiling reports from pandas DataFrame objects.

Visualization

    Interactive plots

  • Bokeh (16.7k stars) - Interactive Web Plotting for Python.
  • Altair - Declarative statistical visualization library for Python. Can easily do many data transformations within the code to create a graph.
  • bqplot (3.3k stars) - Plotting library for IPython/Jupyter notebooks.

Visualization

    Automatic Plotting

  • HoloViews (2.3k stars) - Stop plotting your data - annotate your data and let it visualize itself.
  • AutoViz (912 stars) - Visualize data automatically with 1 line of code (ideal for machine learning).
  • SweetViz (2.2k stars) - Visualize and compare datasets, target values and associations, with one line of code.

Visualization

    NLP

  • pyLDAvis (1.6k stars) - Interactive topic model visualization.

Jul 25th, 2020

Deep Learning

    PyTorch

  • pytorch_geometric_temporal (1.7k stars) - Temporal Extension Library for PyTorch Geometric. PyTorch based/compatible

Jul 23rd, 2020

Visualization

    Map

  • folium - Makes it easy to visualize data on an interactive open street map.
  • geemap (2.3k stars) - Python package for interactive mapping with Google Earth Engine (GEE).

Jul 21st, 2020

Deployment

  • datapane - A collection of APIs to turn scripts and notebooks into interactive reports.
  • streamlit - Makes it easy to deploy machine learning models.

Jun 17th, 2020

Data Manipulation

    Data Containers

  • vaex (7.3k stars) - Out-of-Core DataFrames for Python, ML, visualize and explore big tabular data at a billion rows per second.

Machine Learning

    General Purpose Machine Learning

  • causalml (3.5k stars) - Uplift modeling and causal inference with machine learning algorithms. sklearn

May 18th, 2020

Machine Learning

    General Purpose Machine Learning

  • Little Ball of Fur (611 stars) - A library for sampling graph structured data.

Jan 25th, 2020

Machine Learning

    General Purpose Machine Learning

  • Karate Club (1.7k stars) - An unsupervised machine learning library for graph structured data.

Nov 20th, 2019

Deep Learning

    PyTorch

  • Catalyst (3k stars) - High-level utils for PyTorch DL & RL research. PyTorch based/compatible

Nov 10th, 2019

Data Manipulation

    Pipelines

  • dopanda (432 stars) - Hints and tips for using pandas in an analysis environment. pandas compatible

Oct 29th, 2019

Optimization

  • scikit-opt (3.5k stars) - Heuristic Algorithms for optimization.

Oct 28th, 2019

Data Manipulation

    Data Containers

  • pandas-log (201 stars) - A package that provides feedback about basic pandas operations and helps find both business logic and performance issues.

Oct 26th, 2019

Visualization

    Interactive plots

  • plotly - A Python library that makes interactive and publication-quality graphs.

Oct 6th, 2019

Machine Learning

    General Purpose Machine Learning

  • hyperlearn (1.4k stars) - 50%+ Faster, 50%+ less RAM usage, GPU support re-written Sklearn, Statsmodels. sklearn PyTorch based/compatible

Natural Language Processing

  • spaCy - Industrial-Strength Natural Language Processing.

Oct 5th, 2019

Data Manipulation

    Data Containers

  • pandas_flavor (239 stars) - A package that lets you write your own flavor of pandas easily.

Sep 24th, 2019

Deep Learning

    PyTorch

  • PyTorchNet (1.4k stars) - An abstraction to train neural networks. PyTorch based/compatible

Deep Learning

    TensorFlow

  • tensorpack (6.2k stars) - A Neural Net Training Interface on TensorFlow. sklearn

Probabilistic Methods

  • MXFusion (100 stars) - Modular Probabilistic Programming on MXNet. MXNet based

Sep 23rd, 2019

Reinforcement Learning

  • Dopamine (9.9k stars) - A research framework for fast prototyping of reinforcement learning algorithms.

Statistics

  • weightedcalcs (96 stars) - A pandas-based utility to calculate weighted means, medians, distributions, standard deviations, and more.

Distributed Computing

  • PaddlePaddle (18.9k stars) - PArallel Distributed Deep LEarning.

Evaluation

  • sklearn-evaluation (336 stars) - Model evaluation made easy: plots, tables and markdown reports. sklearn

Sep 15th, 2019

Statistics

  • statsmodels (7.8k stars) - Statistical modeling and econometrics in Python.

Sep 7th, 2019

Reinforcement Learning

  • OpenAI Baselines (12.9k stars) - High-quality implementations of reinforcement learning algorithms.

Sep 5th, 2019

Quantum Computing

  • PennyLane (1.5k stars) - Quantum machine learning, automatic differentiation, and optimization of hybrid quantum-classical computations.

Sep 4th, 2019

Reinforcement Learning

  • keras-rl (5.3k stars) - Deep Reinforcement Learning for Keras. Keras compatible
  • Coach (2.2k stars) - Easy experimentation with state of the art Reinforcement Learning algorithms.
  • garage (1.5k stars) - A toolkit for reproducible reinforcement learning research.
  • Stable Baselines (3.6k stars) - A set of improved implementations of reinforcement learning algorithms based on OpenAI Baselines.
  • RLlib - Scalable Reinforcement Learning.
  • Horizon (3.3k stars) - A platform for Applied Reinforcement Learning.
  • TensorForce (3.2k stars) - A TensorFlow library for applied reinforcement learning. sklearn
  • TRFL (3.1k stars) - TensorFlow Reinforcement Learning. sklearn
  • ChainerRL (1.1k stars) - A deep reinforcement learning library built on top of Chainer.

Deep Learning

    TensorFlow

  • keras-contrib (1.6k stars) - Keras community contributions. Keras compatible
  • Hyperas (2.1k stars) - Keras + Hyperopt: A very simple wrapper for convenient hyperparameter optimization. Keras compatible
  • Elephas (1.5k stars) - Distributed Deep learning with Keras & Spark. Keras compatible
  • Hera (496 stars) - Train/evaluate a Keras model, get metrics streamed to a dashboard in your browser. Keras compatible
  • Spektral (2.2k stars) - Deep learning on graphs. Keras compatible
  • qkeras (415 stars) - A quantization deep learning library. Keras compatible

Sep 3rd, 2019

Distributed Computing

  • Horovod (12.7k stars) - Distributed training framework for TensorFlow, Keras, PyTorch, and Apache MXNet. sklearn
  • PySpark - Exposes the Spark programming model to Python. Apache Spark based
  • Veles (897 stars) - Distributed machine learning platform.
  • Jubatus (700 stars) - Framework and Library for Distributed Online Machine Learning.
  • DMTK (2.8k stars) - Microsoft Distributed Machine Learning Toolkit.
  • dask-ml (822 stars) - Distributed and parallel machine learning. sklearn
  • Distributed (1.4k stars) - Distributed computation in Python.

Sep 2nd, 2019

Visualization

    General Purposes

  • chartify (3.2k stars) - Python library that makes it easy for data scientists to create charts.
  • physt (119 stars) - Improved histograms.

Visualization

    Interactive plots

  • animatplot (394 stars) - A Python package for animating plots, built on matplotlib.

Aug 31st, 2019

Deep Learning

    MXNet

  • MXNet (20.1k stars) - Lightweight, Portable, Flexible Distributed/Mobile Deep Learning with Dynamic, Mutation-aware Dataflow Dep Scheduler. MXNet based
  • Gluon (2.3k stars) - A clear, concise, simple yet powerful and efficient API for deep learning (now included in MXNet). MXNet based
  • MXbox (31 stars) - Simple, efficient and flexible vision toolbox for mxnet framework. MXNet based
  • gluon-cv (5.3k stars) - Provides implementations of the state-of-the-art deep learning models in computer vision. MXNet based
  • gluon-nlp (2.4k stars) - NLP made easy. MXNet based
  • Xfer (246 stars) - Transfer Learning library for Deep Neural Networks. MXNet based
  • MXNet (30 stars) - HIP Port of MXNet. MXNet based Possible to run on AMD GPU

Model Explanation

  • mxboard (328 stars) - Logging MXNet data for visualization in TensorBoard. MXNet based
  • pyBreakDown (41 stars) - Python implementation of R package breakDown. sklearn R inspired/ported lib
  • Contrastive Explanation (41 stars) - Contrastive Explanation (Foil Trees). sklearn
  • yellowbrick (3.7k stars) - Visual analysis and diagnostic tools to facilitate machine learning model selection. sklearn
  • scikit-plot (2.2k stars) - An intuitive library to add plotting functionality to scikit-learn objects. sklearn
  • shap (17.5k stars) - A unified approach to explain the output of any machine learning model. sklearn
  • Lime (10.1k stars) - Explaining the predictions of any machine learning classifier. sklearn
  • FairML (334 stars) - A Python toolbox for auditing machine learning models for bias. sklearn
  • model-analysis (1.2k stars) - Model analysis tools for TensorFlow. sklearn
  • themis-ml (100 stars) - A library that implements fairness-aware machine learning algorithms. sklearn
  • treeinterpreter (717 stars) - Interpreting scikit-learn's decision tree and random forest predictions. sklearn

Machine Learning

    General Purpose Machine Learning

  • cuML (2.9k stars) - RAPIDS Machine Learning Library. sklearn GPU accelerated
  • Sparkit-learn (1.1k stars) - PySpark + scikit-learn = Sparkit-learn. sklearn Apache Spark based
  • scikit-learn - Machine learning in Python. sklearn
  • modAL (1.8k stars) - Modular active learning framework for Python3. sklearn
  • MLxtend (4.1k stars) - Extension and helper modules for Python's data analysis and machine learning libraries. sklearn
  • Reproducible Experiment Platform (REP) (658 stars) - Machine Learning toolbox for Humans. sklearn
  • scikit-multilearn (783 stars) - Multi-label classification for Python. sklearn
  • seqlearn (645 stars) - Sequence classification toolkit for Python. sklearn
  • pystruct (664 stars) - Simple structured learning framework for Python. sklearn
  • RuleFit (319 stars) - Implementation of the RuleFit algorithm. sklearn
  • metric-learn (1.3k stars) - Metric learning algorithms in Python. sklearn

Machine Learning

    Extreme Learning Machine

  • hpelm (170 stars) - High performance implementation of Extreme Learning Machines (fast randomized neural networks). GPU accelerated
  • Python-ELM (501 stars) - Extreme Learning Machine implementation in Python. sklearn
  • Python Extreme Learning Machine (ELM) (79 stars) - A machine learning technique used for classification/regression tasks.

Machine Learning

    Kernel Methods

  • ThunderSVM (1.4k stars) - A fast SVM Library on GPUs and CPUs. sklearn GPU accelerated
  • pyFM (886 stars) - Factorization machines in Python. sklearn
  • fastFM (997 stars) - A library for Factorization Machines. sklearn
  • tffm (783 stars) - TensorFlow implementation of an arbitrary order Factorization Machine. sklearn
  • scikit-rvm (202 stars) - Relevance Vector Machine implementation using the scikit-learn API. sklearn

Machine Learning

    Gradient Boosting

  • XGBoost (23.2k stars) - Scalable, Portable and Distributed Gradient Boosting. sklearn GPU accelerated
  • LightGBM (14.2k stars) - A fast, distributed, high performance gradient boosting framework. sklearn GPU accelerated
  • CatBoost (6.7k stars) - An open-source gradient boosting on decision trees library. sklearn GPU accelerated
  • ThunderGBM (639 stars) - Fast GBDTs and Random Forests on GPUs. sklearn GPU accelerated

Data Manipulation

    Data Containers

  • cuDF (5k stars) - GPU DataFrame Library. pandas compatible GPU accelerated
  • datatable (1.6k stars) - Data.table for Python. R inspired/ported lib
  • pysparkling (251 stars) - A pure Python implementation of Apache Spark's RDD and DStream interfaces. Apache Spark based
  • blaze (3.1k stars) - NumPy and pandas interface to Big Data. pandas compatible
  • pandasql (1.2k stars) - Allows you to query pandas DataFrames using SQL syntax. pandas compatible
  • pandas-gbq (319 stars) - pandas Google Big Query. pandas compatible
  • koalas (3.2k stars) - pandas API on Apache Spark. pandas compatible
  • modin (7.8k stars) - Speed up your pandas workflows by changing a single line of code. pandas compatible

Data Manipulation

    Pipelines

  • Dplython (751 stars) - Dplyr for Python. R inspired/ported lib
  • pandas-ply (185 stars) - Functional data manipulation for pandas. pandas compatible
  • sklearn-pandas (2.6k stars) - pandas integration with sklearn. sklearn pandas compatible
  • pyjanitor (975 stars) - Clean APIs for data cleaning. pandas compatible

Deep Learning

    TensorFlow

  • NeuPy (715 stars) - NeuPy is a Python library for Artificial Neural Networks and Deep Learning (previously: Theano compatible). sklearn
  • tensorflow-upstream (608 stars) - TensorFlow ROCm port. sklearn Possible to run on AMD GPU
  • TensorFlow (167.9k stars) - Computation using data flow graphs for scalable machine learning by Google. sklearn
  • TensorLayer (7.1k stars) - Deep Learning and Reinforcement Learning Library for Researchers and Engineers. sklearn
  • TFLearn (9.6k stars) - Deep learning library featuring a higher-level API for TensorFlow. sklearn
  • Sonnet (9.4k stars) - TensorFlow-based neural network library. sklearn
  • Polyaxon (3.2k stars) - A platform that helps you build, manage and monitor deep learning models. sklearn
  • tfdeploy (349 stars) - Deploy TensorFlow graphs for fast evaluation and export to TensorFlow-less environments running numpy. sklearn
  • TensorFlow Fold (1.8k stars) - Deep learning with dynamic computation graphs in TensorFlow. sklearn
  • tensorlm (63 stars) - Wrapper library for text generation / language models at char and word level with RNN. sklearn
  • TensorLight (10 stars) - A high-level framework for TensorFlow. sklearn
  • Mesh TensorFlow (1.3k stars) - Model Parallelism Made Easier. sklearn
  • Ludwig (8.5k stars) - A toolbox that lets you train and test deep learning models without writing code. sklearn

Deep Learning

    PyTorch

  • PyTorch (59k stars) - Tensors and Dynamic neural networks in Python with strong GPU acceleration. PyTorch based/compatible
  • torchvision (12.4k stars) - Datasets, Transforms and Models specific to Computer Vision. PyTorch based/compatible
  • torchtext (3.1k stars) - Data loaders and abstractions for text and NLP. PyTorch based/compatible
  • torchaudio (1.8k stars) - An audio library for PyTorch. PyTorch based/compatible
  • ignite (4k stars) - High-level library to help with training neural networks in PyTorch. PyTorch based/compatible
  • skorch (4.7k stars) - A scikit-learn compatible neural network library that wraps PyTorch. sklearn PyTorch based/compatible
  • pytorch_geometric (15.6k stars) - Geometric Deep Learning Extension Library for PyTorch. PyTorch based/compatible

Probabilistic Methods

  • pyro (7.6k stars) - A flexible, scalable deep probabilistic programming library built on PyTorch. PyTorch based/compatible
  • PtStat (108 stars) - Probabilistic Programming and Statistical Inference in PyTorch. PyTorch based/compatible
  • PyVarInf (340 stars) - Bayesian Deep Learning methods with Variational Inference for PyTorch. PyTorch based/compatible
  • GPyTorch (2.9k stars) - A highly efficient and modular implementation of Gaussian Processes in PyTorch. PyTorch based/compatible
  • ZhuSuan - Bayesian Deep Learning. sklearn
  • InferPy (139 stars) - Deep Probabilistic Modelling Made Easy. sklearn
  • GPflow - Gaussian processes in TensorFlow. sklearn
  • sklearn-bayes (474 stars) - Python package for Bayesian Machine Learning with scikit-learn API. sklearn
  • skpro (113 stars) - Supervised domain-agnostic prediction framework for probabilistic modelling by The Alan Turing Institute. sklearn
  • sklearn-crfsuite (409 stars) - A scikit-learn inspired API for CRFsuite. sklearn

Optimization

  • BoTorch (2.4k stars) - Bayesian optimization in PyTorch. PyTorch based/compatible
  • hyperopt-sklearn (1.4k stars) - Hyper-parameter optimization for sklearn. sklearn
  • sklearn-deap (704 stars) - Use evolutionary algorithms instead of gridsearch in scikit-learn. sklearn
  • sigopt_sklearn (72 stars) - SigOpt wrappers for scikit-learn methods. sklearn
  • GPflowOpt (252 stars) - Bayesian Optimization using GPflow. sklearn

Machine Learning

    Automated Machine Learning

  • TPOT (8.7k stars) - Automated Machine Learning tool that optimizes machine learning pipelines using genetic programming. sklearn
  • auto-sklearn (6.5k stars) - An automated machine learning toolkit and a drop-in replacement for a scikit-learn estimator. sklearn

Machine Learning

    Ensemble Methods

  • ML-Ensemble - High performance ensemble learning. sklearn
  • Stacking (187 stars) - Simple and useful stacking library, written in Python. sklearn
  • stacked_generalization (114 stars) - Library for machine learning stacking generalization. sklearn
  • vecstack (669 stars) - Python package for stacking (machine learning technique). sklearn

Machine Learning

    Imbalanced Datasets

  • imbalanced-learn (6.1k stars) - Module to perform undersampling and oversampling with various techniques. sklearn
  • imbalanced-algorithms (217 stars) - Python-based implementations of algorithms for learning on imbalanced data. sklearn

Machine Learning

    Random Forests

  • rpforest (212 stars) - A forest of random projection trees. sklearn
  • sklearn-random-bits-forest (8 stars) - Wrapper of the Random Bits Forest program written by (Wang et al., 2016). sklearn
  • rgf_python (363 stars) - Python Wrapper of Regularized Greedy Forest. sklearn

Feature Engineering

    General

  • skl-groups (41 stars) - A scikit-learn addon to operate on set/"group"-based features. sklearn
  • Feature Forge (382 stars) - A set of tools for creating and testing machine learning features. sklearn
  • few (46 stars) - A feature engineering wrapper for sklearn. sklearn
  • scikit-mdr (122 stars) - A sklearn-compatible Python implementation of Multifactor Dimensionality Reduction (MDR) for feature construction. sklearn
  • tsfresh (6.6k stars) - Automatic extraction of relevant features from time series. sklearn

Feature Engineering

    Feature Selection

  • boruta_py (1.2k stars) - Implementations of the Boruta all-relevant feature selection method. sklearn
  • BoostARoota (182 stars) - A fast xgboost feature selection algorithm. sklearn
  • scikit-rebate (368 stars) - A scikit-learn-compatible Python implementation of ReBATE, a suite of Relief-based feature selection algorithms for Machine Learning. sklearn
  • scikit-feature (1.3k stars) - Feature selection repository in Python.

Genetic Programming

  • gplearn (1.2k stars) - Genetic Programming in Python. sklearn
  • karoo_gp (146 stars) - A Genetic Programming platform for Python with GPU support. sklearn
  • sklearn-genetic (251 stars) - Genetic feature selection module for scikit-learn. sklearn

Natural Language Processing

  • skift (233 stars) - Scikit-learn wrappers for Python fastText. sklearn
  • pyMorfologik (18 stars) - Python binding for Morfologik (166 stars).
  • PSI-Toolkit - A natural language processing toolkit.
  • flair (12k stars) - Very simple framework for state-of-the-art NLP.

Statistics

  • pandas_summary (437 stars) - Extension to pandas dataframes describe function. pandas compatible
  • Pandas Profiling (9.6k stars) - Create HTML profiling reports from pandas DataFrame objects. pandas compatible
  • Alphalens (2.4k stars) - Performance analysis of predictive (alpha) stock factors.

Experimentation

  • Ax (1.9k stars) - Adaptive Experimentation Platform. sklearn
  • Sacred (3.9k stars) - A tool to help you configure, organize, log and reproduce experiments.

Computations

  • Dask (10.3k stars) - Parallel computing with task scheduling. pandas compatible

Spatial Analysis

  • GeoPandas (3.3k stars) - Python tools for geographic data. pandas compatible

Aug 30th, 2019

Data Manipulation

    Data Containers

  • swifter (2.1k stars) - A package which efficiently applies any function to a pandas dataframe or series in the fastest available manner.

Data Manipulation

    Pipelines

  • meza (398 stars) - A Python toolkit for processing tabular data.

Model Explanation

  • Auralisation (40 stars) - Auralisation of learned features in CNN (for audio).
  • CapsNet-Visualization (385 stars) - A visualization of the CapsNet layers to better understand how it works.
  • lucid