# Awesome Learn Datascience Overview

:chart_with_upwards_trend: Curated list of resources to help you get started with Data Science


# Data Science Tutorials & Resources for Beginners

*If you want to know more about Data Science but don't know where to start, this list is for you!* :chart_with_upwards_trend:

No previous knowledge is required, but Python and statistics basics will definitely come in handy. These resources have been used successfully by many beginners at my local Data Science student group ML-KA.

## What is Data Science?

- 'What is Data Science?' on Quora
- Explanation of important vocabulary - Differentiation of Big Data, Machine Learning, Data Science.
- Data Science for Business (Book) - An introduction to Data Science and its use as a business asset.

## Common Algorithms and Procedures

- Supervised vs unsupervised learning - The two most common types of Machine Learning algorithms.
- 9 important Data Science algorithms and their implementation
- Cross validation - Evaluate the performance of your algorithm / model.
- Feature engineering - Transforming the data so that models can make better predictions.
- Scientific introduction to 10 important Data Science algorithms
- Model ensemble: Explanation - Combine multiple models into one for better performance.
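
To make the cross validation entry above more concrete, here is a minimal sketch of how k-fold splitting works, written in plain Python. The function name `k_fold_indices` is made up for this illustration, and the sketch does not shuffle the data, which real workflows usually do first. Each sample lands in the test set exactly once, and the model would be trained on the remaining samples each time.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) for each of k folds.

    Hypothetical helper for illustration only; it does not shuffle.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    # Spread the remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size


folds = list(k_fold_indices(10, 3))
for train, test in folds:
    print("train:", train, "test:", test)
```

In practice you would fit a model on each `train` split, score it on the matching `test` split, and average the scores.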

## Data Science using Python

This list covers only Python, as many are already familiar with this language. For R, see Data Science tutorials using R (⭐1.8k).

### General

- O'Reilly Data Science from Scratch (Book) - Data processing, implementation, and visualization with example code.
- Coursera Applied Data Science - Online Course using Python that covers most of the relevant toolkits.

### Learning Python

### numpy

numpy is a Python library which provides large multidimensional arrays and fast mathematical operations on them.
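
As a small taste of what that means, the sketch below builds a two-dimensional array and applies element-wise arithmetic and a reduction along one axis. The variable names and the numbers are made up for illustration.

```python
import numpy as np

# A 2-D array (matrix) with 2 rows and 3 columns.
values = np.array([[1.0, 2.0, 3.0],
                   [4.0, 5.0, 6.0]])

# Arithmetic is applied element-wise, without an explicit Python loop.
doubled = values * 2

# Reductions can run along a chosen axis: axis=0 collapses the rows,
# giving one mean per column.
column_means = values.mean(axis=0)

print(values.shape)
print(doubled)
print(column_means)
```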

### pandas

pandas provides efficient data structures and analysis tools for Python. It is built on top of numpy.

- Introduction to pandas
- DataCamp pandas foundations - Paid course, but 30 free days upon account creation (enough to complete course).
- Pandas cheatsheet (⭐36k) - Quick overview of the most important functions.
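
Before diving into the tutorials above, a short sketch may help show what working with a pandas `DataFrame` looks like. The data here is a hypothetical toy table invented for this example; only the filtering and grouping pattern matters.

```python
import pandas as pd

# Hypothetical toy data, made up for illustration.
df = pd.DataFrame({
    "city": ["Karlsruhe", "Berlin", "Karlsruhe", "Berlin"],
    "price": [300, 450, 320, 470],
})

# Select rows with a boolean mask.
expensive = df[df["price"] > 400]

# Group rows by a column and aggregate each group.
mean_price = df.groupby("city")["price"].mean()

print(expensive)
print(mean_price)
```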

### scikit-learn

scikit-learn is the most common library for Machine Learning and Data Science in Python.

- Introduction and first model application
- Rough guide for choosing estimators
- Scikit-learn complete user guide
- Model ensemble: Implementation in Python
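
For orientation, a first scikit-learn model often follows the pattern sketched below: load data, hold out a test set, fit an estimator, and score it. The choice of the bundled iris dataset and of `LogisticRegression` is an assumption made for this example, not a recommendation from the linked guides.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Load a small example dataset that ships with scikit-learn.
X, y = load_iris(return_X_y=True)

# Hold out 25% of the samples to estimate performance on unseen data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Fit a simple classifier; max_iter is raised so the solver converges.
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

# score() returns the mean accuracy on the given data.
accuracy = model.score(X_test, y_test)
print(f"test accuracy: {accuracy:.2f}")
```

Most scikit-learn estimators share this `fit` / `predict` / `score` interface, so swapping in a different model usually means changing only the line that constructs it.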

### Jupyter Notebook

Jupyter Notebook is a web application for easy data visualisation and code presentation.

- Downloading and running first Jupyter notebook
- Example notebook for data exploration
- Seaborn data visualization tutorial - Plot library that works great with Jupyter.

### Various other helpful tools and resources

- Template folder structure for organizing Data Science projects (⭐6.1k)
- Anaconda Python distribution - Contains most of the important Python packages for Data Science.
- Spacy - Open source toolkit for working with text-based data.
- LightGBM gradient boosting framework (⭐14k) - Successfully used in many Kaggle challenges.
- Amazon AWS - Rent cloud servers for more time-consuming calculations (r4.xlarge server is a good place to start).

## Data Science Challenges for Beginners

Sorted by increasing complexity.

- Walkthrough: House prices challenge - A guided tour of a simple challenge on house prices.
- Blood Donation Challenge - Predict if a donor will donate again.
- Titanic Challenge - Predict survival on the Titanic.
- Water Pump Challenge - Predict the operating condition of water pumps in Africa.

## More advanced resources and lists

## Contribute

Contributions welcome! Read the contribution guidelines first.

## License

To the extent possible under law, Simon Böhm has waived all copyright and related or neighboring rights to this work. Disclaimer: Some of the links are affiliate links.