
hangtwenty/dive-into-machine-learning

Learn  2 days ago  10.8k
Dive into Machine Learning with Python Jupyter notebook and scikit-learn!

Oct 23rd

Other courses

  • Data science courses as Jupyter Notebooks:
  • Kevin Markham's video series, Intro to Machine Learning with scikit-learn, starts with what we've already covered, then continues on at a comfortable pace.
  • More Data Science materials

  • Python Data Science Handbook, as Jupyter Notebooks
  • Swami Chandrasekaran's "Becoming a Data Scientist" is a concise, printable picture of a data science curriculum (caveat: it is from 2013, but still useful)
  • Oct 19th

    Risks

    It's dangerous to go alone, take these!

  • Awesome-Artificial-Intelligence-Guidelines (629 stars)
  • Awesome-ML-Model-Governance (27 stars)
  • Risks

  • Awesome Production Machine Learning (10.2k stars), "a curated list of awesome open source libraries to deploy, monitor, version and scale your machine learning." It includes a section about privacy-preserving ML, by the way!
  • "Rules of Machine Learning: Best Practices for [Reliable] ML Engineering," by Martin Zinkevich, regarding ML engineering practices. There's an accompanying video.
  • Finding Open-Source Libraries

  • Julia: Julia.jl, a curated list of awesome libraries and software in the Julia language - with a section on Machine Learning.
  • Oct 18th

    Tools you'll need

    Cloud-based

  • Deepnote allows for real-time collaboration
  • Binder is the official choice to try JupyterLab
  • Google Colab provides "free" GPUs
  • Play to learn

  • Dr. Randal Olson's Example Machine Learning notebook: "let's pretend we're working for a startup that just got funded to create a smartphone app that automatically identifies species of flowers from pictures taken on the smartphone. We've been tasked by our head of data science to create a demo machine learning model that takes four measurements from the flowers (sepal length, sepal width, petal length, and petal width) and identifies the species based on those measurements alone."
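
The task described in that notebook (four flower measurements in, a species out) can be sketched in a few lines of scikit-learn. This is a minimal sketch, not the notebook's own code: the choice of a k-nearest-neighbors classifier, the 25% hold-out split, and the sample measurements are all assumptions made for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Load the classic iris data: 150 flowers, four measurements each
# (sepal length, sepal width, petal length, petal width, in cm).
iris = load_iris()
X, y = iris.data, iris.target

# Hold out a quarter of the flowers so the model is checked on unseen data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Assumption for this sketch: k-nearest neighbors as a simple first model.
model = KNeighborsClassifier(n_neighbors=5)
model.fit(X_train, y_train)

# Fraction of held-out flowers whose species was predicted correctly.
accuracy = model.score(X_test, y_test)
print(f"Held-out accuracy: {accuracy:.2f}")

# Predict the species for one set of measurements (illustrative values).
sample = [[5.1, 3.5, 1.4, 0.2]]
predicted_species = iris.target_names[model.predict(sample)[0]]
print(f"Predicted species: {predicted_species}")
```

Swapping in another classifier only changes the line that builds `model`; the load, split, fit, and score steps stay the same, which makes this a convenient skeleton to experiment with.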
  • Oct 17th

    Deep Learning

  • Dive into Deep Learning - An interactive book about deep learning
    • "Interactive deep learning book with code, math, and discussions"
    • "Implemented with NumPy/MXNet, PyTorch, and TensorFlow"
    • "Adopted at 200 universities from 50 countries"
  • Oct 4th

    Tools you'll need

    Local installation

  • Jupyter Notebook. (Formerly known as IPython Notebook.)
  • Mar 15th

    Bayesian Statistics and Machine Learning

  • Like learning by playing? Me too. Try 19 Questions (15 stars), "a machine learning game which asks you questions and guesses an object you are thinking about," and explains which Bayesian statistics techniques are being used.
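
To get a feel for how a guessing game can use Bayesian statistics, here is a minimal sketch of Bayes' rule applied to yes/no answers. It is not the 19 Questions implementation: the objects, questions, and probabilities below are hypothetical values invented for illustration.

```python
# Hypothetical knowledge base: P(answer is "yes" | object) for each question.
LIKELIHOOD_YES = {
    "Is it alive?": {"cat": 0.95, "tree": 0.90, "bicycle": 0.02},
    "Does it move on its own?": {"cat": 0.95, "tree": 0.05, "bicycle": 0.10},
}


def update(beliefs, question, answered_yes):
    """Apply Bayes' rule: posterior is proportional to likelihood times prior."""
    unnormalized = {}
    for obj, prior in beliefs.items():
        p_yes = LIKELIHOOD_YES[question][obj]
        likelihood = p_yes if answered_yes else 1.0 - p_yes
        unnormalized[obj] = likelihood * prior
    total = sum(unnormalized.values())
    # Normalize so the beliefs sum to 1 again.
    return {obj: p / total for obj, p in unnormalized.items()}


# Start with a uniform prior: every object is equally likely.
beliefs = {obj: 1 / 3 for obj in ("cat", "tree", "bicycle")}

# The player answers "yes" to the first question and "no" to the second.
beliefs = update(beliefs, "Is it alive?", answered_yes=True)
beliefs = update(beliefs, "Does it move on its own?", answered_yes=False)

best_guess = max(beliefs, key=beliefs.get)
print(best_guess, beliefs)
```

Each answer reweights the beliefs rather than eliminating objects outright, so a single noisy or mistaken answer does not rule out the correct object.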
  • Mar 16th, 2020

    Jargon note

  • Another handy term: Data Engineering, which may involve or support Machine Learning, but is not limited to Machine Learning.
  • Mar 18th, 2019

    Alternative ways to "Dive into Machine Learning"

  • Courses by cloud vendors (may be specific to their tools/platforms)
    • Machine Learning Crash Course from Google with TensorFlow APIs. This is Google's fast-paced, practical introduction to machine learning which features a series of lessons with video lectures, real-world case studies, and hands-on practice exercises.
    • Amazon AWS. Amazon has opened up its internal training to the public and also offers certification: 30 courses, 45+ hours of content.
  • Mar 3rd, 2019

    Just about time for a break...

  • Download the "Starting Simple" episode, and listen to that soon. It supports what we read from Domingos. Ryan Adams talks about starting simple, as we discussed above. Adams also stresses the importance of feature engineering. Feature engineering is an exercise of the "knowledge" Domingos writes about. In a later episode, they share many concrete tips for feature engineering.
  • Oct 19th, 2018

    Supplement: Learning Pandas well

  • Bookmarks for later when you need to scale
    • The odo library for converting between many formats.
    • dask: A Pandas-like interface, but for larger-than-memory data and "under the hood" parallelism. Very interesting, but only needed when you're getting advanced.
  • Collaborate with Domain Experts

    🙇‍♂️ A note about Machine Learning and User Experience (UX)

  • Rule #23 of Martin Zinkevich's Rules of ML Engineering: "You are not a typical end user."
  • There are some great thoughtful discussions of this on Quora
  • Oct 3rd, 2018

    Alternative ways to "Dive into Machine Learning"

  • Distill is a journal devoted to clear and interactive explanations of the latest research in machine learning. They offer an alternative to traditional academic publishing that promotes accessibility and transparency in the field.
  • Apr 4th, 2018

    Finding Open-Source Libraries

  • TensorFlow has been a really big deal. People like you will do exciting things with TensorFlow. It's a framework. Frameworks can help you manage complexity. Just remember this rule of thumb: "More data beats a cleverer algorithm" (Domingos), no matter how cool your tools are. Also note, TensorFlow is not the only machine learning framework of its kind: check this great, detailed comparison of TensorFlow, Torch, and Theano (2.1k stars). See also Newmu/Theano-Tutorials (1.2k stars) and nlintz/TensorFlow-Tutorials (5.9k stars). See also the section on Deep Learning above.
    • Also, consider Lore (1.5k stars). "Lore is a python framework to make machine learning [especially deep learning] approachable for Engineers and maintainable for Data Scientists."
  • Mar 20th, 2018

    Deep Learning

  • Machine Learning Crash Course from Google. Google's fast-paced, practical introduction to machine learning which covers building deep neural networks with TensorFlow.
  • Nov 4th, 2017

    Other courses

  • UC Berkeley's Data 8: The Foundations of Data Science course and the textbook Computational and Inferential Thinking teach critical concepts in Data Science.
  • Sep 16th, 2017

    Recommended course: Prof. Andrew Ng's Machine Learning on Coursera

    Tips for studying

  • Busy schedule? Read Ray Li's review of Prof. Andrew Ng's course for some helpful tips.
  • Review some of the "Learning How to Learn" videos. This is just about how to study in general. In the course, they advocate the learn-by-doing approach, as we're doing here. You'll get various other tips that are easy to apply, but go a long way to make your time investment more effective.
  • Deep Learning

  • "Have Fun With [Deep] Learning" by David Humphrey (4.9k stars). This is an excellent way to "get ahead of yourself" and hack-first. Then you will feel excited to move on to...
  • Prof. Andrew Ng's courses on Deep Learning! There are five courses, as part of the Deep Learning Specialization on Coursera. These courses are part of his new venture, deeplearning.ai.
  • Aug 18th, 2017

    Other courses

  • Advanced Statistical Computing (Vanderbilt BIOS8366). Interactive (lots of IPython Notebook material)
  • Jul 18th, 2017

    Supplement: Learning Pandas well

  • Essential: Things in Pandas I Wish I'd Known Earlier (IPython Notebook)
  • Another helpful tutorial: Real World Data Cleanup with Python and Pandas
  • Supplement: Cheat Sheets

  • Matplotlib / Pandas / Python cheat sheets.
  • More Data Science materials

  • Data Science Workflow: Overview and Challenges (read the article and also the comment by Joseph McCarthy)
  • Alternative ways to "Dive into Machine Learning"

  • "How would your curriculum for a machine learning beginner look like?" by Sebastian Raschka. A selection of the core online courses and books for getting started with machine learning and gaining expert knowledge. It contextualizes Raschka's own book, Python Machine Learning (which I would have linked to anyway!). See also the pattern_classification GitHub repository maintained by the author, which contains IPython notebooks about various machine learning algorithms and various data-science-related resources.
  • Materials for Learning Machine Learning by Jack Simpson
  • For some news sources to follow, check out Sam DeBrule's list.
  • Jan 23rd, 2017

    Jan 3rd, 2017

    "Big" Data?

  • Designing Data-Intensive Applications by Martin Kleppmann. (You can start reading it online, free, via Safari Books.) It's not specific to Machine Learning, but you can bridge that gap yourself.
  • Jan 2nd, 2017

    Just about time for a break...

  • Then, over time, you can listen to the entire podcast series (start from the beginning).
  • Other courses

  • Prof. Pedro Domingos's introductory video series. Domingos wrote the paper "A Few Useful Things to Know About Machine Learning", recommended earlier in this guide.
  • Oct 19th, 2016

    Supplement: Learning Pandas well

  • Video series from Data School, about Pandas. "Reference guide to 30 common pandas tasks (plus 6 hours of supporting video)."
  • Alternative ways to "Dive into Machine Learning"

  • [Your guide here]
  • Oct 11th, 2016

    Alternative ways to "Dive into Machine Learning"

  • Machine Learning for Software Engineers (25.5k stars) by Nam Vu. It's a top-down, results-first approach designed for software engineers.
  • Aug 7th, 2016

    Deep Learning

  • Deep Learning, a free book published by MIT Press, by Ian Goodfellow, Yoshua Bengio, and Aaron Courville.
  • Quora: "What are the best ways to pick up Deep Learning skills as an engineer?" — answered by Greg Brockman (Co-Founder & CTO at OpenAI, previously CTO at Stripe)
  • Jul 7th, 2016

    Alternative ways to "Dive into Machine Learning"

  • Machine Learning for Developers is another good introduction, perhaps better if you're more familiar with Java or Scala. It introduces machine learning for a developer audience using Smile, a machine learning library that can be used both in Java and Scala.
  • Feb 9th, 2016

    Other courses

  • Prof. Mark A. Girolami's Machine Learning Module (GitHub Mirror, 428 stars). Good for people with a strong mathematics background.
  • Finding Open-Source Libraries

  • Bookmark Pythonidae, a curated list of awesome libraries and software in the Python language - with a section on Machine Learning.
  • Jan 25th, 2016

    Bayesian Statistics and Machine Learning

  • The free book, Probabilistic Programming and Bayesian Methods for Hackers. Made with a "computation/understanding-first, mathematics-second point of view." It's available in print too!
  • Bayesian Modelling in Python (2.3k stars)
  • Alternative ways to "Dive into Machine Learning"

  • Example Machine Learning notebook, exercise, and guide by Dr. Randal S. Olson. Mentioned in Notebooks section as well, but it has a similar goal to this guide (introduce you, and show you where to go next). Rich "Further Reading" section.
  • Jan 13th, 2016

    Risks

    Towards Expertise

  • Communicate results. When you have a novel finding, reach out for peer review.
  • Ask a question. Start your own study. The "most important thing in data science is the question" (Dr. Jeff T. Leek). So start with a question. Then, find real data. Analyze it. Then ...
  • Fix issues. Learn. Share what you learn.
  • Finding Open-Source Libraries

  • Bookmark awesome-machine-learning (51.7k stars), a curated list of awesome Machine Learning libraries and software.
  • For Machine-Learning libraries that might not be on PyPI, GitHub, etc., there's MLOSS (Machine Learning Open Source Software). Seems to feature many academic libraries.
  • Play to learn

  • If you want more of a data science bent, pick a notebook from this excellent list of Data Science IPython notebooks (21.8k stars). "Continually updated Data Science Python Notebooks: Spark, Hadoop MapReduce, HDFS, AWS, Kaggle, scikit-learn, matplotlib, pandas, NumPy, SciPy, and various command lines."
  • Machine Learning from Disaster: Using Titanic data, "Demonstrates basic data munging, analysis, and visualization techniques. Shows examples of supervised machine learning techniques."
  • Or more generic tutorials/overviews ...
  • Risks

    Ask for Peer Review

  • Cross-Validated: stats.stackexchange.com
  • Hacker News: news.ycombinator.com. You'll probably want to submit as "Show HN"
  • /r/DataIsBeautiful
  • /r/DataScience
  • /r/MachineLearning
  • Tools you'll need

    Local installation

  • Python. Python 3 is the best option.
  • Some scientific computing packages:
    • numpy
    • pandas
    • scikit-learn
    • matplotlib
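
Once those packages are installed, a quick way to confirm that your environment can find them is a small check like the one below. This is only a sketch: the `check_packages` helper and the `PACKAGES` mapping are names made up for this example, and it uses nothing beyond the standard library.

```python
import importlib.util

# Map each package name (as installed with pip) to the module name you import.
# Note that scikit-learn is imported as "sklearn".
PACKAGES = {
    "numpy": "numpy",
    "pandas": "pandas",
    "scikit-learn": "sklearn",
    "matplotlib": "matplotlib",
}


def check_packages(packages=PACKAGES):
    """Return {package_name: True/False} depending on whether Python can find it."""
    return {
        name: importlib.util.find_spec(module) is not None
        for name, module in packages.items()
    }


if __name__ == "__main__":
    for name, installed in check_packages().items():
        status = "ok" if installed else f"missing (try: pip install {name})"
        print(f"{name}: {status}")
```

Because `find_spec` only looks a module up without importing it, the check stays fast even for large packages.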
  • Other courses

  • Surveys of Data Science courseware (a bit more Choose Your Own Adventure)
  • Data Science (Harvard CS109)
  • More Data Science materials

  • Extremely accessible data science book: Data Smart by John Foreman
  • A Few Useful Things to Know about Machine Learning

  • Data alone is not enough. This is where science meets art in machine-learning. Quoting Domingos: "... the need for knowledge in learning should not be surprising. Machine learning is not magic; it can’t get something from nothing. What it does is get more from less. Programming, like all engineering, is a lot of work: we have to build everything from scratch. Learning is more like farming, which lets nature do most of the work. Farmers combine seeds with nutrients to grow crops. Learners combine knowledge with data to grow programs."
  • More data beats a cleverer algorithm. Listen up, programmers. We like cool tools. Resist the temptation to reinvent the wheel, or to over-engineer solutions. Your starting point is to Do the Simplest Thing that Could Possibly Work. Quoting Domingos: "Suppose you’ve constructed the best set of features you can, but the classifiers you’re getting are still not accurate enough. What can you do now? There are two main choices: design a better learning algorithm, or gather more data. [...] As a rule of thumb, a dumb algorithm with lots and lots of data beats a clever one with modest amounts of it. (After all, machine learning is all about letting data do the heavy lifting.)"
  • Supplement: Learning Pandas well

  • Essential: 10 Minutes to Pandas
  • Useful Pandas Snippets
  • Here are some docs I found especially helpful as I continued learning:
  • Supplement: Cheat Sheets

  • scikit-learn algorithm cheat sheet
  • Metacademy: a package manager for [machine learning] knowledge. A mind map of machine learning concepts, with great detail on each.
  • Risks

  • The High Cost of Maintaining Machine Learning Systems
  • Last Checked At: 2021-10-25T04:24:05.536Z