Track Awesome Data Engineering Updates Daily
A curated list of data engineering tools for software developers
😺 igorbarinov/awesome-data-engineering · ⭐ 4.6K · 🏷️ Big Data
Feb 13, 2021
Conferences
- Data Council The first technical conference that bridges the gap between data scientists, data engineers, and data analysts.
Jan 28, 2019
Realtime
- Twitter Realtime The Streaming APIs give developers low-latency access to Twitter’s global stream of Tweet data.
Data Dumps
- GitHub Archive GitHub's public timeline since 2011, updated every hour
Aug 23, 2017
Podcasts
- Data Engineering Podcast The show about modern data infrastructure.
Apr 12, 2017
Forums
- /r/dataengineering News, tips and background on Data Engineering
- /r/etl Subreddit focused on ETL
Mar 24, 2017
Realtime
- Reddit Real-time data is available, including comments, submissions, and links posted to Reddit.
Data Dumps
- Common Crawl Open source repository of web crawl data
- Wikipedia Wikipedia's complete copy of all wikis, in the form of wikitext source and metadata embedded in XML. A number of raw database tables in SQL form are also available.
Sep 03, 2015
Realtime
- Eventsim (⭐422) Event data simulator. Generates a stream of pseudo-random events from a set of users, designed to simulate web traffic.
Jul 17, 2015
Prometheus
- HAProxy Exporter (⭐578) Simple server that scrapes HAProxy stats and exports them via HTTP for Prometheus consumption
Jul 16, 2015
Prometheus
- Prometheus.io (⭐45k) An open-source service monitoring system and time series database