Track Awesome Data Engineering Updates Daily
A curated list of data engineering tools for software developers
😺 igorbarinov/awesome-data-engineering · ⭐ 5.4K · 🏷️ Big Data
Nov 22, 2023
Prometheus
- Grai (⭐241) - A data catalog tool that integrates with your CI system to run downstream impact tests on data changes. These tests keep changes that might break data pipelines or BI dashboards from reaching production.
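Downstream impact testing comes down to walking a lineage graph from the changed asset to everything that depends on it. The sketch below assumes lineage is available as a plain adjacency map. The asset names, the `downstream_impact` helper and the "protected assets" rule are hypothetical illustrations, not Grai's API.

```python
from collections import deque


def downstream_impact(lineage: dict[str, list[str]], changed: str) -> set[str]:
    """Return every asset reachable downstream of `changed` (breadth-first)."""
    impacted: set[str] = set()
    queue = deque([changed])
    while queue:
        node = queue.popleft()
        for child in lineage.get(node, []):
            if child not in impacted:  # visit each asset once, even with shared parents
                impacted.add(child)
                queue.append(child)
    return impacted


# Hypothetical lineage: raw table -> staging model -> marts -> dashboard
lineage = {
    "raw.orders": ["stg.orders"],
    "stg.orders": ["mart.revenue", "mart.customers"],
    "mart.revenue": ["dashboard.finance"],
}

impacted = downstream_impact(lineage, "stg.orders")

# A CI check could fail the build when a protected asset is in the blast radius.
protected = {"dashboard.finance"}
if impacted & protected:
    print("blocked:", sorted(impacted & protected))
```

Breadth-first search keeps the check cheap and terminates on cyclic lineage, because each asset is enqueued at most once.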
Feb 13, 2021
Conferences
- Data Council - Billed as the first technical conference to bridge the gap between data scientists, data engineers and data analysts.
Jan 28, 2019
Realtime
- Twitter Realtime - The Streaming APIs give developers low-latency access to Twitter’s global stream of Tweet data.
Data Dumps
- GitHub Archive - GitHub's public timeline since 2011, updated every hour.
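Because the archive is published as one gzipped JSON-lines file per hour, a consumer mostly needs to enumerate hourly file names and stream the events in each. This sketch assumes the naming scheme shown on gharchive.org (`https://data.gharchive.org/2015-01-01-15.json.gz`, with an unpadded hour); verify it before relying on it. The helper names are hypothetical, and the demo uses an in-memory sample instead of downloading anything.

```python
import gzip
import io
import json
from collections import Counter
from datetime import datetime, timedelta

BASE_URL = "https://data.gharchive.org"  # assumed from gharchive.org docs; verify before use


def hourly_urls(start: datetime, end: datetime) -> list[str]:
    """Build one archive URL per hour in [start, end); the hour is not zero-padded."""
    urls = []
    t = start.replace(minute=0, second=0, microsecond=0)
    while t < end:
        urls.append(f"{BASE_URL}/{t:%Y-%m-%d}-{t.hour}.json.gz")
        t += timedelta(hours=1)
    return urls


def count_event_types(gz_bytes: bytes) -> Counter:
    """Count the `type` field of each JSON event in one gzipped hourly file."""
    counts: Counter = Counter()
    with gzip.open(io.BytesIO(gz_bytes), "rt", encoding="utf-8") as lines:
        for line in lines:
            if line.strip():  # tolerate blank lines
                counts[json.loads(line)["type"]] += 1
    return counts


urls = hourly_urls(datetime(2015, 1, 1, 14), datetime(2015, 1, 1, 16))
# Hypothetical three-event sample standing in for a downloaded hourly file.
sample = gzip.compress(b'{"type": "PushEvent"}\n{"type": "WatchEvent"}\n{"type": "PushEvent"}\n')
print(count_event_types(sample))  # Counter({'PushEvent': 2, 'WatchEvent': 1})
```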
Aug 23, 2017
Podcasts
- Data Engineering Podcast - The show about modern data infrastructure.
Apr 12, 2017
Forums
- /r/dataengineering - News, tips and background on data engineering
- /r/etl - Subreddit focused on ETL
Mar 24, 2017
Realtime
- Reddit - Real-time data is available, including comments, submissions and links posted to Reddit.
Data Dumps
- Common Crawl - Open-source repository of web crawl data.
- Wikipedia - Wikipedia's complete copy of all wikis, as wikitext source and metadata embedded in XML. A number of raw database tables in SQL form are also available.
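The XML dumps are far too large to load into memory at once, so they are usually parsed as a stream. The sketch below uses the standard library's `ElementTree.iterparse` and compares local tag names because the export namespace version differs between dumps. The tiny `SAMPLE` document is a hypothetical stand-in; for a real compressed dump you could pass `bz2.open(path, "rb")` instead of the `BytesIO`.

```python
import io
import xml.etree.ElementTree as ET
from typing import Iterator


def local(tag: str) -> str:
    """Drop the '{namespace}' prefix; the export schema version varies by dump."""
    return tag.rsplit("}", 1)[-1]


def iter_pages(xml_file) -> Iterator[tuple[str, str]]:
    """Yield (title, wikitext) for each <page>, clearing elements as we go."""
    for _event, elem in ET.iterparse(xml_file, events=("end",)):
        if local(elem.tag) != "page":
            continue
        title, text = "", ""
        for child in elem.iter():
            name = local(child.tag)
            if name == "title":
                title = child.text or ""
            elif name == "text":
                text = child.text or ""  # last revision wins if a page has several
        yield title, text
        elem.clear()  # free memory held by the finished page


# Hypothetical minimal export document.
SAMPLE = b"""<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.10/">
  <page>
    <title>Example</title>
    <ns>0</ns>
    <revision><text>'''Example''' page text.</text></revision>
  </page>
</mediawiki>"""

pages = list(iter_pages(io.BytesIO(SAMPLE)))
print(pages)  # [('Example', "'''Example''' page text.")]
```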
Sep 03, 2015
Realtime
- Eventsim (⭐471) - Event data simulator. Generates a stream of pseudo-random events from a set of users, designed to simulate web traffic.
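To make the idea of "pseudo-random events from a set of users" concrete, here is a much simpler generator than Eventsim itself: it models arrivals as a Poisson process (exponential gaps between events) and picks a user and a page for each event. The page names, weights and `simulate` function are hypothetical and do not reflect Eventsim's actual model.

```python
import random
from dataclasses import dataclass
from typing import Iterator


@dataclass
class Event:
    ts: float  # seconds since simulation start
    user_id: int
    page: str


# Hypothetical page mix; the weights make some pages more popular than others.
PAGES = ["home", "search", "product", "logout"]
WEIGHTS = [0.3, 0.4, 0.25, 0.05]


def simulate(n_users: int, n_events: int, rate_per_sec: float, seed: int = 0) -> Iterator[Event]:
    """Emit events with exponential inter-arrival times (a Poisson process)."""
    rng = random.Random(seed)  # seeded so runs are reproducible
    ts = 0.0
    for _ in range(n_events):
        ts += rng.expovariate(rate_per_sec)  # mean gap = 1 / rate_per_sec seconds
        user = rng.randrange(n_users)
        page = rng.choices(PAGES, weights=WEIGHTS)[0]
        yield Event(ts, user, page)


events = list(simulate(n_users=100, n_events=1000, rate_per_sec=5.0, seed=42))
```

Using a dedicated `random.Random` instance rather than the module-level functions keeps the simulation reproducible without touching global random state.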
Jul 17, 2015
Prometheus
- HAProxy Exporter (⭐603) - A simple server that scrapes HAProxy stats and exports them via HTTP for Prometheus consumption.
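An exporter like this essentially translates HAProxy's CSV statistics (a header row starting with `# pxname,svname,...`) into Prometheus' text exposition format. The sketch converts only the `scur` (current sessions) column into one gauge. The metric name `haproxy_current_sessions` and its labels are hypothetical and differ from what the real exporter emits. The real exporter also fetches the CSV over HTTP, whereas this sketch parses a string.

```python
import csv
import io


def esc(value: str) -> str:
    """Escape a label value per the text exposition format (\\, ", newline)."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def stats_to_prometheus(csv_text: str) -> str:
    """Render the `scur` (current sessions) column as one Prometheus gauge."""
    # HAProxy prefixes its CSV header with "# "; strip it so DictReader sees field names.
    reader = csv.DictReader(io.StringIO(csv_text.removeprefix("# ")))
    lines = [
        "# HELP haproxy_current_sessions Current sessions (hypothetical metric name).",
        "# TYPE haproxy_current_sessions gauge",
    ]
    for row in reader:
        if not row.get("scur"):
            continue  # skip rows with an empty scur field
        labels = f'proxy="{esc(row["pxname"])}",sv="{esc(row["svname"])}"'
        lines.append(f"haproxy_current_sessions{{{labels}}} {row['scur']}")
    return "\n".join(lines) + "\n"


# Hypothetical, truncated stats CSV (real output has many more columns).
SAMPLE = """# pxname,svname,qcur,qmax,scur,smax
http-in,FRONTEND,,,3,10
app,web1,0,0,2,8
"""
print(stats_to_prometheus(SAMPLE), end="")
```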
Jul 16, 2015
Prometheus
- Prometheus.io (⭐51k) - An open-source service monitoring system and time series database.
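Much of what Prometheus does with its time series is derive rates from monotonically increasing counters. The sketch computes a per-second rate from `(timestamp, value)` samples and treats any drop in value as a counter reset, which is how Prometheus accounts for resets. It is simplified: PromQL's `rate()` also extrapolates to the edges of the query window, so its results can differ. The `simple_rate` helper is hypothetical.

```python
def simple_rate(samples: list[tuple[float, float]]) -> float:
    """Per-second increase of a counter over (timestamp, value) samples.

    Simplified: handles counter resets, but omits the window-boundary
    extrapolation that PromQL's rate() applies.
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    increase = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        # A drop means the counter restarted from zero, so all of `cur` is new increase.
        increase += cur - prev if cur >= prev else cur
    elapsed = samples[-1][0] - samples[0][0]
    return increase / elapsed


samples = [(0, 100), (15, 130), (30, 10), (45, 40)]  # reset between t=15 and t=30
print(simple_rate(samples))  # (30 + 10 + 30) / 45 ≈ 1.556
```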