Track Awesome Data Engineering Updates Daily
A curated list of data engineering tools for software developers
igorbarinov/awesome-data-engineering · ⭐ 5.4K · 🏷️ Big Data
Nov 22, 2023
- Grai (⭐241) is a data catalog tool that integrates into your CI system, exposing downstream impact testing of data changes. These tests prevent data changes that might break data pipelines or BI dashboards from reaching production.
Feb 13, 2021
- Data Council: Data Council is the first technical conference that bridges the gap between data scientists, data engineers, and data analysts.
Jan 28, 2019
- Twitter Realtime: The Streaming APIs give developers low-latency access to Twitter’s global stream of Tweet data.
- GitHub Archive: GitHub's public timeline since 2011, updated every hour.
Aug 23, 2017
- Data Engineering Podcast: The show about modern data infrastructure.
Apr 12, 2017
- /r/dataengineering: News, tips, and background on data engineering.
- /r/etl: Subreddit focused on ETL.
Mar 24, 2017
- Reddit: Real-time data is available, including comments, submissions, and links posted to Reddit.
- Common Crawl: Open source repository of web crawl data.
- Wikipedia: Wikipedia's complete copy of all wikis, in the form of wikitext source and metadata embedded in XML. A number of raw database tables in SQL form are also available.
Sep 03, 2015
- Eventsim (⭐471) Event data simulator. Generates a stream of pseudo-random events from a set of users, designed to simulate web traffic.
Jul 17, 2015
- HAProxy Exporter (⭐603) Simple server that scrapes HAProxy stats and exports them via HTTP for Prometheus consumption.
Jul 16, 2015
- Prometheus.io (⭐51k) An open-source service monitoring system and time series database.