manuzhang/awesome-streaming

Big Data  19 days ago  1.8k
A curated list of awesome streaming frameworks, applications, etc.

Oct 6th

Streaming Library

  • YoMo (589 stars) [Go] - An open-source streaming serverless framework for building low-latency, geo-distributed systems. YoMo is built atop the QUIC transport protocol and a Functional Reactive Programming interface.

Oct 4th

Data Pipeline

  • fluvio (717 stars) [Rust/WASM] - Real-time programmable data streaming platform with in-line computation capabilities.

Oct 2nd

Readings

  • Streaming Systems: The What, Where, When, and How of Large-Scale Data Processing by Reuven Lax, Slava Chernyak, and Tyler Akidau

Sep 22nd

Streaming Engine

  • Scramjet Transform Hub (27 stars) [JavaScript/Node.js] - data processing engine for running multiple data processing apps (sequences) written in JavaScript or TypeScript.

Streaming Library

  • Scramjet Framework (192 stars) - functional reactive stream programming framework written on top of Node.js object streams.

May 7th

Streaming Engine

  • Apache Ballista [Rust] - distributed compute platform powered by Apache Arrow.

Data Pipeline

  • StreamSets Data Collector (27 stars) [Java] - continuous big data ingestion infrastructure that reads from and writes to a large number of endpoints, including S3, JDBC, Hadoop, Kafka, Cassandra and many others.

Streaming SQL

  • Siddhi (1.3k stars) [Java] - A cloud native Streaming and Complex Event Processing engine that understands Streaming SQL queries in order to capture events from diverse data sources, process them, detect complex conditions, and publish output to various endpoints in real time.

Apr 1st

Streaming SQL

  • Materialize [Rust] - A source-available streaming SQL engine for maintaining materialized views on data from message brokers and databases.
  • ksqlDB (4.6k stars) [Java] - A cloud-native, source-available database purpose-built for stream processing applications.

Feb 21st

Streaming Engine

  • Maki Nage (12 stars) [Python] - A stream processing framework for data scientists, based on Kafka and ReactiveX.

Aug 12th, 2020

Streaming Library

  • Tributary (273 stars) [Python] - A Python library for constructing dataflow graphs. Supports synchronous, reactive data streams built using Python generators that mimic complex event processors, as well as lazily-evaluated acyclic graphs and functional currying streams.

Jun 12th, 2020

Readings

  • Grokking Streaming Systems by Josh Fischer & Ning Wang

May 7th, 2020

Data Pipeline

  • RudderStack (2.8k stars) [Go] - an open source customer data infrastructure (Segment, mParticle alternative).

Apr 30th, 2020

Data Pipeline

  • Gazette (285 stars) [Go] - Distributed streaming infrastructure built on cloud storage which makes it easy to mix and match batch and streaming paradigms.

Dec 30th, 2019

IoT

  • Apache StreamPipes (266 stars) [Java] - a self-service (Industrial) IoT toolbox to enable non-technical users to connect, analyze and explore IoT data streams.

Dec 10th, 2019

Streaming Engine

  • Apache Heron (incubating) (3.6k stars) [Java] - a realtime, distributed, fault-tolerant stream processing engine from Twitter.

Oct 26th, 2019

Streaming Engine

  • mantis (1.2k stars) [Java] - Netflix's platform to build an ecosystem of realtime stream processing applications.

Oct 25th, 2019

Data Pipeline

  • LogDevice [C++] - a high-performance distributed system by Facebook for streaming and storing sequential data, using a log structure.

Oct 10th, 2019

DSL

  • Apache Beam (5k stars) [Java, Python, SQL, Scala, Go] - unified model and set of language-specific SDKs for defining and executing data processing workflows, and also data ingestion and integration flows, supporting Enterprise Integration Patterns (EIPs) and Domain Specific Languages (DSLs), open sourced by Google.

Closed Source

  • Cloud Dataflow [Java, Python, SQL, Scala] - Google's managed stream and batch data processing engine. Supports running Beam pipelines.

Sep 8th, 2019

Streaming Library

  • Stream Ops (41 stars) [Java] - A fully embeddable data streaming engine and stream processing API for Java.

Aug 27th, 2019

Streaming Engine

  • Gearpump (754 stars) [Scala] - lightweight real-time distributed streaming engine built on Akka.

Aug 11th, 2019

Streaming Library

  • Streamz (994 stars) [Python] - A lightweight library for building pipelines to manage continuous streams of data; supports complex pipelines that involve branching, joining, flow control, feedback, back pressure, and so on.

Jul 23rd, 2019

Online Machine Learning

  • streamDM (464 stars) [Scala] - mining Big Data streams using Spark Streaming from Huawei.
  • StormCV (165 stars) [Java] - enables the use of Apache Storm for video processing by adding computer vision (CV) specific operations and data model.
  • trident-ml (386 stars) [Java] - realtime online machine learning library based on Trident.
  • yurita (105 stars) [Scala] - Anomaly detection framework built on Spark Structured Streaming from PayPal.

Jul 22nd, 2019

Data Pipeline

  • brooklin (723 stars) [Java] - a distributed system from LinkedIn intended for streaming data between various heterogeneous source and destination systems with high reliability and throughput at scale (replaced databus).

Jul 14th, 2019

Streaming SQL

  • StreamCQL (0 stars) [Java] - Continuous Query Language on RealTime Computation System.

Closed Source

  • Amazon Kinesis Streams [Java] - real-time, fully managed and scalable data stream engine provided by AWS.
  • Azure Stream Analytics [.NET] - a massively scalable, fully managed, real-time data stream engine provided by Microsoft Azure.
  • concord [C++] - a distributed stream processing framework built in C++ on top of Apache.
  • IBM Streams [Python/Java/Scala] - platform for distributed processing and real-time analytics. Provides toolkits for advanced analytics like geospatial, time series, etc. out of the box.
  • jubatus [C++] - distributed processing framework and streaming machine learning library.
  • millwheel - framework for building low-latency data-processing applications that is widely used at Google.

Apr 10th, 2019

Streaming Engine

  • Apache Apex (346 stars) [Java] - unified platform for big data stream and batch processing.
  • Apache Flink (17.4k stars) [Java] - system for high-throughput, low-latency data stream processing that supports stateful computation, data-driven windowing semantics and iterative stream processing.
  • Apache Samza (699 stars) [Scala/Java] - distributed stream processing framework that builds on Kafka (messaging, storage) and YARN (fault tolerance, processor isolation, security and resource management).
  • Apache Spark Streaming (31.1k stars) [Scala] - makes it easy to build scalable fault-tolerant streaming applications.
  • Apache Storm (6.3k stars) [Clojure/Java] - distributed real-time computation system. Storm is to stream processing what Hadoop is to batch processing.
  • AthenaX (1.2k stars) [Java] - Uber's Stream Analytics Framework used in production.
  • Faust (5.8k stars) [Python] - stream processing library, porting the ideas from Kafka Streams to Python.
  • Hazelcast Jet (937 stars) [Java] - A general purpose distributed data processing engine, built on top of Hazelcast.
  • hailstorm (87 stars) [Haskell] - distributed stream processing with exactly-once semantics based on Storm.
  • mupd8 (muppet) (126 stars) [Scala/Java] - MapReduce-style framework for processing fast/streaming data.
  • Onyx (2k stars) [Clojure] - Distributed, masterless, high performance, fault tolerant data processing.
  • s4 (39 stars) [Java] - general-purpose, distributed, scalable, fault-tolerant, pluggable platform that allows programmers to easily develop applications for processing continuous unbounded streams of data.
  • SABER (38 stars) [Java/C] - Window-Based Hybrid CPU/GPU Stream Processing Engine.
  • SPQR (27 stars) [Java] - dynamic framework for processing high volume data streams through pipelines.
  • tigon (277 stars) [C++/Java] - high throughput real-time streaming processing framework built on Hadoop and HBase.
  • Teknek (7 stars) [Java] - Simple elegant stream processing with interactive prototyping shell SOL (Stream Operator Language).

Streaming Library

  • Apache Kafka Streams (20.2k stars) [Java] - lightweight stream processing library included in Apache Kafka (since 0.10 version).
  • Akka Streams (11.8k stars) [Scala] - stream processing library on Akka Actors.
  • Benthos (3.5k stars) [Go] - a high performance and resilient message streaming service, able to connect various sources and sinks and perform arbitrary actions, transformations and filters on payloads.
  • FS2 (prev. 'Scalaz-Stream') (2k stars) [Scala] - Compositional, streaming I/O library for Scala.
  • monix (1.8k stars) [Scala] - high-performance Scala / Scala.js library for composing asynchronous and event-based programs.
  • Streamline (155 stars) [Java] - Stream Analytics Framework by Hortonworks, designed as a wrapper around existing streaming solutions like Storm. Aimed to allow users to drag-and-drop streaming components to focus on business logic.
  • StreamAlert (2.6k stars) [Python] - Airbnb's Real-time Data Analysis and Alerting.
  • Swave (174 stars) [Scala] - A lightweight Reactive Streams Infrastructure Toolkit for Scala.

Streaming Application

  • straw (99 stars) [Python/Java] - A platform for real-time streaming search.
  • storm-crawler (726 stars) [Java] - Web crawler SDK based on Apache Storm.

IoT

  • sensorbee (208 stars) [Go] - lightweight stream processing engine for IoT.
  • Apache Edgent (203 stars) [Java] - a programming model and runtime that enables continuous streaming analytics on gateways and edge devices which can work with centralized systems to provide efficient and timely analytics across the whole IoT ecosystem, from the center to the edge; open sourced by IBM.

DSL

  • coast (59 stars) [Scala] - a DSL that builds DAGs on top of Samza and provides exactly-once semantics.
  • Esper (710 stars) [Java] - component for complex event processing (CEP) and event series analysis.
  • Streamparse (1.4k stars) [Python] - lets you run Python code against real-time streams of data via Apache Storm.
  • summingbird (2.1k stars) [Scala] - library that lets you write MapReduce programs that look like native Scala or Java collection transformations and execute them on a number of well-known distributed MapReduce platforms, including Storm and Scalding.

Data Pipeline

  • Apache Kafka (20.2k stars) [Scala/Java] - distributed, partitioned, replicated commit log service, which provides the functionality of a messaging system, but with a unique design.
  • Apache Pulsar (9.8k stars) [Java] - distributed pub-sub messaging platform with a very flexible messaging model and an intuitive client API.
  • camus (884 stars) [Java] - LinkedIn's Kafka -> HDFS pipeline.
  • databus (3.3k stars) [Java] - LinkedIn's source-agnostic distributed change data capture system.
  • flume (2.2k stars) [Java] - distributed, reliable, and available service for efficiently collecting, aggregating, and moving large amounts of log data.
  • metaq (1.3k stars) [Java] - Taobao's highly available, high performance distributed messaging system.
  • NATS streaming (2.3k stars) [Go] - fast disk-backed messaging solution.
  • nsq (20.4k stars) [Go] - realtime distributed messaging platform designed to operate at scale, handling billions of messages per day.
  • suro (765 stars) [Java] - data pipeline service for collecting, aggregating, and dispatching large volumes of application events including log data.

Online Machine Learning

  • Apache Samoa (238 stars) [Java] - distributed streaming machine learning (ML) framework that contains a programming abstraction for distributed streaming ML algorithms.
  • DataSketches (732 stars) [Java] - sketches library from Yahoo!.
  • StreamingBandit (72 stars) [Python] - Provides a webserver to quickly set up and evaluate possible solutions to contextual multi-armed bandit (cMAB) problems.

Streaming SQL

  • pipelinedb (2.4k stars) [C] - An open-source relational database that runs SQL queries continuously on streams, incrementally storing results in tables.
  • squall (266 stars) [Java] - Squall executes SQL queries on top of Storm for doing online processing.

Benchmark

  • storm-benchmark (44 stars) [Java] - a set of benchmarks to test Storm performance.
  • storm-perf-test (78 stars) [Java] - a simple storm performance/stress test.
  • streaming-benchmarks (549 stars) [Java] - Benchmarks for Low Latency (Streaming) solutions including Apache Storm, Apache Spark, Apache Flink, etc.
  • flotilla (231 stars) [Go] - Automated message queue orchestration for scaled-up benchmarking.

Toolkit

  • akka (11.8k stars) [Scala] - toolkit and runtime for building highly concurrent, distributed, and resilient message-driven applications on the JVM.
  • pulsar (1.9k stars) [Python] - Actor based event driven concurrent framework for Python.
  • aeron (5.7k stars) [Java/C++] - efficient reliable unicast and multicast message transport.
  • StreamFlow (241 stars) [Java] - stream processing tool designed to help build and monitor processing workflows.
  • samza-luwak (99 stars) [Java] - uses Luwak, a stored-query engine built on Lucene, to implement full-text search on streams.
  • Turbine (819 stars) [Java] - tool for aggregating streams of Server-Sent Event (SSE) JSON data into a single stream.

Feb 13th, 2016

Last Checked At: 2021-10-25T04:08:49.940Z