Awesome Spark

A curated list of awesome Apache Spark packages and resources.

Apache Spark is an open-source cluster-computing framework. Originally developed at the University of California, Berkeley's AMPLab, the Spark codebase was later donated to the Apache Software Foundation, which has maintained it since. Spark provides an interface for programming entire clusters with implicit data parallelism and fault-tolerance (Wikipedia 2017).

Users of Apache Spark may choose between the Python, R, Scala and Java programming languages to interface with the Apache Spark APIs.

Contents

Packages

Language Bindings

Notebooks and IDEs

  • almond - A Scala kernel for Jupyter.
  • Apache Zeppelin - Web-based notebook that enables interactive data analytics with pluggable backends, integrated plotting, and extensive Spark support out of the box.
  • Polynote - An IDE-inspired polyglot notebook. It supports mixing multiple languages in one notebook and sharing data between them seamlessly. It encourages reproducible notebooks with its immutable data model. Originally developed at Netflix.
  • Spark Notebook - Scalable and stable Scala- and Spark-focused notebook bridging the gap between the JVM and data scientists (incl. extendable, type-safe and reactive charts).
  • sparkmagic - Jupyter magics and kernels for interactively working with remote Spark clusters through Livy in Jupyter notebooks.

General Purpose Libraries

SQL Data Sources

Spark SQL has several built-in data sources for files. These include CSV, JSON, Parquet, ORC, and Avro. It also supports JDBC databases as well as Apache Hive. Additional data sources can be added by including the packages listed below, or by writing your own.
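Packages are usually included at launch time through the `--packages` option of `spark-submit`, which takes a comma-separated list of Maven coordinates (`groupId:artifactId:version`). Avro is a handy example: although built in since Spark 2.4, it lives in the separately deployed `spark-avro` module that is not on the default classpath. The minimal sketch below only assembles such a command line. The helper name `build_spark_submit`, the application file `my_app.py` and the coordinate version (Spark 3.2.0 built for Scala 2.12) are assumptions for illustration and must be adapted to your Spark build.

```python
from __future__ import annotations

import shlex


def build_spark_submit(app: str, packages: list[str]) -> list[str]:
    """Return a spark-submit argument list that pulls extra data source packages.

    Hypothetical helper: it only builds the command, it does not run it.
    """
    for coord in packages:
        parts = coord.split(":")
        # --packages expects Maven coordinates of the form groupId:artifactId:version
        if len(parts) != 3 or not all(parts):
            raise ValueError(f"not a groupId:artifactId:version coordinate: {coord!r}")
    cmd = ["spark-submit"]
    if packages:
        cmd += ["--packages", ",".join(packages)]
    cmd.append(app)
    return cmd


# Assumed coordinate: spark-avro for Spark 3.2.0 built against Scala 2.12.
cmd = build_spark_submit("my_app.py", ["org.apache.spark:spark-avro_2.12:3.2.0"])
print(shlex.join(cmd))
```

Inside the application, Avro files could then be read with `spark.read.format("avro").load(path)`. The same coordinates can alternatively be supplied through the `spark.jars.packages` configuration property.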

Storage

  • Delta Lake - Storage layer with ACID transactions.
  • lakeFS - Integration with the lakeFS atomic versioned storage layer.

Bioinformatics

GIS

Time Series Analytics

Graph Processing

  • Mazerunner - Graph analytics platform on top of Neo4j and GraphX.
  • GraphFrames - DataFrame-based graph API.
  • neo4j-spark-connector - Bolt protocol-based Neo4j connector with RDD, DataFrame and GraphX / GraphFrames support.
  • SparklingGraph - Library extending GraphX features with multiple functionalities useful in graph analytics (measures, generators, link prediction etc.).
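A DataFrame-based graph API such as GraphFrames represents a graph as two tables: a vertex DataFrame with an `id` column and an edge DataFrame with `src` and `dst` columns. The sketch below mimics that layout with plain Python lists of dicts, an assumption made so it runs without Spark, and reproduces what `GraphFrame.inDegrees` reports. It illustrates the data model only and is not GraphFrames code. The sample names and the helper `in_degrees` are hypothetical.

```python
from collections import Counter

# Vertex "table": GraphFrames expects an "id" column in the vertex DataFrame.
vertices = [
    {"id": "a", "name": "Alice"},
    {"id": "b", "name": "Bob"},
    {"id": "c", "name": "Charlie"},
]

# Edge "table": GraphFrames expects "src" and "dst" columns in the edge DataFrame.
edges = [
    {"src": "a", "dst": "b", "relationship": "friend"},
    {"src": "b", "dst": "c", "relationship": "follow"},
    {"src": "c", "dst": "b", "relationship": "follow"},
]


def in_degrees(edge_rows):
    """Count incoming edges per vertex, mirroring GraphFrame.inDegrees.

    As with the library (a group-by on "dst"), vertices without incoming
    edges do not appear in the result.
    """
    counts = Counter(row["dst"] for row in edge_rows)
    return [{"id": v, "inDegree": n} for v, n in sorted(counts.items())]


print(in_degrees(edges))  # [{'id': 'b', 'inDegree': 2}, {'id': 'c', 'inDegree': 1}]
```

With the real library the two tables would be Spark DataFrames passed to `GraphFrame(v, e)`, and `g.inDegrees` returns a DataFrame with `id` and `inDegree` columns.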

Machine Learning Extension

Middleware

  • Livy - REST server with extensive language support (Python, R, Scala) and the ability to maintain interactive sessions and share objects.
  • spark-jobserver - Simple Spark-as-a-Service that supports object sharing using so-called named objects. JVM only.
  • Mist - Service for exposing Spark analytical jobs and machine learning models as realtime, batch or reactive web services.
  • Apache Toree - IPython protocol-based middleware for interactive applications.
  • Apache Kyuubi - A distributed multi-tenant JDBC server for large-scale data processing and analytics, built on top of Apache Spark.
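To make the REST-middleware idea concrete, the sketch below builds (but does not send) the two Livy calls behind an interactive session: `POST /sessions` to start a session and `POST /sessions/{id}/statements` to run code in it, so state persists across statements. It assumes a Livy server at `http://localhost:8998` (Livy's default port). The helper names `create_session` and `run_statement` are hypothetical, and only the Python standard library is used.

```python
import json
from urllib.request import Request

# Assumed Livy endpoint; 8998 is Livy's default port.
LIVY_URL = "http://localhost:8998"


def _json_post(url: str, body: dict) -> Request:
    """Build (but do not send) a JSON POST request."""
    return Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def create_session(kind: str = "pyspark") -> Request:
    # POST /sessions starts an interactive session (kinds include spark, pyspark, sparkr, sql).
    return _json_post(f"{LIVY_URL}/sessions", {"kind": kind})


def run_statement(session_id: int, code: str) -> Request:
    # POST /sessions/{id}/statements runs code inside an existing session;
    # variables defined by earlier statements in that session remain available.
    return _json_post(f"{LIVY_URL}/sessions/{session_id}/statements", {"code": code})


req = run_statement(0, "spark.range(10).count()")
print(req.get_method(), req.full_url)  # POST http://localhost:8998/sessions/0/statements
print(req.data.decode("utf-8"))        # {"code": "spark.range(10).count()"}
```

Against a running server, each request could be sent with `urllib.request.urlopen(req)`. The statement result is then polled with `GET /sessions/{id}/statements/{statementId}`.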

Monitoring

Utilities

Natural Language Processing

Streaming

  • Apache Bahir - Collection of the streaming connectors excluded from Spark 2.0 (Akka, MQTT, Twitter, ZeroMQ).

Interfaces

  • Apache Beam - Unified data processing engine supporting both batch and streaming applications. Apache Spark is one of the supported execution environments.
  • Blaze - Interface for querying larger-than-memory datasets using Pandas-like syntax. It supports both Spark DataFrames and RDDs.
  • Koalas - Pandas DataFrame API on top of Apache Spark.

Testing

Web Archives

Workflow Management

Resources

Books

Papers

MOOCs

Workshops

Projects Using Spark

  • Oryx 2stars1.8k - Lambda architecture platform built on Apache Spark and Apache Kafka with specialization for real-time large scale machine learning.
  • Photon MLstars783 - A machine learning library supporting classical Generalized Mixed Model and Generalized Additive Mixed Effect Model.
  • PredictionIO - Machine Learning server for developers and data scientists to build and deploy predictive applications in a fraction of the time.
  • Crossdatastars171 - Data integration platform with extended DataSource API and multi-user environment.

Docker Images

Miscellaneous

References

Wikipedia. 2017. “Apache Spark — Wikipedia, the Free Encyclopedia.” https://en.wikipedia.org/w/index.php?title=Apache_Spark&oldid=781182753.

License

Public Domain Mark
This work (Awesome Spark, by https://github.com/awesome-spark/awesome-spark), identified by Maciej Szymkiewicz, is free of known copyright restrictions.

Apache Spark, Spark, Apache, and the Spark logo are trademarks of The Apache Software Foundation. This compilation is not endorsed by The Apache Software Foundation.

Inspired by sindresorhus/awesome.

Last Checked At: 2022-01-24T04:20:48.111Z