<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Track Awesome Spark Updates Daily</title>
  <id>https://www.trackawesomelist.com/awesome-spark/awesome-spark/feed.xml</id>
  <updated>2024-10-24T12:50:13.872Z</updated>
  <link rel="self" type="application/atom+xml" href="https://www.trackawesomelist.com/awesome-spark/awesome-spark/feed.xml"/>
  <link rel="alternate" type="application/json" href="https://www.trackawesomelist.com/awesome-spark/awesome-spark/feed.json"/>
  <link rel="alternate" type="text/html" href="https://www.trackawesomelist.com/awesome-spark/awesome-spark/"/>
  <generator uri="https://github.com/bcomnes/jsonfeed-to-atom#readme" version="1.2.2">jsonfeed-to-atom</generator>
  <icon>https://www.trackawesomelist.com/favicon.ico</icon>
  <logo>https://www.trackawesomelist.com/icon.png</logo>
  <subtitle>A curated list of awesome Apache Spark packages and resources.</subtitle>
  <entry>
    <id>https://www.trackawesomelist.com/2024/10/24/</id>
    <title>Awesome Spark Updates on Oct 24, 2024</title>
    <updated>2024-10-24T12:50:13.872Z</updated>
    <published>2024-10-24T12:50:13.753Z</published>
    <content type="html"><![CDATA[<h3><p>Packages / Language Bindings</p>
</h3>
<ul>
<li><a href="https://github.com/mdrakiburrahman/spark-connect-csharp" rel="noopener noreferrer">spark-connect-csharp (⭐1)</a> <img src="https://img.shields.io/github/last-commit/mdrakiburrahman/spark-connect-csharp.svg" /> - C# bindings.</li>
</ul>
<h3><p>Packages / General Purpose Libraries</p>
</h3>
<ul>
<li><a href="https://github.com/mrpowers-io/spark-daria" rel="noopener noreferrer">spark-daria (⭐751)</a> <img src="https://img.shields.io/github/last-commit/mrpowers-io/spark-daria.svg" /> - A Scala library with essential Spark functions and extensions to make you more productive.</li>
</ul>

<ul>
<li><a href="https://github.com/mrpowers-io/quinn" rel="noopener noreferrer">quinn (⭐632)</a> <img src="https://img.shields.io/github/last-commit/mrpowers-io/quinn.svg" /> - A native PySpark implementation of spark-daria.</li>
</ul>
<h3><p>Packages / Storage</p>
</h3>
<ul>
<li><a href="https://github.com/apache/hudi" rel="noopener noreferrer">Apache Hudi (⭐5.4k)</a> <img src="https://img.shields.io/github/last-commit/apache/hudi.svg" /> - Upserts, Deletes And Incremental Processing on Big Data.</li>
</ul>

<ul>
<li><a href="https://github.com/apache/iceberg" rel="noopener noreferrer">Apache Iceberg (⭐6.4k)</a> <img src="https://img.shields.io/github/last-commit/apache/iceberg.svg" /> - Open table format for huge analytic datasets.</li>
</ul>
<h3><p>Packages / Data quality</p>
</h3>
<ul>
<li><a href="https://github.com/awslabs/python-deequ" rel="noopener noreferrer">python-deequ (⭐717)</a> <img src="https://img.shields.io/github/last-commit/awslabs/python-deequ.svg" /> - Python API for Deequ.</li>
</ul>
<h3><p>Packages / Testing</p>
</h3>
<ul>
<li><a href="https://github.com/mrpowers-io/spark-fast-tests" rel="noopener noreferrer">spark-fast-tests (⭐432)</a> <img src="https://img.shields.io/github/last-commit/mrpowers-io/spark-fast-tests.svg" /> - A lightweight and fast testing framework.</li>
</ul>

<ul>
<li><a href="https://github.com/MrPowers/chispa" rel="noopener noreferrer">chispa (⭐606)</a> <img src="https://img.shields.io/github/last-commit/MrPowers/chispa.svg" /> - PySpark test helpers with beautiful error messages.</li>
</ul>
]]></content>
    <link rel="alternate" href="https://www.trackawesomelist.com/2024/10/24/"/>
    <summary>8 awesome projects updated on Oct 24, 2024</summary>
  </entry>
  <entry>
    <id>https://www.trackawesomelist.com/2024/10/03/</id>
    <title>Awesome Spark Updates on Oct 03, 2024</title>
    <updated>2024-10-03T12:49:25.833Z</updated>
    <published>2024-10-03T12:49:25.818Z</published>
    <content type="html"><![CDATA[<h3><p>Packages / Language Bindings</p>
</h3>
<ul>
<li><a href="https://github.com/sjrusso8/spark-connect-rs" rel="noopener noreferrer">spark-connect-rs (⭐85)</a> <img src="https://img.shields.io/github/last-commit/sjrusso8/spark-connect-rs.svg" /> - Rust bindings.</li>
</ul>

<ul>
<li><a href="https://github.com/apache/spark-connect-go" rel="noopener noreferrer">spark-connect-go (⭐155)</a> <img src="https://img.shields.io/github/last-commit/apache/spark-connect-go.svg" /> - Golang bindings.</li>
</ul>
]]></content>
    <link rel="alternate" href="https://www.trackawesomelist.com/2024/10/03/"/>
    <summary>2 awesome projects updated on Oct 03, 2024</summary>
  </entry>
  <entry>
    <id>https://www.trackawesomelist.com/2023/04/18/</id>
    <title>Awesome Spark Updates on Apr 18, 2023</title>
    <updated>2023-04-18T12:40:56.134Z</updated>
    <published>2023-04-18T12:40:56.021Z</published>
    <content type="html"><![CDATA[<h3><p>Packages / Middleware</p>
</h3>
<ul>
<li><a href="https://github.com/apache/kyuubi" rel="noopener noreferrer">Apache Kyuubi (⭐2.1k)</a> <img src="https://img.shields.io/github/last-commit/apache/kyuubi.svg" /> - A distributed multi-tenant JDBC server for large-scale data processing and analytics, built on top of Apache Spark.</li>
</ul>
<h3><p>Resources / Docker Images</p>
</h3>
<ul>
<li><a href="https://hub.docker.com/r/apache/spark" rel="noopener noreferrer">apache/spark</a> - Apache Spark Official Docker images.</li>
</ul>
]]></content>
    <link rel="alternate" href="https://www.trackawesomelist.com/2023/04/18/"/>
    <summary>2 awesome projects updated on Apr 18, 2023</summary>
  </entry>
  <entry>
    <id>https://www.trackawesomelist.com/2023/02/27/</id>
    <title>Awesome Spark Updates on Feb 27, 2023</title>
    <updated>2023-02-27T12:47:39.917Z</updated>
    <published>2023-02-27T12:47:39.917Z</published>
    <content type="html"><![CDATA[<h3><p>Packages / Language Bindings</p>
</h3>
<ul>
<li><a href="https://github.com/Kotlin/kotlin-spark-api" rel="noopener noreferrer">Kotlin for Apache Spark (⭐459)</a> <img src="https://img.shields.io/github/last-commit/Kotlin/kotlin-spark-api.svg" /> - Kotlin API bindings and extensions.</li>
</ul>
]]></content>
    <link rel="alternate" href="https://www.trackawesomelist.com/2023/02/27/"/>
    <summary>1 awesome project updated on Feb 27, 2023</summary>
  </entry>
  <entry>
    <id>https://www.trackawesomelist.com/2021/12/30/</id>
    <title>Awesome Spark Updates on Dec 30, 2021</title>
    <updated>2021-12-30T11:45:37.000Z</updated>
    <published>2021-12-30T11:45:37.000Z</published>
    <content type="html"><![CDATA[<h3><p>Packages / GIS</p>
</h3>
<ul>
<li><a href="https://github.com/apache/incubator-sedona" rel="noopener noreferrer">Apache Sedona (⭐2k)</a> <img src="https://img.shields.io/github/last-commit/apache/incubator-sedona.svg" /> - Cluster computing system for processing large-scale spatial data.</li>
</ul>
]]></content>
    <link rel="alternate" href="https://www.trackawesomelist.com/2021/12/30/"/>
    <summary>1 awesome project updated on Dec 30, 2021</summary>
  </entry>
  <entry>
    <id>https://www.trackawesomelist.com/2021/12/20/</id>
    <title>Awesome Spark Updates on Dec 20, 2021</title>
    <updated>2021-12-20T16:23:28.000Z</updated>
    <published>2021-12-20T16:23:28.000Z</published>
    <content type="html"><![CDATA[<h3><p>Packages / Machine Learning Extension</p>
</h3>
<ul>
<li><a href="https://mlflow.org/docs/latest/python_api/mlflow.spark.html#module-mlflow.spark" rel="noopener noreferrer">MLflow</a> <img src="https://img.shields.io/github/last-commit/mlflow/mlflow.svg" /> - Machine learning orchestration platform.</li>
</ul>
]]></content>
    <link rel="alternate" href="https://www.trackawesomelist.com/2021/12/20/"/>
    <summary>1 awesome project updated on Dec 20, 2021</summary>
  </entry>
  <entry>
    <id>https://www.trackawesomelist.com/2021/12/12/</id>
    <title>Awesome Spark Updates on Dec 12, 2021</title>
    <updated>2021-12-12T12:22:07.000Z</updated>
    <published>2021-12-12T12:22:07.000Z</published>
    <content type="html"><![CDATA[<h3><p>Resources / Docker Images</p>
</h3>
<ul>
<li><a href="https://hub.docker.com/r/datamechanics/spark" rel="noopener noreferrer">datamechanics/spark</a> - An easy-to-set-up Docker image for Apache Spark from <a href="https://www.datamechanics.co/" rel="noopener noreferrer">Data Mechanics</a>.</li>
</ul>
]]></content>
    <link rel="alternate" href="https://www.trackawesomelist.com/2021/12/12/"/>
    <summary>1 awesome project updated on Dec 12, 2021</summary>
  </entry>
  <entry>
    <id>https://www.trackawesomelist.com/2021/12/06/</id>
    <title>Awesome Spark Updates on Dec 06, 2021</title>
    <updated>2021-12-06T15:04:41.000Z</updated>
    <published>2021-12-06T09:58:50.000Z</published>
    <content type="html"><![CDATA[<h3><p>Packages / General Purpose Libraries</p>
</h3>
<ul>
<li><a href="https://github.com/joblib/joblib-spark" rel="noopener noreferrer">Joblib Apache Spark Backend (⭐241)</a> <img src="https://img.shields.io/github/last-commit/joblib/joblib-spark.svg" /> - <a href="https://github.com/joblib/joblib" rel="noopener noreferrer"><code>joblib</code></a> backend for running tasks on Spark clusters.</li>
</ul>
<h3><p>Packages / Storage</p>
</h3>
<ul>
<li><a href="https://docs.lakefs.io/integrations/spark.html" rel="noopener noreferrer">lakeFS</a> <img src="https://img.shields.io/github/last-commit/treeverse/lakefs.svg" /> - Integration with the lakeFS atomic versioned storage layer.</li>
</ul>
]]></content>
    <link rel="alternate" href="https://www.trackawesomelist.com/2021/12/06/"/>
    <summary>2 awesome projects updated on Dec 06, 2021</summary>
  </entry>
  <entry>
    <id>https://www.trackawesomelist.com/2021/11/29/</id>
    <title>Awesome Spark Updates on Nov 29, 2021</title>
    <updated>2021-11-29T20:17:17.000Z</updated>
    <published>2021-11-29T20:17:17.000Z</published>
    <content type="html"><![CDATA[<h3><p>Packages / Notebooks and IDEs</p>
</h3>
<ul>
<li><a href="https://polynote.org/" rel="noopener noreferrer">Polynote</a> <img src="https://img.shields.io/github/last-commit/polynote/polynote.svg" /> - An IDE-inspired polyglot notebook. It supports mixing multiple languages in one notebook and sharing data between them seamlessly. It encourages reproducible notebooks with its immutable data model. Originating from <a href="https://medium.com/netflix-techblog/open-sourcing-polynote-an-ide-inspired-polyglot-notebook-7f929d3f447" rel="noopener noreferrer">Netflix</a>.</li>
</ul>
<h3><p>Packages / Machine Learning Extension</p>
</h3>
<ul>
<li><a href="https://github.com/Azure/mmlspark" rel="noopener noreferrer">Microsoft ML for Apache Spark (⭐5.1k)</a> <img src="https://img.shields.io/github/last-commit/Azure/mmlspark.svg" /> - A distributed ML library with support for LightGBM, Vowpal Wabbit, OpenCV, Deep Learning, Cognitive Services, and Model Deployment.</li>
</ul>
<h3><p>Packages / Natural Language Processing</p>
</h3>
<ul>
<li><a href="https://github.com/JohnSnowLabs/spark-nlp" rel="noopener noreferrer">spark-nlp (⭐3.9k)</a> <img src="https://img.shields.io/github/last-commit/JohnSnowLabs/spark-nlp.svg" /> - Natural language processing library built on top of Apache Spark ML.</li>
</ul>
<h3><p>Resources / Papers</p>
</h3>
<ul>
<li><a href="https://cs.stanford.edu/~matei/papers/2018/sigmod_structured_streaming.pdf" rel="noopener noreferrer">Structured Streaming: A Declarative API for Real-Time Applications in Apache Spark</a> - Structured Streaming is a high-level, declarative streaming API based on automatically incrementalizing a static relational query.</li>
</ul>
]]></content>
    <link rel="alternate" href="https://www.trackawesomelist.com/2021/11/29/"/>
    <summary>4 awesome projects updated on Nov 29, 2021</summary>
  </entry>
  <entry>
    <id>https://www.trackawesomelist.com/2021/08/18/</id>
    <title>Awesome Spark Updates on Aug 18, 2021</title>
    <updated>2021-08-18T19:34:17.000Z</updated>
    <published>2021-08-18T19:34:17.000Z</published>
    <content type="html"><![CDATA[<h3><p>Packages / General Purpose Libraries</p>
</h3>
<ul>
<li><a href="https://github.com/apache/datafu/tree/master/datafu-spark" rel="noopener noreferrer">Apache DataFu (⭐115)</a> <img src="https://img.shields.io/github/last-commit/apache/datafu.svg" /> - A library of general purpose functions and UDFs.</li>
</ul>
]]></content>
    <link rel="alternate" href="https://www.trackawesomelist.com/2021/08/18/"/>
    <summary>1 awesome project updated on Aug 18, 2021</summary>
  </entry>
  <entry>
    <id>https://www.trackawesomelist.com/2021/08/15/</id>
    <title>Awesome Spark Updates on Aug 15, 2021</title>
    <updated>2021-08-15T18:23:58.000Z</updated>
    <published>2021-08-15T18:23:58.000Z</published>
    <content type="html"><![CDATA[<h3><p>Resources / Books</p>
</h3>
<ul>
<li><a href="https://www.oreilly.com/library/view/learning-spark-2nd/9781492050032/" rel="noopener noreferrer">Learning Spark, 2nd Edition</a> - Introduction to the Spark API, covering Spark 3.0. Good source of knowledge about basic concepts.</li>
</ul>
]]></content>
    <link rel="alternate" href="https://www.trackawesomelist.com/2021/08/15/"/>
    <summary>1 awesome project updated on Aug 15, 2021</summary>
  </entry>
  <entry>
    <id>https://www.trackawesomelist.com/2021/03/17/</id>
    <title>Awesome Spark Updates on Mar 17, 2021</title>
    <updated>2021-03-17T11:09:06.000Z</updated>
    <published>2021-03-17T11:09:06.000Z</published>
    <content type="html"><![CDATA[<h3><p>Packages / Monitoring</p>
</h3>
<ul>
<li><a href="https://github.com/datamechanics/delight" rel="noopener noreferrer">Data Mechanics Delight (⭐342)</a> <img src="https://img.shields.io/github/last-commit/datamechanics/delight.svg" /> - Cross-platform monitoring tool (Spark UI / Spark History Server replacement).</li>
</ul>
]]></content>
    <link rel="alternate" href="https://www.trackawesomelist.com/2021/03/17/"/>
    <summary>1 awesome project updated on Mar 17, 2021</summary>
  </entry>
  <entry>
    <id>https://www.trackawesomelist.com/2021/02/19/</id>
    <title>Awesome Spark Updates on Feb 19, 2021</title>
    <updated>2021-02-19T01:54:19.000Z</updated>
    <published>2021-02-19T01:54:19.000Z</published>
    <content type="html"><![CDATA[<h3><p>Packages / General Purpose Libraries</p>
</h3>
<ul>
<li><a href="https://github.com/yaooqinn/itachi" rel="noopener noreferrer">itachi (⭐56)</a> <img src="https://img.shields.io/github/last-commit/yaooqinn/itachi.svg" /> - A library that brings useful functions from modern database management systems to Apache Spark.</li>
</ul>
]]></content>
    <link rel="alternate" href="https://www.trackawesomelist.com/2021/02/19/"/>
    <summary>1 awesome project updated on Feb 19, 2021</summary>
  </entry>
  <entry>
    <id>https://www.trackawesomelist.com/2020/11/02/</id>
    <title>Awesome Spark Updates on Nov 02, 2020</title>
    <updated>2020-11-02T15:53:44.000Z</updated>
    <published>2020-11-02T15:53:44.000Z</published>
    <content type="html"><![CDATA[<h3><p>Packages / Storage</p>
</h3>
<ul>
<li><a href="https://github.com/delta-io/delta" rel="noopener noreferrer">Delta Lake (⭐7.5k)</a> <img src="https://img.shields.io/github/last-commit/delta-io/delta.svg" /> - Storage layer with ACID transactions.</li>
</ul>
]]></content>
    <link rel="alternate" href="https://www.trackawesomelist.com/2020/11/02/"/>
    <summary>1 awesome project updated on Nov 02, 2020</summary>
  </entry>
  <entry>
    <id>https://www.trackawesomelist.com/2020/10/09/</id>
    <title>Awesome Spark Updates on Oct 09, 2020</title>
    <updated>2020-10-09T07:48:19.000Z</updated>
    <published>2020-10-09T07:48:19.000Z</published>
    <content type="html"><![CDATA[<h3><p>Packages / Interfaces</p>
</h3>
<ul>
<li><a href="https://github.com/databricks/koalas" rel="noopener noreferrer">Koalas (⭐3.3k)</a> <img src="https://img.shields.io/github/last-commit/databricks/koalas.svg" /> - Pandas DataFrame API on top of Apache Spark.</li>
</ul>
]]></content>
    <link rel="alternate" href="https://www.trackawesomelist.com/2020/10/09/"/>
    <summary>1 awesome project updated on Oct 09, 2020</summary>
  </entry>
  <entry>
    <id>https://www.trackawesomelist.com/2020/10/01/</id>
    <title>Awesome Spark Updates on Oct 01, 2020</title>
    <updated>2020-10-01T20:01:28.000Z</updated>
    <published>2020-10-01T20:01:28.000Z</published>
    <content type="html"><![CDATA[<h3><p>Packages / Language Bindings</p>
</h3>
<ul>
<li><a href="https://github.com/dotnet/spark" rel="noopener noreferrer">.NET for Apache Spark (⭐2k)</a> <img src="https://img.shields.io/github/last-commit/dotnet/spark.svg" /> - .NET bindings.</li>
</ul>
]]></content>
    <link rel="alternate" href="https://www.trackawesomelist.com/2020/10/01/"/>
    <summary>1 awesome project updated on Oct 01, 2020</summary>
  </entry>
  <entry>
    <id>https://www.trackawesomelist.com/2020/09/23/</id>
    <title>Awesome Spark Updates on Sep 23, 2020</title>
    <updated>2020-09-23T13:48:51.000Z</updated>
    <published>2020-09-23T13:48:51.000Z</published>
    <content type="html"><![CDATA[<h3><p>Packages / Utilities</p>
</h3>
<ul>
<li><a href="https://github.com/ironmussa/Optimus/" rel="noopener noreferrer">Optimus (⭐1.5k)</a> <img src="https://img.shields.io/github/last-commit/ironmussa/Optimus.svg" /> - Data Cleansing and Exploration utilities with the goal of simplifying data cleaning.</li>
</ul>
]]></content>
    <link rel="alternate" href="https://www.trackawesomelist.com/2020/09/23/"/>
    <summary>1 awesome project updated on Sep 23, 2020</summary>
  </entry>
  <entry>
    <id>https://www.trackawesomelist.com/2020/09/22/</id>
    <title>Awesome Spark Updates on Sep 22, 2020</title>
    <updated>2020-09-22T19:03:30.000Z</updated>
    <published>2020-09-22T19:03:30.000Z</published>
    <content type="html"><![CDATA[<h3><p>Resources / Papers</p>
</h3>
<ul>
<li><a href="https://arxiv.org/pdf/2009.08044.pdf" rel="noopener noreferrer">Large-Scale Intelligent Microservices</a> - Microsoft paper that presents an Apache Spark-based micro-service orchestration framework that extends database operations to include web service primitives.</li>
</ul>
]]></content>
    <link rel="alternate" href="https://www.trackawesomelist.com/2020/09/22/"/>
    <summary>1 awesome project updated on Sep 22, 2020</summary>
  </entry>
  <entry>
    <id>https://www.trackawesomelist.com/2020/07/16/</id>
    <title>Awesome Spark Updates on Jul 16, 2020</title>
    <updated>2020-07-16T20:26:28.000Z</updated>
    <published>2020-07-16T20:26:28.000Z</published>
    <content type="html"><![CDATA[<h3><p>Packages / Middleware</p>
</h3>
<ul>
<li><a href="https://github.com/apache/incubator-livy" rel="noopener noreferrer">Livy (⭐883)</a> <img src="https://img.shields.io/github/last-commit/apache/incubator-livy.svg" /> - REST server with extensive language support (Python, R, Scala), ability to maintain interactive sessions and object sharing.</li>
</ul>
]]></content>
    <link rel="alternate" href="https://www.trackawesomelist.com/2020/07/16/"/>
    <summary>1 awesome project updated on Jul 16, 2020</summary>
  </entry>
  <entry>
    <id>https://www.trackawesomelist.com/2019/12/13/</id>
    <title>Awesome Spark Updates on Dec 13, 2019</title>
    <updated>2019-12-13T15:01:24.000Z</updated>
    <published>2019-12-13T15:01:24.000Z</published>
    <content type="html"><![CDATA[<h3><p>Packages / Data quality</p>
</h3>
<ul>
<li><a href="https://github.com/awslabs/deequ" rel="noopener noreferrer">deequ (⭐3.3k)</a> <img src="https://img.shields.io/github/last-commit/awslabs/deequ.svg" /> - Deequ is a library built on top of Apache Spark for defining "unit tests for data", which measure data quality in large datasets.</li>
</ul>
]]></content>
    <link rel="alternate" href="https://www.trackawesomelist.com/2019/12/13/"/>
    <summary>1 awesome project updated on Dec 13, 2019</summary>
  </entry>
  <entry>
    <id>https://www.trackawesomelist.com/2019/11/28/</id>
    <title>Awesome Spark Updates on Nov 28, 2019</title>
    <updated>2019-11-28T11:22:02.000Z</updated>
    <published>2019-11-28T11:22:02.000Z</published>
    <content type="html"><![CDATA[<h3><p>Packages / Notebooks and IDEs</p>
</h3>
<ul>
<li><a href="https://almond.sh/" rel="noopener noreferrer">almond</a> <img src="https://img.shields.io/github/last-commit/almond-sh/almond.svg" /> - A Scala kernel for <a href="https://jupyter.org/" rel="noopener noreferrer">Jupyter</a>.</li>
</ul>
]]></content>
    <link rel="alternate" href="https://www.trackawesomelist.com/2019/11/28/"/>
    <summary>1 awesome project updated on Nov 28, 2019</summary>
  </entry>
  <entry>
    <id>https://www.trackawesomelist.com/2019/01/26/</id>
    <title>Awesome Spark Updates on Jan 26, 2019</title>
    <updated>2019-01-26T04:18:31.000Z</updated>
    <published>2019-01-26T04:18:31.000Z</published>
    <content type="html"><![CDATA[<h3><p>Packages / Web Archives</p>
</h3>
<ul>
<li><a href="https://github.com/archivesunleashed/aut" rel="noopener noreferrer">Archives Unleashed Toolkit (⭐137)</a> <img src="https://img.shields.io/github/last-commit/archivesunleashed/aut.svg" /> - Open-source toolkit for analyzing web archives.</li>
</ul>
]]></content>
    <link rel="alternate" href="https://www.trackawesomelist.com/2019/01/26/"/>
    <summary>1 awesome project updated on Jan 26, 2019</summary>
  </entry>
  <entry>
    <id>https://www.trackawesomelist.com/2018/08/14/</id>
    <title>Awesome Spark Updates on Aug 14, 2018</title>
    <updated>2018-08-14T15:51:12.000Z</updated>
    <published>2018-08-14T15:51:12.000Z</published>
    <content type="html"><![CDATA[<h3><p>Packages / Language Bindings</p>
</h3>
<ul>
<li><a href="https://github.com/rstudio/sparklyr" rel="noopener noreferrer">sparklyr (⭐952)</a> <img src="https://img.shields.io/github/last-commit/rstudio/sparklyr.svg" /> - An alternative R backend, using <a href="https://github.com/hadley/dplyr" rel="noopener noreferrer"><code>dplyr</code></a>.</li>
</ul>

<ul>
<li><a href="https://github.com/tweag/sparkle" rel="noopener noreferrer">sparkle (⭐447)</a> <img src="https://img.shields.io/github/last-commit/tweag/sparkle.svg" /> - Haskell on Apache Spark.</li>
</ul>
<h3><p>Packages / Notebooks and IDEs</p>
</h3>
<ul>
<li><a href="https://zeppelin.incubator.apache.org/" rel="noopener noreferrer">Apache Zeppelin</a> <img src="https://img.shields.io/github/last-commit/apache/zeppelin.svg" /> - Web-based notebook that enables interactive data analytics with pluggable backends, integrated plotting, and extensive Spark support out-of-the-box.</li>
</ul>

<ul>
<li><a href="https://github.com/jupyter-incubator/sparkmagic" rel="noopener noreferrer">sparkmagic (⭐1.3k)</a> <img src="https://img.shields.io/github/last-commit/jupyter-incubator/sparkmagic.svg" /> - <a href="https://jupyter.org/" rel="noopener noreferrer">Jupyter</a> magics and kernels for working interactively with remote Spark clusters through <a href="https://github.com/cloudera/livy" rel="noopener noreferrer">Livy (⭐1k)</a>, in Jupyter notebooks.</li>
</ul>
<h3><p>Packages / SQL Data Sources</p>
</h3>
<ul>
<li><a href="https://github.com/databricks/spark-xml" rel="noopener noreferrer">Spark XML (⭐504)</a> <img src="https://img.shields.io/github/last-commit/databricks/spark-xml.svg" /> - XML parser and writer.</li>
</ul>

<ul>
<li><a href="https://github.com/datastax/spark-cassandra-connector" rel="noopener noreferrer">Spark Cassandra Connector (⭐1.9k)</a> <img src="https://img.shields.io/github/last-commit/datastax/spark-cassandra-connector.svg" /> - Cassandra support, including a data source, an API, and support for arbitrary queries.</li>
</ul>

<ul>
<li><a href="https://github.com/mongodb/mongo-spark" rel="noopener noreferrer">Mongo-Spark (⭐710)</a> <img src="https://img.shields.io/github/last-commit/mongodb/mongo-spark.svg" /> - Official MongoDB connector.</li>
</ul>
<h3><p>Packages / Bioinformatics</p>
</h3>
<ul>
<li><a href="https://github.com/bigdatagenomics/adam" rel="noopener noreferrer">ADAM (⭐1k)</a> <img src="https://img.shields.io/github/last-commit/bigdatagenomics/adam.svg" /> - Set of tools designed to analyse genomics data.</li>
</ul>

<ul>
<li><a href="https://github.com/hail-is/hail" rel="noopener noreferrer">Hail (⭐976)</a> <img src="https://img.shields.io/github/last-commit/hail-is/hail.svg" /> - Genetic analysis framework.</li>
</ul>
<h3><p>Packages / Graph Processing</p>
</h3>
<ul>
<li><a href="https://github.com/graphframes/graphframes" rel="noopener noreferrer">GraphFrames (⭐997)</a> <img src="https://img.shields.io/github/last-commit/graphframes/graphframes.svg" /> - Data frame based graph API.</li>
</ul>

<ul>
<li><a href="https://github.com/neo4j-contrib/neo4j-spark-connector" rel="noopener noreferrer">neo4j-spark-connector (⭐313)</a> <img src="https://img.shields.io/github/last-commit/neo4j-contrib/neo4j-spark-connector.svg" /> - Bolt protocol based, Neo4j Connector with RDD, DataFrame and GraphX / GraphFrames support.</li>
</ul>
<h3><p>Packages / Machine Learning Extension</p>
</h3>
<ul>
<li><a href="https://systemml.apache.org/" rel="noopener noreferrer">Apache SystemML</a> <img src="https://img.shields.io/github/last-commit/apache/systemml.svg" /> - Declarative machine learning framework on top of Spark.</li>
</ul>

<ul>
<li><a href="https://mahout.apache.org/users/sparkbindings/home.html" rel="noopener noreferrer">Mahout Spark Bindings</a> [status unknown] - Linear algebra DSL and optimizer with R-like syntax.</li>
</ul>

<ul>
<li><a href="https://github.com/jpmml/jpmml-spark" rel="noopener noreferrer">JPMML-Spark (⭐94)</a> <img src="https://img.shields.io/github/last-commit/jpmml/jpmml-spark.svg" /> - PMML transformer library for Spark ML.</li>
</ul>

<ul>
<li><a href="https://mitdbg.github.io/modeldb" rel="noopener noreferrer">ModelDB</a> <img src="https://img.shields.io/github/last-commit/mitdbg/modeldb.svg" /> - A system to manage machine learning models for <code>spark.ml</code> and <a href="https://github.com/scikit-learn/scikit-learn" rel="noopener noreferrer"><code>scikit-learn</code></a> <img src="https://img.shields.io/github/last-commit/scikit-learn/scikit-learn.svg" />.</li>
</ul>

<ul>
<li><a href="https://github.com/h2oai/sparkling-water" rel="noopener noreferrer">Sparkling Water (⭐965)</a> <img src="https://img.shields.io/github/last-commit/h2oai/sparkling-water.svg" /> - <a href="http://www.h2o.ai/" rel="noopener noreferrer">H2O</a> interoperability layer.</li>
</ul>

<ul>
<li><a href="https://github.com/intel-analytics/BigDL" rel="noopener noreferrer">BigDL (⭐6.6k)</a> <img src="https://img.shields.io/github/last-commit/intel-analytics/BigDL.svg" /> - Distributed Deep Learning library.</li>
</ul>

<ul>
<li><a href="https://github.com/combust/mleap" rel="noopener noreferrer">MLeap (⭐1.5k)</a> <img src="https://img.shields.io/github/last-commit/combust/mleap.svg" /> - Execution engine and serialization format which supports deployment of <code>o.a.s.ml</code> models without dependency on <code>SparkSession</code>.</li>
</ul>
<h3><p>Packages / Middleware</p>
</h3>
<ul>
<li><a href="https://github.com/spark-jobserver/spark-jobserver" rel="noopener noreferrer">spark-jobserver (⭐2.8k)</a> <img src="https://img.shields.io/github/last-commit/spark-jobserver/spark-jobserver.svg" /> - Simple Spark-as-a-service that supports object sharing through so-called named objects. JVM only.</li>
</ul>

<ul>
<li><a href="https://github.com/apache/incubator-toree" rel="noopener noreferrer">Apache Toree (⭐739)</a> <img src="https://img.shields.io/github/last-commit/apache/incubator-toree.svg" /> - IPython protocol based middleware for interactive applications.</li>
</ul>
<h3><p>Packages / Utilities</p>
</h3>
<ul>
<li><a href="https://github.com/Tubular/sparkly" rel="noopener noreferrer">sparkly (⭐60)</a> <img src="https://img.shields.io/github/last-commit/Tubular/sparkly.svg" /> - Helpers &amp; syntactic sugar for PySpark.</li>
</ul>

<ul>
<li><a href="https://github.com/nchammas/flintrock" rel="noopener noreferrer">Flintrock (⭐638)</a> <img src="https://img.shields.io/github/last-commit/nchammas/flintrock.svg" /> - A command-line tool for launching Spark clusters on EC2.</li>
</ul>
<h3><p>Packages / Streaming</p>
</h3>
<ul>
<li><a href="https://bahir.apache.org/" rel="noopener noreferrer">Apache Bahir</a> <img src="https://img.shields.io/github/last-commit/apache/bahir.svg" /> - Collection of the streaming connectors excluded from Spark 2.0 (Akka, MQTT, Twitter, ZeroMQ).</li>
</ul>
<h3><p>Packages / Interfaces</p>
</h3>
<ul>
<li><a href="https://beam.apache.org/" rel="noopener noreferrer">Apache Beam</a> <img src="https://img.shields.io/github/last-commit/apache/beam.svg" /> - Unified data processing engine supporting both batch and streaming applications. Apache Spark is one of the supported execution environments.</li>
</ul>
<h3><p>Packages / Testing</p>
</h3>
<ul>
<li><a href="https://github.com/holdenk/spark-testing-base" rel="noopener noreferrer">spark-testing-base (⭐1.5k)</a> <img src="https://img.shields.io/github/last-commit/holdenk/spark-testing-base.svg" /> - Collection of base test classes.</li>
</ul>
<h3><p>Packages / Workflow Management</p>
</h3>
<ul>
<li><a href="https://github.com/broadinstitute/cromwell#spark-backend" rel="noopener noreferrer">Cromwell (⭐993)</a> <img src="https://img.shields.io/github/last-commit/broadinstitute/cromwell.svg" /> - Workflow management system with <a href="https://github.com/broadinstitute/cromwell#spark-backend" rel="noopener noreferrer">Spark backend (⭐993)</a>.</li>
</ul>
]]></content>
    <link rel="alternate" href="https://www.trackawesomelist.com/2018/08/14/"/>
    <summary>26 awesome projects updated on Aug 14, 2018</summary>
  </entry>
  <entry>
    <id>https://www.trackawesomelist.com/2017/04/12/</id>
    <title>Awesome Spark Updates on Apr 12, 2017</title>
    <updated>2017-04-12T17:56:55.000Z</updated>
    <published>2017-04-12T17:56:55.000Z</published>
    <content type="html"><![CDATA[<h3><p>Resources / Papers</p>
</h3>
<ul>
<li><a href="https://amplab.cs.berkeley.edu/wp-content/uploads/2015/03/SparkSQLSigmod2015.pdf" rel="noopener noreferrer">Spark SQL: Relational Data Processing in Spark</a> - Paper introducing relational underpinnings, code generation, and the Catalyst optimizer.</li>
</ul>
]]></content>
    <link rel="alternate" href="https://www.trackawesomelist.com/2017/04/12/"/>
    <summary>1 awesome project updated on Apr 12, 2017</summary>
  </entry>
  <entry>
    <id>https://www.trackawesomelist.com/2017/04/05/</id>
    <title>Awesome Spark Updates on Apr 05, 2017</title>
    <updated>2017-04-05T10:11:51.000Z</updated>
    <published>2017-04-05T10:11:51.000Z</published>
    <content type="html"><![CDATA[<h3><p>Resources / Papers</p>
</h3>
<ul>
<li><a href="https://people.csail.mit.edu/matei/papers/2012/nsdi_spark.pdf" rel="noopener noreferrer">Resilient Distributed Datasets: A Fault-Tolerant Abstraction for In-Memory Cluster Computing</a> - Paper introducing a core distributed memory abstraction.</li>
</ul>
]]></content>
    <link rel="alternate" href="https://www.trackawesomelist.com/2017/04/05/"/>
    <summary>1 awesome project updated on Apr 05, 2017</summary>
  </entry>
  <entry>
    <id>https://www.trackawesomelist.com/2017/03/02/</id>
    <title>Awesome Spark Updates on Mar 02, 2017</title>
    <updated>2017-03-02T01:01:20.000Z</updated>
    <published>2017-03-02T01:01:20.000Z</published>
    <content type="html"><![CDATA[<h3><p>Resources / Books</p>
</h3>
<ul>
<li><a href="http://shop.oreilly.com/product/0636920035091.do" rel="noopener noreferrer">Advanced Analytics with Spark</a> - Useful collection of Spark processing patterns. Accompanying GitHub repository: <a href="https://github.com/sryza/aas" rel="noopener noreferrer">sryza/aas (⭐1.5k)</a>.</li>
</ul>

<ul>
<li><a href="https://jaceklaskowski.gitbooks.io/mastering-apache-spark/" rel="noopener noreferrer">Mastering Apache Spark</a> - Interesting compilation of notes by <a href="https://github.com/jaceklaskowski" rel="noopener noreferrer">Jacek Laskowski</a>. Focused on different aspects of Spark internals.</li>
</ul>

<ul>
<li><a href="https://www.manning.com/books/spark-in-action" rel="noopener noreferrer">Spark in Action</a> - New book in Manning's "in action" family with 400+ pages. Starts gently, step by step, and covers a large number of topics. Free excerpt on how to <a href="http://freecontent.manning.com/how-to-start-developing-spark-applications-in-eclipse/" rel="noopener noreferrer">set up Eclipse for Spark application development</a> and how to bootstrap a new application using the provided Maven Archetype. You can find the accompanying GitHub repo <a href="https://github.com/spark-in-action/first-edition" rel="noopener noreferrer">here (⭐273)</a>.</li>
</ul>
<h3><p>Resources / MOOCS</p>
</h3>
<ul>
<li><a href="https://www.edx.org/xseries/data-science-engineering-apache-spark" rel="noopener noreferrer">Data Science and Engineering with Apache Spark (edX XSeries)</a> - Series of five courses (<a href="https://www.edx.org/course/introduction-apache-spark-uc-berkeleyx-cs105x" rel="noopener noreferrer">Introduction to Apache Spark</a>, <a href="https://www.edx.org/course/distributed-machine-learning-apache-uc-berkeleyx-cs120x" rel="noopener noreferrer">Distributed Machine Learning with Apache Spark</a>, <a href="https://www.edx.org/course/big-data-analysis-apache-spark-uc-berkeleyx-cs110x" rel="noopener noreferrer">Big Data Analysis with Apache Spark</a>, <a href="https://www.edx.org/course/advanced-apache-spark-data-science-data-uc-berkeleyx-cs115x" rel="noopener noreferrer">Advanced Apache Spark for Data Science and Data Engineering</a>, <a href="https://www.edx.org/course/advanced-distributed-machine-learning-uc-berkeleyx-cs125x" rel="noopener noreferrer">Advanced Distributed Machine Learning with Apache Spark</a>) covering different aspects of software engineering and data science. Python-oriented.</li>
</ul>
<h3><p>Resources / Workshops</p>
</h3>
<ul>
<li><a href="http://ampcamp.berkeley.edu" rel="noopener noreferrer">AMP Camp</a> - Periodic training event organized by the <a href="https://amplab.cs.berkeley.edu/" rel="noopener noreferrer">UC Berkeley AMPLab</a>. A source of useful exercises and recorded workshops covering different tools from the <a href="https://amplab.cs.berkeley.edu/software/" rel="noopener noreferrer">Berkeley Data Analytics Stack</a>.</li>
</ul>
<h3><p>Resources / Projects Using Spark</p>
</h3>
<ul>
<li><a href="https://github.com/OryxProject/oryx" rel="noopener noreferrer">Oryx 2 (⭐1.8k)</a> - <a href="http://lambda-architecture.net/" rel="noopener noreferrer">Lambda architecture</a> platform built on Apache Spark and <a href="http://kafka.apache.org/" rel="noopener noreferrer">Apache Kafka</a>, specialized for real-time, large-scale machine learning.</li>
</ul>
<h3><p>Resources / Miscellaneous</p>
</h3>
<ul>
<li><a href="https://gitter.im/spark-scala/Lobby" rel="noopener noreferrer">Spark with Scala Gitter channel</a> - "<em>A place to discuss and ask questions about using Scala for Spark programming</em>" started by <a href="https://github.com/deanwampler" rel="noopener noreferrer">@deanwampler</a>.</li>
</ul>
]]></content>
    <link rel="alternate" href="https://www.trackawesomelist.com/2017/03/02/"/>
    <summary>7 awesome projects updated on Mar 02, 2017</summary>
  </entry>
  <entry>
    <id>https://www.trackawesomelist.com/2016/10/22/</id>
    <title>Awesome Spark Updates on Oct 22, 2016</title>
    <updated>2016-10-22T20:02:43.000Z</updated>
    <published>2016-10-22T20:02:43.000Z</published>
    <content type="html"><![CDATA[<h3><p>Resources / Miscellaneous</p>
</h3>
<ul>
<li><a href="http://apache-spark-user-list.1001560.n3.nabble.com/" rel="noopener noreferrer">Apache Spark User List</a> and <a href="http://apache-spark-developers-list.1001551.n3.nabble.com/" rel="noopener noreferrer">Apache Spark Developers List</a> - Mailing lists dedicated to usage questions and development topics, respectively.</li>
</ul>
]]></content>
    <link rel="alternate" href="https://www.trackawesomelist.com/2016/10/22/"/>
    <summary>1 awesome project updated on Oct 22, 2016</summary>
  </entry>
  <entry>
    <id>https://www.trackawesomelist.com/2016/06/26/</id>
    <title>Awesome Spark Updates on Jun 26, 2016</title>
    <updated>2016-06-26T21:11:24.000Z</updated>
    <published>2016-06-26T21:11:24.000Z</published>
    <content type="html"><![CDATA[<h3><p>Resources / Projects Using Spark</p>
</h3>
<ul>
<li><a href="https://github.com/Stratio/Crossdata" rel="noopener noreferrer">Crossdata (⭐169)</a> - Data integration platform with extended DataSource API and multi-user environment.</li>
</ul>
]]></content>
    <link rel="alternate" href="https://www.trackawesomelist.com/2016/06/26/"/>
    <summary>1 awesome project updated on Jun 26, 2016</summary>
  </entry>
  <entry>
    <id>https://www.trackawesomelist.com/2016/06/22/</id>
    <title>Awesome Spark Updates on Jun 22, 2016</title>
    <updated>2016-06-22T22:20:46.000Z</updated>
    <published>2016-06-22T06:27:13.000Z</published>
    <content type="html"><![CDATA[<h3><p>Resources / Projects Using Spark</p>
</h3>
<ul>
<li><a href="https://github.com/linkedin/photon-ml" rel="noopener noreferrer">Photon ML (⭐793)</a> - A machine learning library supporting classical Generalized Mixed Models and Generalized Additive Mixed Effect Models.</li>
</ul>
<h3><p>Resources / Docker Images</p>
</h3>
<ul>
<li><a href="https://github.com/sequenceiq/docker-spark" rel="noopener noreferrer">sequenceiq/docker-spark (⭐765)</a> - YARN images from <a href="http://www.sequenceiq.com/" rel="noopener noreferrer">SequenceIQ</a>.</li>
</ul>
]]></content>
    <link rel="alternate" href="https://www.trackawesomelist.com/2016/06/22/"/>
    <summary>2 awesome projects updated on Jun 22, 2016</summary>
  </entry>
  <entry>
    <id>https://www.trackawesomelist.com/2016/06/21/</id>
    <title>Awesome Spark Updates on Jun 21, 2016</title>
    <updated>2016-06-21T00:52:11.000Z</updated>
    <published>2016-06-21T00:52:11.000Z</published>
    <content type="html"><![CDATA[<h3><p>Resources / Docker Images</p>
</h3>
<ul>
<li><a href="https://github.com/jupyter/docker-stacks/tree/master/pyspark-notebook" rel="noopener noreferrer">jupyter/docker-stacks/pyspark-notebook (⭐8k)</a> - PySpark with Jupyter Notebook and Mesos client.</li>
</ul>
]]></content>
    <link rel="alternate" href="https://www.trackawesomelist.com/2016/06/21/"/>
    <summary>1 awesome project updated on Jun 21, 2016</summary>
  </entry>
  <entry>
    <id>https://www.trackawesomelist.com/2016/06/11/</id>
    <title>Awesome Spark Updates on Jun 11, 2016</title>
    <updated>2016-06-11T21:01:59.000Z</updated>
    <published>2016-06-11T21:01:59.000Z</published>
    <content type="html"><![CDATA[<h3><p>Resources / MOOCS</p>
</h3>
<ul>
<li><a href="https://www.coursera.org/learn/big-data-analysys" rel="noopener noreferrer">Big Data Analysis with Scala and Spark (Coursera)</a> - Scala-oriented introductory course. Part of the <a href="https://www.coursera.org/specializations/scala" rel="noopener noreferrer">Functional Programming in Scala Specialization</a>.</li>
</ul>
]]></content>
    <link rel="alternate" href="https://www.trackawesomelist.com/2016/06/11/"/>
    <summary>1 awesome project updated on Jun 11, 2016</summary>
  </entry>
  <entry>
    <id>https://www.trackawesomelist.com/2016/06/10/</id>
    <title>Awesome Spark Updates on Jun 10, 2016</title>
    <updated>2016-06-10T10:43:11.000Z</updated>
    <published>2016-06-10T10:43:11.000Z</published>
    <content type="html"><![CDATA[<h3><p>Packages / Machine Learning Extension</p>
</h3>
<ul>
<li><a href="http://keystone-ml.org/" rel="noopener noreferrer">KeystoneML</a> - Type-safe machine learning pipelines with RDDs.</li>
</ul>
]]></content>
    <link rel="alternate" href="https://www.trackawesomelist.com/2016/06/10/"/>
    <summary>1 awesome project updated on Jun 10, 2016</summary>
  </entry>
  <entry>
    <id>https://www.trackawesomelist.com/2016/02/02/</id>
    <title>Awesome Spark Updates on Feb 02, 2016</title>
    <updated>2016-02-02T07:10:06.000Z</updated>
    <published>2016-02-02T07:10:06.000Z</published>
    <content type="html"><![CDATA[<h3><p>Resources / Projects Using Spark</p>
</h3>
<ul>
<li><a href="https://prediction.io/" rel="noopener noreferrer">PredictionIO</a> - Machine Learning server for developers and data scientists to build and deploy predictive applications in a fraction of the time.</li>
</ul>
]]></content>
    <link rel="alternate" href="https://www.trackawesomelist.com/2016/02/02/"/>
    <summary>1 awesome project updated on Feb 02, 2016</summary>
  </entry>
</feed>