A curated list of data engineering tools for software developers
Data Lake Management
- BottledWater Change data capture from PostgreSQL into Kafka. Deprecated.
- kafkat Simplified command-line administration for Kafka brokers
- kafkacat Generic command-line non-JVM Apache Kafka producer and consumer
- pg-kafka A PostgreSQL extension to produce messages to Apache Kafka
- librdkafka The Apache Kafka C/C++ library
- kafka-docker Kafka in Docker
- kafka-manager A tool for managing Apache Kafka
- kafka-node Node.js client for Apache Kafka 0.8
- Secor Pinterest's Kafka to S3 distributed consumer
- Kafka-logger A Kafka-winston logger for Node.js, from Uber
- Snakebite A pure Python HDFS client
- smart_open Utils for streaming large files (S3, HDFS, gzip, bz2)
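To illustrate what "streaming" a large compressed file means, here is a minimal sketch that uses only Python's standard-library `gzip` module, not smart_open itself. smart_open's own API is not shown, and the helper names `stream_lines` and `count_matching_lines` are hypothetical. The file is decompressed incrementally and read one line at a time, so memory use stays roughly constant regardless of file size.

```python
import gzip
import os
import tempfile
from typing import Iterator


def stream_lines(path: str) -> Iterator[str]:
    """Yield decoded lines from a gzip file without loading it all into memory."""
    # Text mode ("rt") decompresses incrementally as each line is requested.
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        for line in handle:
            yield line.rstrip("\n")


def count_matching_lines(path: str, needle: str) -> int:
    """Count lines containing `needle`, holding only one line in memory at a time."""
    return sum(1 for line in stream_lines(path) if needle in line)


if __name__ == "__main__":
    # Write a small sample file, then stream it back.
    fd, path = tempfile.mkstemp(suffix=".gz")
    os.close(fd)
    with gzip.open(path, "wt", encoding="utf-8") as out:
        out.write("error: disk full\ninfo: ok\nerror: timeout\n")
    print(count_matching_lines(path, "error"))  # 2
    os.remove(path)
```

The same generator pattern applies to other sources (S3 objects, HDFS files, bz2 archives). Only the function that opens the stream changes.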
Charts and Dashboards
- D3Plus D3's simpler, easier-to-use cousin. Mostly predefined templates that you can just plug data into.
Databases
- Tarantool Tarantool is an in-memory database and application server.
- GreenPlum The Greenplum Database (GPDB) is an advanced, fully featured, open source data warehouse. It provides powerful and rapid analytics on petabyte-scale data volumes.
- cayley An open-source graph database from Google.
- Snappydata SnappyData: OLTP + OLAP Database built on Apache Spark
- TimescaleDB Built as an extension on top of PostgreSQL, TimescaleDB is a time-series SQL database providing fast analytics and scalability, with automated data management on a proven storage engine.
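TimescaleDB and the time-series stores that follow are built around aggregating measurements into fixed-width time intervals. The sketch below shows that idea in plain Python. It is not the implementation of any listed database. The names `time_bucket_start` and `bucket_average` are hypothetical, though TimescaleDB exposes a comparable SQL function, `time_bucket`. The sketch assumes timezone-aware UTC timestamps and buckets aligned to the Unix epoch.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone
from typing import Dict, Iterable, Tuple

# Buckets are aligned to the Unix epoch (an assumption of this sketch).
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def time_bucket_start(ts: datetime, width: timedelta) -> datetime:
    """Round a timezone-aware timestamp down to the start of its fixed-width bucket."""
    whole_buckets = (ts - EPOCH) // width  # timedelta // timedelta -> int
    return EPOCH + whole_buckets * width


def bucket_average(
    points: Iterable[Tuple[datetime, float]], width: timedelta
) -> Dict[datetime, float]:
    """Average the values that fall into each bucket, keyed by bucket start time."""
    sums: Dict[datetime, float] = defaultdict(float)
    counts: Dict[datetime, int] = defaultdict(int)
    for ts, value in points:
        start = time_bucket_start(ts, width)
        sums[start] += value
        counts[start] += 1
    # Emit buckets in chronological order.
    return {start: sums[start] / counts[start] for start in sorted(sums)}
```

Consider a 5-minute width. Readings at 04:01 and 04:04 fall into the 04:00 bucket and are averaged together, while a reading at 04:06 starts the 04:05 bucket. Time-series databases typically perform this kind of downsampling close to the storage layer rather than in application code.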
Charts and Dashboards
- ELK The Elasticsearch, Logstash, and Kibana stack
Time Series Databases
- InfluxDB Scalable datastore for metrics, events, and real-time analytics.
- OpenTSDB A scalable, distributed Time Series Database.
- QuestDB A relational column-oriented database designed for real-time analytics on time series and event data.
- kairosdb Fast scalable time series database.
- Heroic A scalable time series database based on Cassandra and Elasticsearch, by Spotify
- Druid Column oriented distributed data store ideal for powering interactive applications
- Riak-TS Riak TS is the only enterprise-grade NoSQL time series database optimized specifically for IoT and time series data
- Akumuli Akumuli is a numeric time-series database. It can be used to capture, store and process time-series data in real time. The word "akumuli" can be translated from Esperanto as "accumulate".
- Rhombus A time-series object store for Cassandra that handles all the complexity of building wide row indexes.
- Dalmatiner DB Fast distributed metrics database
- Blueflood A distributed system designed to ingest and process time series data
- Timely Timely is a time series database application that provides secure access to time series data based on Accumulo and Grafana.
Relational Databases
- RQLite Replicated SQLite using the Raft consensus protocol
- MySQL The world's most popular open source database.
- MariaDB An enhanced, drop-in replacement for MySQL.
- PostgreSQL The world's most advanced open source database.
- Amazon RDS Amazon RDS makes it easy to set up, operate, and scale a relational database in the cloud.
- Crate.IO Scalable SQL database with the NoSQL goodies.
Key-Value Stores
- Redis An open source, BSD licensed, advanced key-value cache and store.
- Riak A distributed database designed to deliver maximum data availability by distributing data across multiple servers.
- AWS DynamoDB A fast and flexible NoSQL database service for all applications that need consistent, single-digit millisecond latency at any scale.
- HyperDex HyperDex is a scalable, searchable key-value store. Deprecated.
- SSDB A high performance NoSQL database supporting many data structures, an alternative to Redis
- Kyoto Tycoon Kyoto Tycoon is a lightweight network server on top of the Kyoto Cabinet key-value database, built for high performance and concurrency
- IonDB A key-value store for microcontroller and IoT applications
Column-Oriented Databases
- Cassandra The right choice when you need scalability and high availability without compromising performance.
- Cassandra Calculator This simple form allows you to try out different values for your Apache Cassandra cluster and see what the impact is for your application.
- CCM A script to easily create and destroy an Apache Cassandra cluster on localhost
- ScyllaDB NoSQL data store using the Seastar framework, compatible with Apache Cassandra https://www.scylladb.com/
- HBase The Hadoop database, a distributed, scalable, big data store.
- AWS Redshift A fast, fully managed, petabyte-scale data warehouse that makes it simple and cost-effective to analyze all your data using your existing business intelligence tools.
- FiloDB Distributed. Columnar. Versioned. Streaming. SQL.
- Vertica Distributed, MPP columnar database with extensive analytics SQL.
- ClickHouse Distributed columnar DBMS for OLAP. SQL.
Document Databases
- MongoDB An open-source, document database designed for ease of development and scaling.
- Elasticsearch Search & Analyze Data in Real Time.
- Couchbase The highest performing NoSQL distributed database.
- RethinkDB The open-source database for the realtime web.
- RavenDB Fully Transactional NoSQL Document Database.
Graph Databases
- Neo4j The world’s leading graph database.
- OrientDB A second-generation distributed graph database with the flexibility of documents in one product, under an open-source, commercially friendly license.
- ArangoDB A distributed free and open-source database with a flexible data model for documents, graphs, and key-values.
- Titan A scalable graph database optimized for storing and querying graphs containing hundreds of billions of vertices and edges distributed across a multi-machine cluster.
- FlockDB A distributed, fault-tolerant graph database by Twitter. Deprecated.
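The Cassandra Calculator entry above lets you try different cluster settings. The arithmetic behind such a calculator can be sketched as follows, assuming Cassandra's standard definitions:

* A QUORUM is a strict majority of the replication factor (RF // 2 + 1).
* A read is guaranteed to overlap the latest acknowledged write when read replicas plus write replicas exceed RF.

This is not the calculator's source code. `ClusterEstimate` and `estimate` are hypothetical names, and "tolerated failures" here counts downed replicas of a single key's replica set, not arbitrary nodes cluster-wide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClusterEstimate:
    quorum: int                 # replicas needed for QUORUM
    consistent_reads: bool      # True when every read set must overlap every write set
    tolerated_failures: int     # replicas of one key that may be down for both reads and writes
    data_share_per_node: float  # fraction of all data stored on each node


def quorum_size(replication_factor: int) -> int:
    """Replicas required for QUORUM: a strict majority of the replication factor."""
    return replication_factor // 2 + 1


def estimate(nodes: int, replication_factor: int,
             read_replicas: int, write_replicas: int) -> ClusterEstimate:
    """Estimate consistency and fault tolerance for hypothetical cluster settings."""
    if not 1 <= replication_factor <= nodes:
        raise ValueError("replication factor must be between 1 and the node count")
    for needed in (read_replicas, write_replicas):
        if not 1 <= needed <= replication_factor:
            raise ValueError("read/write replica counts must be between 1 and RF")
    return ClusterEstimate(
        quorum=quorum_size(replication_factor),
        # Any read set and write set intersect when R + W > RF.
        consistent_reads=read_replicas + write_replicas > replication_factor,
        # The stricter of the two levels decides how many replicas may be lost.
        tolerated_failures=replication_factor - max(read_replicas, write_replicas),
        # Assumes data is spread evenly across nodes (e.g. with virtual nodes).
        data_share_per_node=replication_factor / nodes,
    )
```

Take `estimate(nodes=6, replication_factor=3, read_replicas=2, write_replicas=2)`, which corresponds to QUORUM reads and writes with RF=3. It reports consistent reads, one tolerated replica failure, and each node holding half of the data. Dropping both levels to one replica keeps two failures tolerable but gives up the consistency guarantee.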
Last Checked At: 2021-10-25T04:08:41.296Z