Awesome List Updates on May 19, 2026
18 awesome lists updated today.
1. Awesome Polars
Tutorials & workshops / Miscellaneous
- A tidyverse user's guide to Polars - A practical cookbook for R/tidyverse users learning the Polars DataFrame library for Python. Each section pairs Polars syntax with its dplyr/tidyr/lubridate equivalent so the concepts map directly onto what you already know.
2. Awesome Math
Foundations of Mathematics / Logic
- 📝 forall x: Calgary (An Introduction to Formal Logic) - P.D. Magnus and Tim Button, remixed by Aaron Thomas-Bolduc and Richard Zach (Open Logic Project)
3. Awesome Cl
Java
- FOIL (⭐0) - Rich Hickey's Foreign Object Interface for Lisp to access the JVM and the CLI/CLR.
4. Awesome Cpp
Database
- SlothDB (⭐527) - an embedded SQL database that runs everywhere: on your laptop, on a server, and in the browser. [MIT] website
5. Awesome Rust
Applications / Embedded
- infinition/waveshare-watch-rs (⭐308) - 100% Rust no_std smartwatch firmware for Waveshare ESP32-S3-Touch-AMOLED-2.06. Features QSPI 80 MHz DMA display, Embassy async runtime, event-driven power management with Always-On Display.
Libraries / Peripherals
- esp-rs/esp-hal (⭐1.9k) [esp-hal] - Bare-metal no_std hardware abstraction layer for Espressif ESP32 devices (ESP32, ESP32-C2/C3/C5/C6/C61, ESP32-H2, ESP32-P4, ESP32-S2/S3). Provides safe Rust APIs for GPIO, I2C, SPI, UART, timers, DMA, and more.
6. Awesome Magento2
Open Source Extensions / CMS
- magento-2-seeder (⭐35) - Laravel-style database seeder for Magento 2 / Mage-OS. Generate realistic products (all types), categories, customers, orders (all states), CMS pages, and reviews via bin/magento db:seed.
7. Awesome Mqtt
Monitoring / Firmwares for ESP based Devices
- ccusage-mqtt (⭐0) - Publishes Claude Code (Anthropic's AI coding agent) usage telemetry to MQTT with Home Assistant auto-discovery. 15 sensors, mood classifier.
8. Awesome Research
Datasets / Social Sciences
- Voidly Censorship Index (Lookup, API): Open dataset of global internet censorship: 19.6M live OONI samples, 1.6M historical records across 119+ countries, 5,356 citable incidents with 16,822 evidence items. CC BY 4.0, REST/MCP API, HuggingFace parquet exports.
Investigate Papers / HTML+CSS+JS
- citracer (⭐20): Trace citation chains for any concept across research papers. Given a source PDF (or arXiv/DOI), recursively walks the citation graph and produces an interactive HTML visualization. Supports forward and reverse tracing.
- BGPT MCP API: Search scientific papers with structured full-text experimental data (methods, results, conclusions, sample sizes, limitations, quality scores; 25+ fields per paper). Works via the Model Context Protocol with Claude, Cursor, and other AI tools. Free tier: 50 searches without authentication. Repository (⭐20). MCP registry: io.github.connerlambden/bgpt-mcp.
9. Awesome Developer First
Feature Flags
- ConfigCat - Powerful, privacy-first feature flag management with unlimited team size and a forever free plan.
10. Awesome Bigdata
Frameworks
- Numaflow (⭐2.5k) - Kubernetes-native stream processing platform.
Distributed Programming
- Apache Gearpump - real-time big data streaming engine based on Akka.
Distributed Filesystem
- JuiceFS (⭐14k) - distributed POSIX file system built on object storage.
Graph Data Model
- AgensGraph (⭐1.5k) - transactional graph database based on PostgreSQL.
- ArcadeDB - multi-model database with graph, document, key-value, time-series and vector support.
- Facebook TAO - TAO is the distributed data store that is widely used at Facebook to store and serve the social graph.
- Nebula Graph - distributed graph database for large-scale graphs with low-latency queries.
Columnar Databases
- ClickHouse - an open-source column-oriented database management system that allows generating analytical data reports in real time.
NewSQL Databases
- Pivotal GemFire XD - Low-latency, in-memory, distributed SQL data store. Provides SQL interface to in-memory table data, persistable in HDFS.
Time-Series Databases
- M3DB - a distributed time series database that can be used for storing realtime metrics at long retention.
Lakehouse Table Formats
- Apache Hudi - open data lakehouse platform and table format for high-throughput incremental data pipelines.
- Apache Iceberg - open table format for huge analytic datasets with schema evolution, hidden partitioning, and time travel.
- Apache Paimon - lake format for building real-time lakehouse architectures with Flink and Spark.
- Apache XTable - incubating Apache project for interoperability across lakehouse table formats.
- Delta Lake - open-source storage framework for building lakehouse architectures on data lakes.
SQL-like processing
- Apache Doris - real-time analytical database for high-concurrency SQL analytics, search, and warehousing.
- DuckDB - in-process analytical SQL database for local analytics over files, data lakes, and data frames.
- rawquery - managed lakehouse query service using DuckDB over Apache Iceberg tables on object storage.
- StarRocks - high-performance MPP SQL engine for real-time analytics and lakehouse queries.
- Trino - distributed SQL query engine for querying large datasets across heterogeneous data sources.
Vector Databases
- Chroma - open-source embedding database for AI applications.
- Infinity (⭐4.5k) - AI-native database for hybrid vector, sparse vector, tensor, full-text, and structured search.
- LanceDB - open-source embedded vector database built on the Lance columnar format.
- Milvus (⭐44k) - open-source vector database for scalable similarity search.
- Qdrant - vector database and similarity search engine with REST, gRPC, and client SDKs.
- Weaviate - open-source vector database for semantic search with structured filtering.
Data Ingestion
- Airbyte - open-source data movement platform for ELT pipelines and connector-based replication.
- Apache SeaTunnel - high-performance, distributed data integration platform for batch and streaming synchronization.
- Bruin (⭐1.6k) - end-to-end data pipeline tool combining ingestion, transformations, and data quality checks.
- DataRaven - managed cloud object storage transfers for data ingestion workflows.
- DBConvert Streams - self-hosted CDC replication and database migration tool.
- Debezium - open-source distributed platform for change data capture.
- Flink CDC - streaming data integration tool powered by Apache Flink.
- Graylog - log management platform for collecting, storing, searching, and alerting on machine data.
- Hevo - managed data pipeline platform for moving data from databases, SaaS apps, cloud storage, SDKs, and streaming services.
- Hightouch - reverse ETL platform for syncing warehouse data into business applications.
- ingestr (⭐3.5k) - CLI tool for copying data between sources and destinations.
- Metricbeat - lightweight shipper for system and service metrics.
Data Quality and Observability
- DataKitchen Open Source Data Observability - open-source data observability for monitoring data journeys, data quality, and pipeline events.
- Great Expectations - open-source framework for validating, documenting, and testing data quality.
- OpenLineage - open standard and reference implementation for collecting lineage metadata from data pipelines.
- Soda Core - open-source Python library and CLI for data quality tests.
Machine Learning
- Aim (⭐6.1k) - open-source AI metadata tracker for experiments and training runs.
- isolation-forest (⭐256) - distributed Spark and Scala implementation of isolation forest for unsupervised outlier detection.
- Neptune - experiment tracking and model registry for research and production machine learning teams.
Benchmarking
- Apache JMeter - load testing tool for measuring performance of services and distributed systems.
Security
- FileShot (⭐26) - zero-knowledge encrypted file transfer for sharing large datasets.
System Deployment
- Terraform - infrastructure as code tool for provisioning and managing cloud and on-premises infrastructure.
Applications
- Gigasheet - cloud spreadsheet for exploring and analyzing large datasets.
Business Intelligence
- Query.me - collaborative SQL notebooks for querying, scheduling, and sharing reporting workflows.
- Datapallas - BI and data platform with AI exploration, dashboards, and pixel-perfect report generation; formerly ReportBurster.
Data Visualization
- Flexmonster Pivot Table & Charts - JavaScript component for pivot tables, charts, and web reporting.
- WebDataRocks - free web pivot table component for embedding analytics in applications.
Books / Streaming
- Fusion in Action - Fusion in Action teaches you to build a full-featured data analytics pipeline, including document and data search and distributed data clustering.
- Data Analysis with Python and PySpark - tutorial for using PySpark to build data-driven applications at scale.
- Data Pipelines with Apache Airflow - practical guide to building and maintaining data pipelines with Airflow.
Data Visualization / Graph Based approach
- Data Annotation and Labeling Tools: awesome-open-data-annotation (⭐700).
11. Android Security Awesome
Tools / Reverse Engineering
- Apktool (⭐25k) – really useful for compilation/decompilation (uses smali)
12. Awesome Go
Science and Data Analysis
- matrix (⭐2) - A clean, generic, zero-dependency matrix math package for Go with support for arithmetic, decompositions, and linear system solving.
Testing Frameworks
- go-mutesting (⭐0) - Mutation testing for Go with CI quality gates, coverage-aware MSI, baseline tracking, and git-diff filtering.
Utilities
- Go-Constant (⭐0) - Generic typed constant sets with safe string parsing for Go's missing enum type.
13. Awesome Rtc
Server Software / STUN/TURN
- natcheck (⭐2) - NAT type diagnosis CLI. Probes STUN servers, classifies mapping behaviour per RFC 5780, and reports a WebRTC direct-P2P forecast.
Developer Resources / C/C++ Libraries
- icey (⭐1.4k) - C++20 WebRTC media runtime with FFmpeg pipeline, Symple signalling, and RFC 5766 TURN.
14. Free for Dev
APIs, Data, and ML
- Zip-Codes - REST API for US and Canadian postal codes with address validation, radius search, and Census demographics. 2,500 free requests/day.
Web Hosting
- Mirin - Website platform for developer-built React, Vue, or Svelte component sites with visual editing, forms, analytics, and global CDN hosting. Free tier includes 1 site with unlimited pages and submissions.
15. Awesome Bioie
Data Models / Other Datasets
- unmiri-ngs-fhir-schema (⭐4) - Apache-2.0 JSON Schema (Draft 2020-12) API contract for cross-vendor somatic NGS interpretation output (Foundation Medicine, Tempus, Caris, Guardant), aligned with the HL7 FHIR Genomics IG. A standards-aligned target representation for biomedical information-extraction pipelines that parse oncology lab reports.
16. Awesome Webaudio
Packages / Apps
- Tonalux - Free browser-based audio analysis suite with real-time spectrum analyzer, LUFS loudness metering (EBU R128), A/B reference comparison and stereo correlation. Built with Web Audio API and WebAssembly, runs entirely client-side.
17. Static Analysis
Programming Languages / Other
- flawfinder — Finds possible security weaknesses.
18. Awesome Docker
Observability / Reverse Proxy
- docker-exporter (⭐1) - Lightweight Prometheus exporter for Docker container metrics, written in Rust. Reports the correct cgroup v2 memory working set on ARM64 (Raspberry Pi 5), runs non-root with a read-only socket, and uses ~7 MiB of RAM when idle.
- Middleware - 💴 Monitor Docker hosts, containers, logs, and application performance from a unified observability platform.
Security / Reverse Proxy
- container-explorer (⭐96) - Forensic utility to explore Docker and containerd container details from mounted disk images.