| apache/iceberg | 5,179 |  | 0 | 0 | about 2 years ago | 3 | October 29, 2022 | 1,485 | apache-2.0 | Java | Apache Iceberg |
| gchq/Gaffer | 1,724 |  | 4 | 31 | about 2 years ago | 101 | November 14, 2023 | 142 | apache-2.0 | Java | A large-scale entity and relation database supporting aggregation of properties |
| uber/petastorm | 1,693 |  | 0 | 8 | over 2 years ago | 86 | February 03, 2023 | 174 | apache-2.0 | Python | The Petastorm library enables single-machine or distributed training and evaluation of deep learning models from datasets in Apache Parquet format. It supports ML frameworks such as TensorFlow, PyTorch, and PySpark and can be used from pure Python code. |
| bigdatagenomics/adam | 966 |  | 20 | 17 | about 2 years ago | 14 | December 16, 2020 | 35 | apache-2.0 | Scala | ADAM is a genomics analysis platform with specialized file formats built using Apache Avro, Apache Spark, and Apache Parquet. Apache 2 licensed. |
| HariSekhon/DevOps-Python-tools | 709 |  | 0 | 0 | over 2 years ago | 0 |  | 37 | mit | Python | 80+ DevOps & Data CLI Tools - AWS, GCP, GCF Python Cloud Functions, Log Anonymizer, Spark, Hadoop, HBase, Hive, Impala, Linux, Docker, Spark Data Converters & Validators (Avro/Parquet/JSON/CSV/INI/XML/YAML), Travis CI, AWS CloudFormation, Elasticsearch, Solr etc. |
| SuperCowPowers/zat | 409 |  | 0 | 1 | about 2 years ago | 11 | January 26, 2023 | 10 | mit | Jupyter Notebook | Zeek Analysis Tools (ZAT): Processing and analysis of Zeek network data with Pandas, scikit-learn, Kafka and Spark |
| Netflix/iceberg | 409 |  | 0 | 0 | over 4 years ago | 0 |  | 27 | apache-2.0 | Java | Iceberg is a table format for large, slow-moving tabular data |
| adobe-research/spindle | 333 |  | 0 | 0 | about 11 years ago | 0 |  | 2 | apache-2.0 | JavaScript | Next-generation web analytics processing with Scala, Spark, and Parquet. |
| RumbleDB/rumble | 194 |  | 0 | 0 | almost 3 years ago | 4 | December 03, 2019 | 134 | other | Java | ⛈️ RumbleDB 1.21.0 "Hawthorn blossom" 🌳 for Apache Spark \| Run queries on your large-scale, messy JSON-like data (JSON, text, CSV, Parquet, ROOT, AVRO, SVM...) \| No install required (just a jar to download) \| Declarative Machine Learning and more |
| aiyanbo/spark-programming-guide-zh-cn | 188 |  | 0 | 0 | about 3 years ago | 0 |  | 0 | other |  | Spark Programming Guide (Simplified Chinese edition) |