| apache/iceberg | 5,179 | | 0 | 0 | about 2 years ago | 3 | October 29, 2022 | 1,485 | apache-2.0 | Java |
| Apache Iceberg |
| multiprocessio/dsq | 3,401 | | 0 | 0 | over 2 years ago | 2 | October 20, 2022 | 19 | other | Go |
| Commandline tool for running SQL queries against JSON, CSV, Excel, Parquet, and more. |
| roapi/roapi | 2,969 | | 0 | 0 | over 2 years ago | 17 | March 20, 2022 | 37 | apache-2.0 | Rust |
| Create full-fledged APIs for slowly moving datasets without writing a single line of code. |
| apache/parquet-mr | 2,296 | | 259 | 208 | about 2 years ago | 17 | May 12, 2023 | 133 | apache-2.0 | Java |
| Apache Parquet |
| jqnatividad/qsv | 2,079 | | 0 | 0 | about 2 years ago | 148 | November 20, 2023 | 21 | unlicense | Rust |
| CSVs sliced, diced & analyzed. |
| apache/drill | 1,856 | | 23 | 16 | about 2 years ago | 24 | April 19, 2023 | 100 | apache-2.0 | Java |
| Apache Drill is a distributed MPP query layer for self-describing data |
| influxdata/influxdb_iox | 1,805 | | 0 | 0 | over 2 years ago | 4 | March 16, 2023 | 494 | apache-2.0 | Rust |
| Pronounced (influxdb eye-ox), short for iron oxide. This is the new core of InfluxDB written in Rust on top of Apache Arrow. |
| gchq/Gaffer | 1,724 | | 4 | 31 | about 2 years ago | 101 | November 14, 2023 | 142 | apache-2.0 | Java |
| A large-scale entity and relation database supporting aggregation of properties |
| uber/petastorm | 1,693 | | 0 | 8 | over 2 years ago | 86 | February 03, 2023 | 174 | apache-2.0 | Python |
| Petastorm library enables single machine or distributed training and evaluation of deep learning models from datasets in Apache Parquet format. It supports ML frameworks such as TensorFlow, PyTorch, and PySpark and can be used from pure Python code. |
| apache/parquet-format | 1,559 | | 38 | 15 | about 2 years ago | 9 | November 16, 2023 | 14 | apache-2.0 | Java |
| Apache Parquet |