| apache/spark | 37,661 | | 2,394 | 939 | about 2 years ago | 46 | May 09, 2021 | 186 | apache-2.0 | Scala | Apache Spark - A unified analytics engine for large-scale data processing |
| microsoft/SynapseML | 4,914 | | 0 | 6 | about 2 years ago | 12 | November 27, 2023 | 335 | mit | Scala | Simple and Distributed Machine Learning |
| apache/hudi | 4,901 | | 0 | 58 | about 2 years ago | 21 | November 11, 2023 | 886 | apache-2.0 | Java | Upserts, Deletes And Incremental Processing on Big Data. |
| intel-analytics/BigDL | 4,728 | | 0 | 10 | about 2 years ago | 16 | April 19, 2021 | 958 | apache-2.0 | Jupyter Notebook | Accelerate LLM with low-bit (FP4 / INT4 / FP8 / INT8) optimizations using bigdl-llm |
| JerryLead/SparkInternals | 4,665 | | 0 | 0 | over 4 years ago | 0 | | 27 | | | Notes talking about the design and implementation of Apache Spark |
| JohnSnowLabs/spark-nlp | 3,578 | | 0 | 30 | about 2 years ago | 134 | December 08, 2023 | 43 | apache-2.0 | Scala | State of the Art Natural Language Processing |
| lw-lin/CoolplaySpark | 3,430 | | 0 | 0 | almost 4 years ago | 0 | | 35 | | Scala | Coolplay Spark: Spark source code analysis, Spark libraries, etc. |
| databricks/koalas | 3,291 | | 1 | 16 | over 2 years ago | 47 | October 19, 2021 | 112 | apache-2.0 | Python | Koalas: pandas API on Apache Spark |
| spark-notebook/spark-notebook | 3,147 | | 0 | 0 | almost 3 years ago | 0 | | 207 | apache-2.0 | JavaScript | Interactive and Reactive Data Science using Scala and Spark. |
| awslabs/deequ | 3,044 | | 0 | 6 | about 2 years ago | 37 | November 09, 2023 | 141 | apache-2.0 | Scala | Deequ is a library built on top of Apache Spark for defining "unit tests for data", which measure data quality in large datasets. |