| awslabs/deequ | 3,044 | | 0 | 6 | about 2 years ago | 37 | November 09, 2023 | 141 | apache-2.0 | Scala |
| Deequ is a library built on top of Apache Spark for defining "unit tests for data", which measure data quality in large datasets. |
| datastax/spark-cassandra-connector | 1,929 | | 109 | 22 | about 2 years ago | 81 | April 08, 2021 | 25 | apache-2.0 | Scala |
| DataStax Connector for Apache Spark to Apache Cassandra |
| uber/petastorm | 1,693 | | 0 | 8 | over 2 years ago | 86 | February 03, 2023 | 174 | apache-2.0 | Python |
| The Petastorm library enables single-machine or distributed training and evaluation of deep learning models from datasets in Apache Parquet format. It supports ML frameworks such as TensorFlow, PyTorch, and PySpark and can be used from pure Python code. |
| jadianes/spark-py-notebooks | 1,515 | | 0 | 0 | about 3 years ago | 0 | | 9 | other | Jupyter Notebook |
| Apache Spark & Python (PySpark) tutorials for Big Data Analysis and Machine Learning as IPython / Jupyter notebooks |
| microsoft/Mobius | 940 | | 6 | 0 | over 3 years ago | 22 | January 29, 2017 | 88 | mit | C# |
| C# and F# language binding and extensions to Apache Spark |
| jadianes/spark-movie-lens | 757 | | 0 | 0 | almost 5 years ago | 0 | | 10 | other | Jupyter Notebook |
| An on-line movie recommender using Spark, Python Flask, and the MovieLens dataset |
| cdapio/cdap | 735 | | 0 | 56 | about 2 years ago | 23 | September 01, 2023 | 98 | other | Java |
| An open source framework for building data analytic applications. |
| csuldw/MachineLearning | 684 | | 0 | 0 | over 6 years ago | 0 | | 1 | | Python |
| Machine learning resources, including algorithms, papers, datasets, examples, and so on. |
| achuthasubhash/Complete-Life-Cycle-of-a-Data-Science-Project | 499 | | 0 | 0 | over 2 years ago | 0 | | 4 | mit | |
| Complete-Life-Cycle-of-a-Data-Science-Project |
| whylabs/whylogs-java | 179 | | 0 | 2 | over 4 years ago | 5 | November 01, 2020 | 2 | apache-2.0 | Java |
| Profile and monitor your ML data pipeline end-to-end |