| uber/petastorm | 1,693 | | 0 | 8 | over 2 years ago | 86 | February 03, 2023 | 174 | apache-2.0 | Python | Petastorm library enables single machine or distributed training and evaluation of deep learning models from datasets in Apache Parquet format. It supports ML frameworks such as TensorFlow, PyTorch, and PySpark and can be used from pure Python code. |
| jadianes/spark-py-notebooks | 1,515 | | 0 | 0 | about 3 years ago | 0 | | 9 | other | Jupyter Notebook | Apache Spark & Python (PySpark) tutorials for Big Data Analysis and Machine Learning as IPython / Jupyter notebooks |
| titicaca/spark-iforest | 147 | | 0 | 0 | over 5 years ago | 0 | | 1 | apache-2.0 | Scala | Isolation Forest on Spark |
| Refefer/Dampr | 101 | | 0 | 0 | over 2 years ago | 9 | July 03, 2019 | 0 | other | Python | Python Data Processing library |
| kavgan/phrase-at-scale | 84 | | 0 | 0 | almost 7 years ago | 0 | | 2 | | Python | Detect common phrases in large amounts of text using a data-driven approach. Size of discovered phrases can be arbitrary. Can be used in languages other than English. |
| Spratiher9/SparkDataset | 28 | | 0 | 0 | over 3 years ago | 3 | November 01, 2021 | 0 | mit | Jupyter Notebook | Instant search for and access to many datasets in PySpark. |
| isarn/isarn-sketches-spark | 27 | | 0 | 0 | over 3 years ago | 19 | June 20, 2020 | 6 | apache-2.0 | Scala | Routines and data structures for using isarn-sketches idiomatically in Apache Spark |
| rlilojr/Detecting-Malicious-URL-Machine-Learning | 23 | | 0 | 0 | over 7 years ago | 0 | | 1 | | Jupyter Notebook | |
| filipyoo/nyc-taxi-analysis | 17 | | 0 | 0 | over 8 years ago | 0 | | 0 | | Jupyter Notebook | Analyzing 200 GB of NYC taxi dataset. |
| vsmolyakov/pyspark | 15 | | 0 | 0 | over 6 years ago | 0 | | 0 | mit | Python | Spark (Scala and Python) |