| Repository | Stars | Repos using this | Packages using this | Most recent commit | Total releases | Latest release | Open issues | License | Language | Description |
|---|---|---|---|---|---|---|---|---|---|---|
| donnemartin/data-science-ipython-notebooks | 25,668 | 0 | 0 | over 2 years ago | 0 | | 34 | other | Python | Data science Python notebooks: Deep learning (TensorFlow, Theano, Caffe, Keras), scikit-learn, Kaggle, big data (Spark, Hadoop MapReduce, HDFS), matplotlib, pandas, NumPy, SciPy, Python essentials, AWS, and various command lines. |
| FavioVazquez/ds-cheatsheets | 11,535 | 0 | 0 | over 3 years ago | 0 | | 7 | mit | | List of Data Science Cheatsheets to rule the world |
| dagster-io/dagster | 9,467 | 2 | 133 | about 2 years ago | 585 | December 07, 2023 | 2,343 | apache-2.0 | Python | An orchestration platform for the development, production, and observation of data assets. |
| h2oai/h2o-3 | 7,487 | 62 | 33 | 4 days ago | 49 | August 09, 2023 | 2,746 | apache-2.0 | Jupyter Notebook | H2O is an Open Source, Distributed, Fast & Scalable Machine Learning Platform: Deep Learning, Gradient Boosting (GBM) & XGBoost, Random Forest, Generalized Linear Modeling (GLM with Elastic Net), K-Means, PCA, Generalized Additive Models (GAM), RuleFit, Support Vector Machine (SVM), Stacked Ensembles, Automatic Machine Learning (AutoML), etc. |
| mage-ai/mage-ai | 6,324 | 0 | 0 | about 2 years ago | 314 | December 06, 2023 | 189 | apache-2.0 | Python | 🧙 The modern replacement for Airflow. Build, run, and manage data pipelines for integrating and transforming data. |
| microsoft/SynapseML | 4,914 | 0 | 6 | about 2 years ago | 12 | November 27, 2023 | 335 | mit | Scala | Simple and Distributed Machine Learning |
| databricks/koalas | 3,291 | 1 | 16 | over 2 years ago | 47 | October 19, 2021 | 112 | apache-2.0 | Python | Koalas: pandas API on Apache Spark |
| spark-notebook/spark-notebook | 3,147 | 0 | 0 | almost 3 years ago | 0 | | 207 | apache-2.0 | JavaScript | Interactive and Reactive Data Science using Scala and Spark. |
| szilard/benchm-ml | 1,839 | 0 | 0 | over 3 years ago | 0 | | 11 | mit | R | A minimal benchmark for scalability, speed and accuracy of commonly used open source implementations (R packages, Python scikit-learn, H2O, xgboost, Spark MLlib etc.) of the top machine learning algorithms for binary classification (random forests, gradient boosted trees, deep neural networks etc.). |
| hi-primus/optimus | 1,540 | 0 | 0 | over 1 year ago | 32 | June 19, 2022 | 29 | apache-2.0 | Python | :truck: Agile Data Preparation Workflows made easy with Pandas, Dask, cuDF, Dask-cuDF, Vaex and PySpark |