| donnemartin/data-science-ipython-notebooks | 25,668 | | 0 | 0 | over 2 years ago | 0 | | 34 | other | Python | Data science Python notebooks: Deep learning (TensorFlow, Theano, Caffe, Keras), scikit-learn, Kaggle, big data (Spark, Hadoop MapReduce, HDFS), matplotlib, pandas, NumPy, SciPy, Python essentials, AWS, and various command lines. |
| trinodb/trino | 9,118 | | 0 | 29 | about 2 years ago | 83 | November 30, 2023 | 2,496 | apache-2.0 | Java | Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io) |
| vaexio/vaex | 8,084 | | 2 | 29 | over 2 years ago | 69 | July 21, 2023 | 508 | mit | Python | Out-of-Core hybrid Apache Arrow/NumPy DataFrame for Python, ML, visualization and exploration of big tabular data at a billion rows per second 🚀 |
| catboost/catboost | 7,564 | | 0 | 12 | about 2 years ago | 20 | September 19, 2023 | 539 | apache-2.0 | Python | A fast, scalable, high performance Gradient Boosting on Decision Trees library, used for ranking, classification, regression and other machine learning tasks for Python, R, Java, C++. Supports computation on CPU and GPU. |
| h2oai/h2o-3 | 7,487 | | 62 | 33 | 4 days ago | 49 | August 09, 2023 | 2,746 | apache-2.0 | Jupyter Notebook | H2O is an Open Source, Distributed, Fast & Scalable Machine Learning Platform: Deep Learning, Gradient Boosting (GBM) & XGBoost, Random Forest, Generalized Linear Modeling (GLM with Elastic Net), K-Means, PCA, Generalized Additive Models (GAM), RuleFit, Support Vector Machine (SVM), Stacked Ensembles, Automatic Machine Learning (AutoML), etc. |
| pachyderm/pachyderm | 6,035 | | 0 | 1 | about 2 years ago | 613 | December 04, 2023 | 897 | apache-2.0 | Go | Data-Centric Pipelines and Data Versioning |
| feast-dev/feast | 5,053 | | 0 | 28 | about 2 years ago | 116 | September 07, 2023 | 149 | apache-2.0 | Python | Feature Store for Machine Learning |
| microsoft/SynapseML | 4,914 | | 0 | 6 | about 2 years ago | 12 | November 27, 2023 | 335 | mit | Scala | Simple and Distributed Machine Learning |
| alibaba/GraphScope | 3,530 | | 0 | 1 | 4 months ago | 452 | December 09, 2023 | 302 | apache-2.0 | C++ | 🔨 🍇 💻 🚀 GraphScope: A One-Stop Large-Scale Graph Computing System from Alibaba / One-stop graph computing system |
| databricks/koalas | 3,291 | | 1 | 16 | over 2 years ago | 47 | October 19, 2021 | 112 | apache-2.0 | Python | Koalas: pandas API on Apache Spark |