| Project | Stars | Repos Using This | Packages Using This | Most Recent Commit | Total Releases | Latest Release | Open Issues | License | Language | Description |
|---|---|---|---|---|---|---|---|---|---|---|
| apache/spark | 37,661 | 2,394 | 939 | about 2 years ago | 46 | May 09, 2021 | 186 | apache-2.0 | Scala | Apache Spark - A unified analytics engine for large-scale data processing |
| donnemartin/data-science-ipython-notebooks | 25,668 | 0 | 0 | over 2 years ago | 0 | | 34 | other | Python | Data science Python notebooks: Deep learning (TensorFlow, Theano, Caffe, Keras), scikit-learn, Kaggle, big data (Spark, Hadoop MapReduce, HDFS), matplotlib, pandas, NumPy, SciPy, Python essentials, AWS, and various command lines. |
| heibaiying/BigData-Notes | 14,872 | 0 | 0 | over 2 years ago | 0 | | 39 | | Java | A beginner's guide to big data :star: |
| andkret/Cookbook | 12,557 | 0 | 0 | over 2 years ago | 0 | | 111 | apache-2.0 | | The Data Engineering Cookbook |
| wangzhiwubigdata/God-Of-BigData | 8,483 | 0 | 0 | over 2 years ago | 0 | | 3 | | | Focused on big data learning and interview prep; your path to big-data mastery starts here. Flink/Spark/Hadoop/Hbase/Hive... |
| h2oai/h2o-3 | 7,487 | 62 | 33 | 4 days ago | 49 | August 09, 2023 | 2,746 | apache-2.0 | Jupyter Notebook | H2O is an Open Source, Distributed, Fast & Scalable Machine Learning Platform: Deep Learning, Gradient Boosting (GBM) & XGBoost, Random Forest, Generalized Linear Modeling (GLM with Elastic Net), K-Means, PCA, Generalized Additive Models (GAM), RuleFit, Support Vector Machine (SVM), Stacked Ensembles, Automatic Machine Learning (AutoML), etc. |
| delta-io/delta | 6,656 | 0 | 45 | about 2 years ago | 24 | May 24, 2023 | 601 | apache-2.0 | HTML | An open-source storage framework that enables building a Lakehouse architecture with compute engines including Spark, PrestoDB, Flink, Trino, and Hive and APIs |
| apache/zeppelin | 6,229 | 32 | 31 | about 2 years ago | 2 | June 21, 2017 | 160 | apache-2.0 | Java | Web-based notebook that enables data-driven, interactive data analytics and collaborative documents with SQL, Scala and more. |
| risingwavelabs/risingwave | 5,799 | 0 | 0 | about 2 years ago | 14 | December 07, 2023 | 1,010 | apache-2.0 | Rust | The distributed streaming database. Engineered to offer the simplest and most cost-efficient way for stream processing and management. |
| microsoft/SynapseML | 4,914 | 0 | 6 | about 2 years ago | 12 | November 27, 2023 | 335 | mit | Scala | Simple and Distributed Machine Learning |