Awesome Open Source
The Top 10 Hadoop Open Source Projects
Open source projects categorized as Hadoop
apache/spark
⭐
37,661
Apache Spark - A unified analytics engine for large-scale data processing
dependent packages
0
total releases
0
most recent commit
about 2 years ago
donnemartin/data-science-ipython-notebooks
⭐
25,668
Data science Python notebooks: Deep learning (TensorFlow, Theano, Caffe, Keras), scikit-learn, Kaggle, big data (Spark, Hadoop MapReduce, HDFS), matplotlib, pandas, NumPy, SciPy, Python essentials, AWS, and various command lines.
dependent packages
0
total releases
0
most recent commit
over 2 years ago
dmlc/xgboost
⭐
25,253
Scalable, Portable and Distributed Gradient Boosting (GBDT, GBRT or GBM) Library, for Python, R, Java, Scala, C++ and more. Runs on single machine, Hadoop, Spark, Dask, Flink and DataFlow
dependent packages
0
total releases
0
most recent commit
about 2 years ago
spotify/luigi
⭐
17,046
Luigi is a Python module that helps you build complex pipelines of batch jobs. It handles dependency resolution, workflow management, visualization etc. It also comes with Hadoop support built in.
dependent packages
0
total releases
0
most recent commit
about 2 years ago
Tencent/APIJSON
⭐
16,277
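🏆 A zero-code, full-featured, highly secure ORM library. 🚀 Back-end APIs and documentation require no code, and the front end (client) customizes the data and structure of the returned JSON. 🏆 A JSON transmission protocol and an ORM library 🚀 that provides APIs and docs without writing any code.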
dependent packages
0
total releases
0
most recent commit
about 2 years ago
heibaiying/BigData-Notes
⭐
14,872
A beginner's guide to big data ⭐
dependent packages
0
total releases
0
most recent commit
over 2 years ago
deeplearning4j/deeplearning4j
⭐
13,290
Suite of tools for deploying and training deep learning models using the JVM. Highlights include model import for Keras, TensorFlow, and ONNX/PyTorch, a modular and tiny C++ library for running math code, and a Java-based math library on top of the core C++ library. Also includes SameDiff: a PyTorch/TensorFlow-like library for running deep learning using automatic differentiation.
dependent packages
0
total releases
0
most recent commit
over 2 years ago
andkret/Cookbook
⭐
12,557
The Data Engineering Cookbook
dependent packages
0
total releases
0
most recent commit
over 2 years ago
apache/doris
⭐
10,666
Apache Doris is an easy-to-use, high performance and unified analytics database.
dependent packages
0
total releases
0
most recent commit
about 2 years ago
trinodb/trino
⭐
9,118
Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)
dependent packages
0
total releases
0
most recent commit
about 2 years ago
Copyright 2018-2026 Awesome Open Source. All rights reserved.