Datapipelines Essentials Python Alternatives

A simplified ETL process on Hadoop using Apache Spark. Includes a complete ETL pipeline for a data lake, along with SparkSession extensions, DataFrame validation, Column extensions, SQL functions, and DataFrame transformations.
Alternatives To vim89/datapipelines-essentials-python
| Project Name | Stars | Downloads | Repos Using This | Packages Using This | Most Recent Commit | Total Releases | Latest Release | Open Issues | License | Language | Description |
|---|---|---|---|---|---|---|---|---|---|---|---|
| apache/spark | 37,661 | | 2,394 | 939 | about 2 years ago | 46 | May 09, 2021 | 186 | apache-2.0 | Scala | Apache Spark - A unified analytics engine for large-scale data processing |
| donnemartin/data-science-ipython-notebooks | 25,668 | | 0 | 0 | over 2 years ago | 0 | | 34 | other | Python | Data science Python notebooks: Deep learning (TensorFlow, Theano, Caffe, Keras), scikit-learn, Kaggle, big data (Spark, Hadoop MapReduce, HDFS), matplotlib, pandas, NumPy, SciPy, Python essentials, AWS, and various command lines. |
| heibaiying/BigData-Notes | 14,872 | | 0 | 0 | over 2 years ago | 0 | | 39 | | Java | A beginner's guide to big data |
| deeplearning4j/deeplearning4j | 13,290 | | 175 | 119 | over 2 years ago | 54 | August 10, 2022 | 624 | apache-2.0 | Java | Suite of tools for deploying and training deep learning models using the JVM. Highlights include model import for Keras, TensorFlow, and ONNX/PyTorch, a modular and tiny C++ library for running math code, and a Java-based math library on top of the core C++ library. Also includes SameDiff: a PyTorch/TensorFlow-like library for running deep learning using automatic differentiation. |
| andkret/Cookbook | 12,557 | | 0 | 0 | over 2 years ago | 0 | | 111 | apache-2.0 | | The Data Engineering Cookbook |
| apache/doris | 10,666 | | 0 | 0 | about 2 years ago | 8 | September 27, 2023 | 2,332 | apache-2.0 | Java | Apache Doris is an easy-to-use, high performance and unified analytics database. |
| XiangLinPro/IT_book | 8,543 | | 0 | 0 | over 4 years ago | 0 | | 7 | | | This project collects over a thousand good, commonly used books that the author has read or heard about over the years; the book you are looking for may well be here. It covers most books on the internet industry, along with interview experience and questions. There is an artificial intelligence series (the common deep learning frameworks TensorFlow, PyTorch, and Keras; NLP, machine learning, deep learning, and so on), a big data series (Spark, Hadoop, Scala, Kafka, etc.), and a programmer essentials series (C, C++, Java, data structures, Linux, design patterns, databases, and so on). |
| wangzhiwubigdata/God-Of-BigData | 8,483 | | 0 | 0 | over 2 years ago | 0 | | 3 | | | Focused on big data learning and interviews; the road to big data mastery starts here. Flink/Spark/Hadoop/Hbase/Hive... |
| h2oai/h2o-3 | 7,487 | | 62 | 33 | 4 days ago | 49 | August 09, 2023 | 2,746 | apache-2.0 | Jupyter Notebook | H2O is an Open Source, Distributed, Fast & Scalable Machine Learning Platform: Deep Learning, Gradient Boosting (GBM) & XGBoost, Random Forest, Generalized Linear Modeling (GLM with Elastic Net), K-Means, PCA, Generalized Additive Models (GAM), RuleFit, Support Vector Machine (SVM), Stacked Ensembles, Automatic Machine Learning (AutoML), etc. |
| Alluxio/alluxio | 6,544 | | 31 | 53 | about 2 years ago | 73 | November 29, 2023 | 969 | apache-2.0 | Java | Alluxio, data orchestration for analytics and machine learning in the cloud |

Downloads, Dependent Repos, Dependent Packages, Total Releases, Latest Releases data powered by Libraries.io.

Copyright 2018-2026 Awesome Open Source.  All rights reserved.