Airflow Pyspark Emr Alternatives

This project demonstrates how to process data stored in a data lake, transforming it into an OLAP-optimized structure using PySpark. The PySpark job runs on AWS EMR, and the data pipeline is orchestrated by Apache Airflow, including infrastructure creation and EMR cluster termination.
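As a rough illustration of the create-cluster, run-job, terminate flow described above, the sketch below builds the kind of request dictionary that boto3's EMR `run_job_flow` call accepts. It is not taken from the repository: the function names, script URI, release label, and instance settings are all hypothetical. It also assumes the cluster shuts itself down once its steps finish, which may differ from how the project handles termination.

```python
# Sketch only: names, the S3 URI, and instance settings are hypothetical,
# not taken from GabrielAmazonas/airflow-pyspark-emr.

def build_spark_step(script_uri: str, name: str = "pyspark-olap-transform") -> dict:
    """Return an EMR step that runs a PySpark script via spark-submit."""
    return {
        "Name": name,
        # Shut the cluster down if the Spark job fails.
        "ActionOnFailure": "TERMINATE_CLUSTER",
        "HadoopJarStep": {
            "Jar": "command-runner.jar",
            "Args": ["spark-submit", "--deploy-mode", "cluster", script_uri],
        },
    }


def build_job_flow(script_uri: str) -> dict:
    """Return keyword arguments shaped for an EMR run_job_flow request."""
    return {
        "Name": "airflow-pyspark-emr-sketch",
        "ReleaseLabel": "emr-6.15.0",  # assumed release label
        "Applications": [{"Name": "Spark"}],
        "Instances": {
            "InstanceGroups": [
                {"InstanceRole": "MASTER", "InstanceType": "m5.xlarge", "InstanceCount": 1},
                {"InstanceRole": "CORE", "InstanceType": "m5.xlarge", "InstanceCount": 2},
            ],
            # False means the cluster terminates once all steps have finished.
            "KeepJobFlowAliveWhenNoSteps": False,
        },
        "Steps": [build_spark_step(script_uri)],
        "JobFlowRole": "EMR_EC2_DefaultRole",
        "ServiceRole": "EMR_DefaultRole",
    }
```

In an Airflow DAG, a dictionary like this could plausibly be passed as `job_flow_overrides` to `EmrCreateJobFlowOperator` from the Amazon provider package, with `EmrTerminateJobFlowOperator` as a final task if explicit termination is preferred. Whether the repository wires it this way is not stated here.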
Alternatives To GabrielAmazonas/airflow-pyspark-emr
Project Name Stars Downloads Repos Using This Packages Using This Most Recent Commit Total Releases Latest Release Open Issues License Language
huseinzol05/Gather-Deployment 347 0 0 over 2 years ago 0 0 mit Jupyter Notebook
Gathers Python deployment, infrastructure and practices.
alanchn31/Movalytics-Data-Warehouse 103 0 0 almost 6 years ago 0 0 Python
Data pipeline performing ETL to AWS Redshift using Spark, orchestrated with Apache Airflow
rafaelpierre/pyjaws 36 0 0 over 2 years ago 0 3 mit Python
PyJaws: A Pythonic Way to Define Databricks Jobs and Workflows
mozilla/python_mozetl 26 0 0 over 2 years ago 0 23 mit Python
ETL jobs for Firefox Telemetry
rayyan17/jobAnalytics_and_search 22 0 0 about 4 years ago 0 8 mit Python
JobAnalytics system consumes data from multiple sources and provides valuable information to both job hunters and recruiters.
mpavanetti/airflow 8 0 0 over 2 years ago 0 0 PHP
This set of code and instructions is intended to instantiate a prebuilt environment from a set of Docker images (Airflow webserver, Airflow scheduler, PostgreSQL, PySpark), with a data pipeline that consumes data from a weather API, processes it with PySpark, and stores it in PostgreSQL.
zacharyt-cs/reddit-data-engineering 7 0 0 over 3 years ago 0 mit Python
An end-to-end data engineering pipeline to create a dashboard for the latest content on the r/Stocks subreddit
camposvinicius/aws-etl 7 0 0 about 4 years ago 0 0 Smarty
This is an ETL application on AWS that uses open sales and customer data, available here: https://github.com/camposvinicius/data/blob/main/AdventureWorks.zip. It is a zipped file containing several .csv files, to which transformations are applied.
GabrielAmazonas/airflow-pyspark-emr 7 0 0 about 4 years ago 0 7 Python
This project demonstrates how to process data stored in a data lake, transforming it into an OLAP-optimized structure using PySpark. The PySpark job runs on AWS EMR, and the data pipeline is orchestrated by Apache Airflow, including infrastructure creation and EMR cluster termination.
achad4/spark-mesos-airflow-tutorial 6 0 0 about 6 years ago 0 2 Python



Downloads, Dependent Repos, Dependent Packages, Total Releases, Latest Releases data powered by Libraries.io.

Copyright 2018-2026 Awesome Open Source.  All rights reserved.