| AlexIoannides/pyspark-example-project |
1,034 |
|
0 |
0 |
over 3 years ago |
0 |
|
11 |
|
Python |
| Example project implementing best practices for PySpark ETL jobs and applications. |
| quintoandar/butterfree |
269 |
|
0 |
1 |
over 2 years ago |
35 |
November 14, 2023 |
6 |
apache-2.0 |
Python |
| A tool for building feature stores. |
| martandsingh/ApacheSpark |
59 |
|
0 |
0 |
over 3 years ago |
0 |
|
0 |
|
Python |
| This repository helps you learn Databricks concepts through examples. It covers the important topics a data engineer needs in real-world work, using PySpark and Spark SQL for development. The course ends with a few case studies. |
| vim89/datapipelines-essentials-python |
45 |
|
0 |
0 |
almost 3 years ago |
0 |
|
1 |
apache-2.0 |
Python |
| Simplified ETL process in Hadoop using Apache Spark, with a complete ETL pipeline for a data lake. Includes SparkSession extensions, DataFrame validation, Column extensions, SQL functions, and DataFrame transformations. |
| basin-etl/basin |
29 |
|
0 |
0 |
over 3 years ago |
0 |
|
42 |
other |
TypeScript |
| Basin is a visual programming editor for building Spark and PySpark pipelines. Easily build, debug, and deploy complex ETL pipelines from your browser. |
| mozilla/python_mozetl |
26 |
|
0 |
0 |
over 2 years ago |
0 |
|
23 |
mit |
Python |
| ETL jobs for Firefox Telemetry |
| guidok91/spark-movies-etl |
21 |
|
0 |
0 |
over 2 years ago |
0 |
|
2 |
|
Python |
| Spark data pipeline that ingests and transforms movie ratings data. |
| ksbg/sparklanes |
16 |
|
1 |
0 |
about 6 years ago |
5 |
January 31, 2019 |
2 |
mit |
Python |
| A lightweight data processing framework for Apache Spark |
| datayoga-io/lineage |
14 |
|
0 |
2 |
about 4 years ago |
11 |
January 26, 2022 |
0 |
apache-2.0 |
TypeScript |
| Generate beautiful documentation for your data pipelines in markdown format |
| telia-oss/birgitta |
12 |
|
0 |
0 |
about 3 years ago |
34 |
September 10, 2020 |
20 |
mit |
Python |
| Birgitta is a Python ETL test and schema framework, providing automated tests for PySpark notebooks/recipes. |