| san089/goodreads_etl_pipeline | 593 | | 0 | 0 | about 6 years ago | 0 | | 0 | mit | Python | An end-to-end GoodReads Data Pipeline for Building Data Lake, Data Warehouse and Analytics Platform. |
| svenkreiss/pysparkling | 253 | | 7 | 1 | over 3 years ago | 69 | November 13, 2022 | 9 | other | Python | A pure Python implementation of Apache Spark's RDD and DStream interfaces. |
| RumbleDB/rumble | 194 | | 0 | 0 | almost 3 years ago | 4 | December 03, 2019 | 134 | other | Java | ⛈️ RumbleDB 1.21.0 "Hawthorn blossom" 🌳 for Apache Spark \| Run queries on your large-scale, messy JSON-like data (JSON, text, CSV, Parquet, ROOT, AVRO, SVM...) \| No install required (just a jar to download) \| Declarative Machine Learning and more |
| geotrellis/geotrellis-chatta-demo | 44 | | 0 | 0 | over 7 years ago | 0 | | 11 | | JavaScript | Demo of GeoTrellis - weighted overlay and zonal summary for University of Tennessee at Chattanooga. |
| tharwaninitin/etlflow | 43 | | 0 | 11 | over 2 years ago | 37 | July 19, 2023 | 0 | apache-2.0 | Scala | EtlFlow is an ecosystem of functional libraries in Scala based on ZIO for running complex Auditable workflows which can interact with Google Cloud Platform, AWS, Kubernetes, Databases, SFTP servers, On-Prem Systems and more. |
| nareshk1290/Udacity-Data-Engineering | 42 | | 0 | 0 | about 6 years ago | 0 | | 1 | | Jupyter Notebook | Udacity Data Engineering Nano Degree (DEND) |
| yamrcraft/etl-light | 38 | | 0 | 0 | almost 9 years ago | 0 | | 0 | mit | Scala | A light Kafka to HDFS/S3 ETL library based on Apache Spark |
| rayyan17/jobAnalytics_and_search | 22 | | 0 | 0 | about 4 years ago | 0 | | 8 | mit | Python | JobAnalytics system consumes data from multiple sources and provides valuable information to both job hunters and recruiters. |
| hortonworks-spark/cloud-integration | 21 | | 0 | 0 | about 3 years ago | 0 | | 4 | apache-2.0 | Scala | Spark cloud integration: tests, cloud committers and more |
| guidok91/spark-movies-etl | 21 | | 0 | 0 | over 2 years ago | 0 | | 2 | | Python | Spark data pipeline that ingests and transforms movie ratings data. |