| adilkhash/Data-Engineering-HowTo | 2,949 | | 0 | 0 | over 2 years ago | 0 | | 4 | | | A list of useful resources to learn Data Engineering from scratch |
| anuran-roy/serpytor | 18 | | 0 | 0 | about 3 years ago | 0 | | 5 | mit | Python | A distributed, low-code, end-to-end data collection and analysis tool for data folks. Take the pain out of data collection from your pipeline! |
| chollinger93/bridgefour | 16 | | 0 | 0 | over 2 years ago | 0 | | 0 | | Scala | Bridge Four is a simple, functional, effectful, single-leader, multi worker, distributed compute system optimized for embarrassingly parallel workloads. |
| larribas/dagger | 9 | | 0 | 1 | about 4 years ago | 11 | September 30, 2021 | 0 | apache-2.0 | Python | Define sophisticated data pipelines with Python and run them on different distributed systems (such as Argo Workflows). |
| david-siqi-liu/sparklyclean | 6 | | 0 | 0 | over 5 years ago | 0 | | 0 | mit | Scala | Optimal distributed data deduplication and supervised learning pipeline using Apache Spark |