| moj-analytical-services/splink | 939 | | 0 | 2 | about 2 years ago | 119 | November 14, 2023 | 167 | mit | Python | Fast, accurate and scalable probabilistic data linkage with support for multiple SQL backends |
| zinggAI/zingg | 828 | | 0 | 0 | about 2 years ago | 1 | June 01, 2022 | 76 | agpl-3.0 | Java | Scalable identity resolution, entity resolution, data mastering and deduplication using ML |
| zouzias/spark-lucenerdd | 127 | | 0 | 0 | over 2 years ago | 39 | June 02, 2021 | 36 | apache-2.0 | Scala | Spark RDD with Lucene's query and entity linkage capabilities |
| ing-bank/spark-matcher | 27 | | 0 | 0 | over 2 years ago | 0 | | 5 | gpl-2.0 | Python | Record matching and entity resolution at scale in Spark |
| phymbert/spark-search | 20 | | 0 | 0 | about 4 years ago | 8 | September 26, 2021 | 32 | apache-2.0 | Scala | Spark Search - high performance advanced search features based on Apache Lucene |
| NYUBigDataProject/SparkClean | 20 | | 0 | 0 | about 7 years ago | 0 | | 0 | apache-2.0 | Python | A Scalable Data Cleaning Library for PySpark. |
| david-siqi-liu/sparklyclean | 6 | | 0 | 0 | over 5 years ago | 0 | | 0 | mit | Scala | Optimal distributed data deduplication and supervised learning pipeline using Apache Spark |