Sparklyclean Alternatives

Optimal distributed data deduplication and supervised learning pipeline using Apache Spark
Alternatives To david-siqi-liu/sparklyclean
| Project Name | Stars | Downloads | Repos Using This | Packages Using This | Most Recent Commit | Total Releases | Latest Release | Open Issues | License | Language | Description |
|---|---|---|---|---|---|---|---|---|---|---|---|
| moj-analytical-services/splink | 939 | | 0 | 2 | about 2 years ago | 119 | November 14, 2023 | 167 | mit | Python | Fast, accurate and scalable probabilistic data linkage with support for multiple SQL backends |
| zinggAI/zingg | 828 | | 0 | 0 | about 2 years ago | 1 | June 01, 2022 | 76 | agpl-3.0 | Java | Scalable identity resolution, entity resolution, data mastering and deduplication using ML |
| zouzias/spark-lucenerdd | 127 | | 0 | 0 | over 2 years ago | 39 | June 02, 2021 | 36 | apache-2.0 | Scala | Spark RDD with Lucene's query and entity linkage capabilities |
| ing-bank/spark-matcher | 27 | | 0 | 0 | over 2 years ago | 0 | | 5 | gpl-2.0 | Python | Record matching and entity resolution at scale in Spark |
| phymbert/spark-search | 20 | | 0 | 0 | about 4 years ago | 8 | September 26, 2021 | 32 | apache-2.0 | Scala | Spark Search - high performance advanced search features based on Apache Lucene |
| NYUBigDataProject/SparkClean | 20 | | 0 | 0 | about 7 years ago | 0 | | 0 | apache-2.0 | Python | A Scalable Data Cleaning Library for PySpark. |
| david-siqi-liu/sparklyclean | 6 | | 0 | 0 | over 5 years ago | 0 | | 0 | mit | Scala | Optimal distributed data deduplication and supervised learning pipeline using Apache Spark |



Downloads, Dependent Repos, Dependent Packages, Total Releases, Latest Releases data powered by Libraries.io.

Copyright 2018-2026 Awesome Open Source.  All rights reserved.