Cross Language Dataset Alternatives

A multilingual, multi-style and multi-granularity dataset for cross-language textual similarity detection
Alternatives To FerreroJeremy/Cross-Language-Dataset
Project Name Stars Downloads Repos Using This Packages Using This Most Recent Commit Total Releases Latest Release Open Issues License Language
EleutherAI/the-pile 1,048 0 0 almost 3 years ago 1 October 17, 2020 19 mit Python
Tomiinek/Multilingual_Text_to_Speech 740 0 0 over 2 years ago 0 1 mit Python
An implementation of Tacotron 2 that supports multilingual experiments with parameter-sharing, code-switching, and voice cloning.
mhagiwara/github-typo-corpus 289 0 0 over 6 years ago 0 1 Python
GitHub Typo Corpus: A Large-Scale Multilingual Dataset of Misspellings and Grammatical Errors
csebuetnlp/xl-sum 209 0 0 almost 3 years ago 0 0 Python
This repository contains the code, data, and models of the paper titled "XL-Sum: Large-Scale Multilingual Abstractive Summarization for 44 Languages" published in Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021.
thammegowda/mtdata 115 0 0 almost 3 years ago 21 November 25, 2022 22 apache-2.0 Python
A tool that locates, downloads, and extracts machine translation corpora
apple/ml-mkqa 94 0 0 almost 4 years ago 0 1 apache-2.0 Python
We introduce MKQA, an open-domain question answering evaluation set comprising 10k question-answer pairs aligned across 26 typologically diverse languages (260k question-answer pairs in total). The goal of this dataset is to provide a challenging benchmark for question answering quality across a wide set of languages. For details, please refer to our paper: MKQA: A Linguistically Diverse Benchmark for Multilingual Open Domain Question Answering.
afshinrahimi/mmner 69 0 0 over 4 years ago 0 1 apache-2.0 Python
Massively Multilingual Transfer for NER
notAI-tech/Anuvaad 65 0 0 about 5 years ago 7 April 11, 2021 3 gpl-3.0 Python
State-of-the-art open-source translation for Indic languages.
cisnlp/Glot500 65 0 0 over 2 years ago 0 0 other Python
Glot500: Scaling Multilingual Corpora and Language Models to 500 Languages (ACL'23)
project-miracl/miracl 61 0 0 over 2 years ago 0 1 apache-2.0
A large-scale multilingual dataset for Information Retrieval. Thorough human-annotations across 18 diverse languages.

Downloads, Dependent Repos, Dependent Packages, Total Releases, Latest Releases data powered by Libraries.io.

Copyright 2018-2026 Awesome Open Source.  All rights reserved.