| EleutherAI/the-pile | 1,048 | | 0 | 0 | almost 3 years ago | 1 | October 17, 2020 | 19 | mit | Python |
| Tomiinek/Multilingual_Text_to_Speech | 740 | | 0 | 0 | over 2 years ago | 0 | | 1 | mit | Python |
| An implementation of Tacotron 2 that supports multilingual experiments with parameter-sharing, code-switching, and voice cloning. |
| mhagiwara/github-typo-corpus | 289 | | 0 | 0 | over 6 years ago | 0 | | 1 | | Python |
| GitHub Typo Corpus: A Large-Scale Multilingual Dataset of Misspellings and Grammatical Errors |
| csebuetnlp/xl-sum | 209 | | 0 | 0 | almost 3 years ago | 0 | | 0 | | Python |
| This repository contains the code, data, and models of the paper titled "XL-Sum: Large-Scale Multilingual Abstractive Summarization for 44 Languages" published in Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021. |
| thammegowda/mtdata | 115 | | 0 | 0 | almost 3 years ago | 21 | November 25, 2022 | 22 | apache-2.0 | Python |
| A tool that locates, downloads, and extracts machine translation corpora |
| apple/ml-mkqa | 94 | | 0 | 0 | almost 4 years ago | 0 | | 1 | apache-2.0 | Python |
| We introduce MKQA, an open-domain question answering evaluation set comprising 10k question-answer pairs aligned across 26 typologically diverse languages (260k question-answer pairs in total). The goal of this dataset is to provide a challenging benchmark for question answering quality across a wide set of languages. Please refer to our paper for details, MKQA: A Linguistically Diverse Benchmark for Multilingual Open Domain Question Answering |
| afshinrahimi/mmner | 69 | | 0 | 0 | over 4 years ago | 0 | | 1 | apache-2.0 | Python |
| Massively Multilingual Transfer for NER |
| notAI-tech/Anuvaad | 65 | | 0 | 0 | about 5 years ago | 7 | April 11, 2021 | 3 | gpl-3.0 | Python |
| State of the art open-source translation for Indic languages. |
| cisnlp/Glot500 | 65 | | 0 | 0 | over 2 years ago | 0 | | 0 | other | Python |
| Glot500: Scaling Multilingual Corpora and Language Models to 500 Languages (ACL'23) |
| project-miracl/miracl | 61 | | 0 | 0 | over 2 years ago | 0 | | 1 | apache-2.0 | |
| A large-scale multilingual dataset for Information Retrieval. Thorough human-annotations across 18 diverse languages. |