| aboSamoor/polyglot | 2,212 | | 65 | 28 | over 2 years ago | 9 | December 15, 2021 | 166 | other | Python | Multilingual text (NLP) processing toolkit |
| HIT-SCIR/ELMoForManyLangs | 1,325 | | 1 | 1 | over 5 years ago | 4 | October 15, 2020 | | mit | Python | Pre-trained ELMo Representations for Many Languages |
| MilaNLProc/contextualized-topic-models | 1,141 | | 0 | 4 | about 2 years ago | 30 | November 03, 2022 | 10 | mit | Python | A python package to run contextualized topic modeling. CTMs combine contextualized embeddings (e.g., BERT) with topic models to get coherent topics. Published at EACL and ACL 2021. |
| bheinzerling/bpemb | 1,068 | | 15 | 86 | over 3 years ago | 13 | September 23, 2022 | 4 | mit | Python | Pre-trained subword embeddings in 275 languages, based on Byte-Pair Encoding (BPE) |
| google-research-datasets/wit | 896 | | 0 | 0 | over 2 years ago | 0 | | 3 | other | | WIT (Wikipedia-based Image Text) Dataset is a large multimodal multilingual dataset comprising 37M+ image-text sets with 11M+ unique images across 100+ languages. |
| unitaryai/detoxify | 774 | | 0 | 10 | over 2 years ago | 11 | December 19, 2022 | 41 | apache-2.0 | Python | Trained models & code to predict toxic comments on all 3 Jigsaw Toxic Comment Challenges. Built using ⚡ Pytorch Lightning and 🤗 Transformers. For access to our API, please email us at contact@unitary.ai. |
| nlp-uoregon/trankit | 693 | | 0 | 2 | over 2 years ago | 20 | March 26, 2022 | 24 | apache-2.0 | Python | Trankit is a Light-Weight Transformer-based Python Toolkit for Multilingual Natural Language Processing |
| dccuchile/beto | 462 | | 0 | 0 | over 2 years ago | 0 | | 6 | cc-by-4.0 | | BETO - Spanish version of the BERT model |
| filyp/autocorrect | 376 | | 18 | 30 | over 2 years ago | 27 | December 04, 2021 | 7 | lgpl-3.0 | Python | Spelling corrector in python |
| artitw/text2text | 268 | | 0 | 0 | about 2 years ago | 134 | October 21, 2023 | 27 | other | Python | Text2Text: Crosslingual NLP/G toolkit |