| nltk/nltk | 12,699 | | 10,496 | 2,261 | about 2 years ago | 59 | July 20, 2023 | 268 | apache-2.0 | Python |
| NLTK Source |
| brightmart/nlp_chinese_corpus | 8,344 | | 0 | 0 | almost 3 years ago | 0 | | 20 | mit | |
| Large Scale Chinese Corpus for NLP |
| nl8590687/ASRT_SpeechRecognition | 7,253 | | 0 | 0 | about 2 years ago | 1 | October 23, 2020 | 101 | gpl-3.0 | Python |
| A Deep-Learning-Based Chinese Speech Recognition System |
| stanfordnlp/GloVe | 6,480 | | 0 | 0 | over 2 years ago | 0 | | 80 | apache-2.0 | C |
| Software in C and data files for the popular GloVe model for distributed word representations, a.k.a. word vectors or embeddings |
| codertimo/BERT-pytorch | 5,605 | | 1 | 0 | over 2 years ago | 5 | October 23, 2018 | 63 | apache-2.0 | Python |
| Google AI 2018 BERT PyTorch implementation |
| ibab/tensorflow-wavenet | 5,362 | | 0 | 0 | almost 3 years ago | 0 | | 176 | mit | Python |
| A TensorFlow implementation of DeepMind's WaveNet paper |
| niderhoff/nlp-datasets | 5,235 | | 0 | 0 | over 3 years ago | 0 | | 7 | | |
| Alphabetical list of free/public domain datasets with text data for use in Natural Language Processing (NLP) |
| vespa-engine/vespa | 5,115 | | 5 | 58 | about 2 years ago | 741 | November 30, 2023 | 175 | apache-2.0 | Java |
| AI + Data, online. https://vespa.ai |
| shibing624/pycorrector | 4,928 | | 0 | 1 | about 2 years ago | 30 | November 07, 2023 | 27 | apache-2.0 | Python |
| pycorrector is a toolkit for text error correction. It applies models such as Kenlm, T5, MacBERT, ChatGLM3, and LLaMA to error-correction tasks and works out of the box. |
| dariusk/corpora | 4,757 | | 0 | 2 | over 2 years ago | 1 | May 17, 2018 | 15 | | JavaScript |
| A collection of small corpuses of interesting data for the creation of bots and similar stuff. |