| nltk/nltk | 12,699 | | 10,496 | 2,261 | about 2 years ago | 59 | July 20, 2023 | 268 | apache-2.0 | Python |
| NLTK Source |
| brightmart/nlp_chinese_corpus | 8,344 | | 0 | 0 | almost 3 years ago | 0 | | 20 | mit | |
| Large Scale Chinese Corpus for NLP |
| codertimo/BERT-pytorch | 5,605 | | 1 | 0 | over 2 years ago | 5 | October 23, 2018 | 63 | apache-2.0 | Python |
| Google AI 2018 BERT pytorch implementation |
| niderhoff/nlp-datasets | 5,235 | | 0 | 0 | over 3 years ago | 0 | | 7 | | |
| Alphabetical list of free/public domain datasets with text data for use in Natural Language Processing (NLP) |
| Kyubyong/nlp_tasks | 2,904 | | 0 | 0 | over 7 years ago | 0 | | 0 | apache-2.0 | |
| Natural Language Processing Tasks and References |
| dbiir/UER-py | 2,802 | | 0 | 0 | over 2 years ago | 0 | | 132 | apache-2.0 | Python |
| Open Source Pre-training Model Framework in PyTorch & Pre-trained Model Zoo |
| CLUEbenchmark/CLUEDatasetSearch | 2,778 | | 0 | 0 | over 3 years ago | 0 | | 6 | | Python |
| Search all Chinese NLP datasets, with commonly used English NLP datasets included |
| endymecy/awesome-deeplearning-resources | 2,739 | | 0 | 0 | about 2 years ago | 0 | | 2 | mit | |
| Deep Learning and deep reinforcement learning research papers and some codes |
| adbar/trafilatura | 2,447 | | 0 | 66 | about 2 years ago | 39 | November 29, 2023 | 66 | gpl-3.0 | Python |
| Python & command-line tool to gather text on the Web: web crawling/scraping, extraction of text, metadata, comments |
| imcaspar/gpt2-ml | 1,674 | | 0 | 0 | almost 3 years ago | 0 | | 22 | apache-2.0 | Python |
| GPT2 for Multiple Languages, including pretrained models. GPT2 multilingual support, with a 1.5-billion-parameter Chinese pretrained model |