| brightmart/nlp_chinese_corpus | 8,344 | | 0 | 0 | almost 3 years ago | 0 | | 20 | mit | |
| Large Scale Chinese Corpus for NLP |
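| wainshine/Chinese-Names-Corpus | 3,719 | | 0 | 0 | over 2 years ago | 0 | | 7 | apache-2.0 | |
| Chinese personal-name corpus and name generator: Chinese full names, surnames, given names, forms of address, Japanese names, transliterated foreign names, and English names. Useful for Chinese word segmentation and person-name entity recognition. |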
| CLUEbenchmark/CLUE | 3,345 | | 0 | 0 | almost 3 years ago | 0 | | 73 | | Python |
| Chinese Language Understanding Evaluation Benchmark: datasets, baselines, pre-trained models, corpus and leaderboard |
| CLUEbenchmark/CLUEDatasetSearch | 2,778 | | 0 | 0 | over 3 years ago | 0 | | 6 | | Python |
| Search all Chinese NLP datasets, plus commonly used English NLP datasets |
| candlewill/Dialog_Corpus | 1,487 | | 0 | 0 | over 5 years ago | 0 | | 2 | | Python |
| Datasets for Training Chatbot System: corpora for training Chinese and English dialogue systems |
| juand-r/entity-recognition-datasets | 1,386 | | 0 | 0 | over 2 years ago | 0 | | 7 | mit | Python |
| A collection of corpora for named entity recognition (NER) and entity recognition tasks. These annotated datasets cover a variety of languages, domains and entity types. |
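| wainshine/Company-Names-Corpus | 1,106 | | 0 | 0 | over 3 years ago | 0 | | 3 | apache-2.0 | |
| Company-name and organization-name corpus: company short names, abbreviations, brand terms, and enterprise names. Useful for Chinese word segmentation and organization-name entity recognition. |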
| chatopera/insuranceqa-corpus-zh | 983 | | 0 | 0 | over 2 years ago | 11 | November 15, 2023 | 9 | other | Python |
| :helicopter: Insurance-industry corpus for chatbots |
| thu-coai/CDial-GPT | 944 | | 0 | 0 | almost 4 years ago | 0 | | 10 | mit | Python |
| A Large-scale Chinese Short-Text Conversation Dataset and Chinese pre-training dialog models |
| karthikncode/nlp-datasets | 871 | | 0 | 0 | over 6 years ago | 0 | | 6 | | |
| A list of datasets/corpora for NLP tasks, in reverse chronological order. |