| brightmart/nlp_chinese_corpus |
8,344 |
|
0 |
0 |
almost 3 years ago |
0 |
|
20 |
mit |
|
| Large Scale Chinese Corpus for NLP |
| lonePatient/awesome-pretrained-chinese-nlp-models |
3,738 |
|
0 |
0 |
about 2 years ago |
0 |
|
1 |
mit |
Python |
| Awesome Pretrained Chinese NLP Models: a collection of high-quality Chinese pretrained models, large models, multimodal models, and large language models |
| wainshine/Chinese-Names-Corpus |
3,719 |
|
0 |
0 |
over 2 years ago |
0 |
|
7 |
apache-2.0 |
|
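| Chinese names corpus and name generator. Covers Chinese full names, surnames, given names, forms of address, Japanese names, transliterated names, and English names. Useful for Chinese word segmentation and person-name entity recognition. |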
| CLUEbenchmark/CLUE |
3,345 |
|
0 |
0 |
almost 3 years ago |
0 |
|
73 |
|
Python |
| Chinese Language Understanding Evaluation Benchmark: datasets, baselines, pre-trained models, corpus and leaderboard |
| Belval/TextRecognitionDataGenerator |
2,901 |
|
0 |
0 |
over 2 years ago |
12 |
August 02, 2022 |
134 |
mit |
Python |
| A synthetic data generator for text recognition |
| CLUEbenchmark/CLUEDatasetSearch |
2,778 |
|
0 |
0 |
over 3 years ago |
0 |
|
6 |
|
Python |
| Search all Chinese NLP datasets, with commonly used English NLP datasets included |
| GanjinZero/awesome_Chinese_medical_NLP |
1,847 |
|
0 |
0 |
about 2 years ago |
0 |
|
1 |
|
|
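| A curated list of public Chinese medical NLP resources: terminologies, corpora, word embeddings, pretrained models, knowledge graphs, named entity recognition, QA, information extraction, models, papers, etc. |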
| CLUEbenchmark/CLUENER2020 |
1,359 |
|
0 |
0 |
over 3 years ago |
0 |
|
48 |
|
Python |
| CLUENER2020: Fine-Grained Chinese Named Entity Recognition |
| didi/ChineseNLP |
1,329 |
|
0 |
0 |
over 4 years ago |
0 |
|
3 |
|
HTML |
| Datasets and SOTA results for every field of Chinese NLP |
| alibaba/data-juicer |
994 |
|
0 |
0 |
about 2 years ago |
3 |
September 28, 2023 |
16 |
apache-2.0 |
Python |
| A one-stop data processing system to make data higher-quality, juicier, and more digestible for LLMs! 🍎 🍋 🌽 ➡️ ➡️🍸 🍹 🍷 |