| pwxcoo/chinese-xinhua | 10,425 | | 0 | 0 | over 2 years ago | 0 | | 30 | mit | Python |
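| :orange_book: Chinese Xinhua Dictionary database, covering xiehouyu (two-part allegorical sayings), chengyu (idioms), words, and Chinese characters. |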
| brightmart/nlp_chinese_corpus | 8,344 | | 0 | 0 | almost 3 years ago | 0 | | 20 | mit | |
| Large-scale Chinese corpus for NLP |
| crownpku/Awesome-Chinese-NLP | 7,547 | | 0 | 0 | over 2 years ago | 0 | | 3 | apache-2.0 | |
| A curated list of resources for Chinese NLP |
| HIT-SCIR/ltp | 4,693 | | 0 | 3 | over 2 years ago | 46 | January 02, 2023 | 52 | | Python |
| Language Technology Platform |
| IDEA-CCNL/Fengshenbang-LM | 3,670 | | 0 | 0 | over 2 years ago | 0 | | 86 | apache-2.0 | Python |
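| Fengshenbang-LM (Fengshenbang large models) is an open-source large-model ecosystem led by the Cognitive Computing and Natural Language Research Center of the IDEA research institute, serving as infrastructure for Chinese AIGC and cognitive intelligence. |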
| baidu/lac | 3,644 | | 0 | 12 | almost 5 years ago | 15 | May 25, 2021 | 87 | apache-2.0 | C++ |
| Baidu NLP: word segmentation, part-of-speech tagging, named entity recognition, and word importance |
| fastnlp/fastNLP | 2,940 | | 0 | 2 | almost 3 years ago | 24 | October 31, 2022 | 62 | apache-2.0 | Python |
| fastNLP: A Modularized and Extensible NLP Framework. Currently still in incubation. |
| CVI-SZU/Linly | 2,888 | | 0 | 0 | over 2 years ago | 0 | | 107 | | Python |
| Chinese-LLaMA 1 & 2 and Chinese-Falcon base models; ChatFlow Chinese dialogue model; Chinese OpenLLaMA model; NLP pre-training and instruction-tuning datasets |
| esbatmop/MNBVC | 2,533 | | 0 | 0 | about 2 years ago | 0 | | 18 | mit | |
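| MNBVC (Massive Never-ending BT Vast Chinese corpus) is an ultra-large-scale Chinese corpus, benchmarked against the 40T of data used to train ChatGPT. It covers not only mainstream culture but also niche subcultures, including "Martian script" (huoxingwen, a stylized internet writing style). The dataset includes plain-text Chinese data in every form: news, essays, novels, books, magazines, papers, scripts and dialogue lines, forum posts, wiki pages, classical poetry, song lyrics, product descriptions, jokes, embarrassing anecdotes, chat logs, and more. |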
| crownpku/Information-Extraction-Chinese | 2,086 | | 0 | 0 | about 3 years ago | 0 | | 118 | | Python |
| Chinese Named Entity Recognition with IDCNN/biLSTM+CRF, and Relation Extraction with biGRU+2ATT |