| Repository | Stars | Dependent Packages | Dependent Repos | Last Commit | Total Releases | Latest Release | Open Issues | License | Language | Description |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| brightmart/nlp_chinese_corpus | 8,344 | 0 | 0 | almost 3 years ago | 0 | | 20 | mit | | Large Scale Chinese Corpus for NLP |
| UCDenver-ccp/CRAFT | 58 | 0 | 0 | over 3 years ago | 0 | | 1 | other | Clojure | |
| dav009/abacus | 42 | 0 | 0 | over 8 years ago | 0 | May 24, 2021 | 0 | | Go | Counter data structure for Go using a Count-Min Sketch with a fixed amount of memory |
| lzhenboy/word2vec-Chinese | 34 | 0 | 0 | over 6 years ago | 0 | | 1 | | Python | A tutorial for training Chinese word2vec on a Wikipedia corpus |
| insikk/namu_wiki_db_preprocess | 22 | 0 | 0 | almost 9 years ago | 0 | | 0 | apache-2.0 | Jupyter Notebook | A Python script that converts the namu wiki database into a large Korean-language corpus |
| JiaLiangShen/Chinese-Article-Classification-based-on-own-corpus-via-TextCNN-and-GBDT | 16 | 0 | 0 | almost 8 years ago | 0 | | 1 | | Python | Chinese text classification, including basic corpus processing, handling of Wiki_zh, etc. |
| mmcctt00/SpanishTransformerXL | 12 | 0 | 0 | over 6 years ago | 0 | | 0 | | Jupyter Notebook | Spanish language model trained on a wiki corpus (500M tokens) with fastai v1; accuracy > 42.3%, vocabulary size 60K |
| uma-pi1/OPIEC | 12 | 0 | 0 | almost 7 years ago | 0 | | 0 | gpl-3.0 | Java | Reading the data from OPIEC, an Open Information Extraction corpus |
| CyberZHG/wiki-dump-reader | 10 | 0 | 1 | about 7 years ago | 4 | February 01, 2019 | 2 | mit | Python | Extract corpora from Wikipedia dumps |
| zhouhoo/wiki_zh_vec | 7 | 0 | 0 | over 9 years ago | 0 | | 0 | apache-2.0 | Python | A Python tool for training word embeddings on the Chinese Wikipedia corpus with word2vec, GloVe, and LexVec |
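The abacus entry above is built on a Count-Min Sketch, a probabilistic structure that counts item frequencies in fixed memory at the cost of possible over-counting. As a rough illustration of the technique only (this is a hypothetical pure-Python sketch, not abacus's Go API), it might look like:

```python
import hashlib

class CountMinSketch:
    """Approximate frequency counter with fixed memory.

    Illustrative sketch of the Count-Min Sketch technique;
    estimates may over-count but never under-count.
    """

    def __init__(self, width=1024, depth=4):
        self.width = width
        self.depth = depth
        # depth x width grid of counters: total memory is fixed
        # regardless of how many distinct items are added.
        self.table = [[0] * width for _ in range(depth)]

    def _indexes(self, item):
        # One hash per row, derived by salting a single digest.
        for row in range(self.depth):
            digest = hashlib.md5(f"{row}:{item}".encode()).hexdigest()
            yield row, int(digest, 16) % self.width

    def add(self, item, count=1):
        for row, col in self._indexes(item):
            self.table[row][col] += count

    def estimate(self, item):
        # Collisions only inflate counters, so the minimum across
        # rows is the tightest available estimate.
        return min(self.table[row][col] for row, col in self._indexes(item))
```

A counter like this is why such libraries can tally word frequencies over a large corpus without the memory footprint of an exact dictionary.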
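Several of the repositories above start from a MediaWiki XML dump and reduce it to plain text. A minimal stdlib-only sketch of that idea (the `SAMPLE_DUMP`, `strip_markup`, and `extract_pages` names are hypothetical, and real dumps are namespaced multi-gigabyte files that need streaming parsing and far more markup rules) might look like:

```python
import re
import xml.etree.ElementTree as ET

# Tiny stand-in for a MediaWiki XML dump; real dumps use an XML
# namespace and are parsed incrementally with ET.iterparse.
SAMPLE_DUMP = """<mediawiki>
  <page>
    <title>Corpus</title>
    <revision><text>A '''corpus''' is a [[text]] collection.</text></revision>
  </page>
</mediawiki>"""

def strip_markup(wikitext):
    # [[target|label]] -> label, [[target]] -> target
    text = re.sub(r"\[\[(?:[^|\]]*\|)?([^\]]+)\]\]", r"\1", wikitext)
    # '''bold''' and ''italic'' quotes -> plain text
    text = re.sub(r"'{2,}", "", text)
    return text

def extract_pages(dump_xml):
    """Yield (title, plain_text) for each <page> in the dump."""
    root = ET.fromstring(dump_xml)
    for page in root.iter("page"):
        title = page.findtext("title")
        raw = page.findtext("revision/text") or ""
        yield title, strip_markup(raw)
```

For example, `dict(extract_pages(SAMPLE_DUMP))` maps `"Corpus"` to the cleaned sentence with the link and bold markup removed.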