| wainshine/Chinese-Names-Corpus |
3,719 |
|
0 |
0 |
over 2 years ago |
0 |
|
7 |
apache-2.0 |
|
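| Chinese names corpus and name generator. Covers Chinese full names, surnames, given names, forms of address, Japanese names, transliterated names, and English names. Useful for Chinese word segmentation and person-name entity recognition. |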
| ko-ichi-h/khcoder |
295 |
|
0 |
0 |
over 2 years ago |
0 |
|
10 |
gpl-2.0 |
Perl |
| KH Coder: for Quantitative Content Analysis or Text Mining |
| icoxfog417/fastTextJapaneseTutorial |
174 |
|
0 |
0 |
over 9 years ago |
0 |
|
0 |
mit |
Python |
| Tutorial to train fastText with Japanese corpus |
| scriptin/kanji-frequency |
116 |
|
0 |
0 |
about 2 years ago |
0 |
|
1 |
cc-by-4.0 |
Astro |
| Kanji usage frequency data collected from various sources |
| taishi-i/toiro |
110 |
|
0 |
0 |
over 2 years ago |
8 |
July 31, 2023 |
1 |
apache-2.0 |
Python |
| A comparison tool of Japanese tokenizers |
| WorksApplications/chiVe |
105 |
|
0 |
0 |
over 3 years ago |
0 |
|
0 |
apache-2.0 |
|
| Japanese word embedding with Sudachi and NWJC 🌿 |
| jiali-ms/JLM |
99 |
|
0 |
0 |
almost 7 years ago |
0 |
|
0 |
mit |
Python |
| A fast LSTM Language Model for large vocabulary language like Japanese and Chinese |
| Hironsan/ja.text8 |
74 |
|
0 |
0 |
over 8 years ago |
0 |
|
0 |
|
Python |
| Japanese text8 corpus for word embedding. |
| megagonlabs/jrte-corpus |
73 |
|
0 |
0 |
almost 3 years ago |
0 |
|
0 |
other |
Python |
| Japanese Realistic Textual Entailment Corpus (NLP 2020, LREC 2020) |
| ku-nlp/KWDLC |
71 |
|
0 |
0 |
over 2 years ago |
0 |
|
12 |
|
Python |
| Kyoto University Web Document Leads Corpus |