| ikawaha/kagome | 769 | | 0 | 27 | about 2 years ago | 74 | September 27, 2023 | 4 | mit | Go | Self-contained Japanese Morphological Analyzer written in pure Go |
| yoheikikuta/bert-japanese | 415 | | 0 | 0 | about 5 years ago | 0 | | 0 | apache-2.0 | Jupyter Notebook | BERT with SentencePiece for Japanese text. |
| taishi-i/nagisa | 365 | | 1 | 7 | about 2 years ago | 22 | July 30, 2023 | 4 | mit | Python | A Japanese tokenizer based on recurrent neural networks |
| polm/fugashi | 339 | | 0 | 39 | over 2 years ago | 67 | August 25, 2023 | 5 | mit | C++ | A Cython MeCab wrapper for fast, pythonic Japanese tokenization and morphological analysis. |
| ku-nlp/jumanpp | 334 | | 0 | 0 | about 3 years ago | 0 | | 30 | apache-2.0 | C++ | Juman++ (a Morphological Analyzer Toolkit) |
| WorksApplications/SudachiPy | 318 | | 0 | 0 | over 3 years ago | 0 | | 18 | apache-2.0 | Python | Python version of Sudachi, a Japanese tokenizer. |
| daac-tools/vibrato | 275 | | 0 | 1 | over 2 years ago | 11 | May 12, 2023 | 3 | apache-2.0 | Rust | 🎤 vibrato: Viterbi-based accelerated tokenizer |
| daac-tools/vaporetto | 206 | | 0 | 3 | over 2 years ago | 16 | April 01, 2023 | 0 | apache-2.0 | Rust | 🛥 Vaporetto: Very accelerated pointwise prediction based tokenizer |
| himkt/konoha | 200 | | 0 | 1 | about 2 years ago | 10 | August 03, 2022 | 0 | mit | Python | 🌿 An easy-to-use Japanese Text Processing tool, which makes it possible to switch tokenizers with small changes of code. |
| taishi-i/toiro | 110 | | 0 | 0 | over 2 years ago | 8 | July 31, 2023 | 1 | apache-2.0 | Python | A comparison tool of Japanese tokenizers |