| google/sentencepiece | 8,851 | | 120 | 787 | about 2 years ago | 34 | May 02, 2023 | 32 | apache-2.0 | C++ |
| Unsupervised text tokenizer for Neural Network-based text generation. |
| VKCOM/YouTokenToMe | 926 | | 6 | 17 | almost 3 years ago | 14 | February 12, 2020 | 39 | mit | C++ |
| Unsupervised text tokenizer focused on computational efficiency |
| PyThaiNLP/pythainlp | 902 | | 24 | 51 | about 2 years ago | 101 | November 26, 2023 | 35 | apache-2.0 | Python |
| Thai Natural Language Processing in Python. |
| messense/jieba-rs | 585 | | 5 | 15 | over 2 years ago | 40 | July 16, 2023 | 9 | mit | Rust |
| The Jieba Chinese Word Segmentation Implemented in Rust |
| cbaziotis/ekphrasis | 583 | | 7 | 0 | over 3 years ago | 54 | May 17, 2022 | 18 | mit | Python |
| Ekphrasis is a text processing tool geared towards text from social networks such as Twitter or Facebook. It performs tokenization, word normalization, word segmentation (for splitting hashtags), and spell correction, using word statistics from two large corpora (English Wikipedia and a Twitter corpus of 330 million English tweets). |
| JayYip/m3tl | 543 | | 0 | 0 | about 3 years ago | 0 | | 25 | apache-2.0 | Jupyter Notebook |
| BERT for Multitask Learning |
| vncorenlp/VnCoreNLP | 472 | | 0 | 0 | about 3 years ago | 0 | | 0 | other | Java |
| A Vietnamese natural language processing toolkit (NAACL 2018) |
| ckiplab/ckip-transformers | 439 | | 0 | 0 | about 3 years ago | 0 | | 1 | gpl-3.0 | Python |
| CKIP Transformers |
| taishi-i/nagisa | 365 | | 1 | 7 | about 2 years ago | 22 | July 30, 2023 | 4 | mit | Python |
| A Japanese tokenizer based on recurrent neural networks |
| ku-nlp/jumanpp | 334 | | 0 | 0 | about 3 years ago | 0 | | 30 | apache-2.0 | C++ |
| Juman++ (a Morphological Analyzer Toolkit) |