| jflex-de/jflex | 523 | | 194 | 64 | almost 3 years ago | 14 | March 11, 2023 | 25 | other | Java | The fast scanner generator for Java™ with full Unicode support |
| OpenNMT/Tokenizer | 224 | | 15 | 5 | over 2 years ago | 68 | January 11, 2023 | 2 | mit | C++ | Fast and customizable text tokenization library with BPE and SentencePiece support |
| clipperhouse/uax29 | 35 | | 0 | 6 | over 2 years ago | 40 | May 26, 2023 | 1 | mit | Go | A tokenizer based on Unicode text segmentation (UAX #29), for Go. Split words, sentences and graphemes. |
| jbowles/nlpt | 35 | | 0 | 0 | about 10 years ago | 1 | March 05, 2016 | 0 | other | Go | Natural Language Processing Toolkit written in Go (DEPRECATED see individual packages prefixed nlpt-) |
| illarionov/sqlite3-unicodesn | 29 | | 0 | 0 | about 7 years ago | 0 | | 5 | | C | SQLite unicode full-text-search tokenizer with Snowball stemming |
| bramstein/unicode-tokenizer | 20 | | 1 | 4 | over 12 years ago | 5 | September 15, 2012 | 0 | | JavaScript | Unicode Tokenizer following the Unicode Line Breaking algorithm |
| kaunghtetsan275/pyidaungsu | 19 | | 0 | 0 | over 3 years ago | 0 | | 0 | mit | Python | Python library for Myanmar language |
| dustalov/greeb | 16 | | 2 | 0 | almost 7 years ago | 26 | January 14, 2015 | 0 | mit | Ruby | Greeb is a simple Unicode-aware regexp-based tokenizer. |
| liuzl/tokenizer | 11 | | 0 | 0 | over 7 years ago | 1 | November 28, 2018 | 0 | apache-2.0 | Go | Natural Language Tokenizer |
| michaelnmmeyer/mascara | 6 | | 0 | 0 | about 9 years ago | 0 | | 0 | bsd-3-clause | C | A natural language tokenizer |