| Project | Stars | Repos Using This | Packages Using This | Most Recent Commit | Total Releases | Latest Release | Open Issues | License | Language | Description |
|---|---:|---:|---:|---|---:|---|---:|---|---|---|
| google/sentencepiece | 8,851 | 120 | 787 | about 2 years ago | 34 | May 02, 2023 | 32 | apache-2.0 | C++ | Unsupervised text tokenizer for Neural Network-based text generation. |
| huggingface/tokenizers | 8,056 | 0 | 362 | about 2 years ago | 85 | November 14, 2023 | 233 | apache-2.0 | Rust | 💥 Fast State-of-the-Art Tokenizers optimized for Research and Production |
| Morizeyao/GPT2-Chinese | 7,249 | 0 | 0 | over 2 years ago | 0 | | 105 | mit | Python | Chinese version of GPT2 training code, using BERT tokenizer. |
| roshan-research/hazm | 1,381 | 17 | 13 | 4 months ago | 20 | October 01, 2023 | 12 | mit | Python | Persian NLP Toolkit |
| natasha/natasha | 1,085 | 3 | 9 | over 2 years ago | 19 | July 24, 2023 | 24 | mit | Python | Solves basic Russian NLP tasks; API for lower-level Natasha projects |
| SKTBrain/KoBERT | 1,035 | 0 | 0 | about 3 years ago | 0 | | 5 | apache-2.0 | Jupyter Notebook | Korean BERT pre-trained cased (KoBERT) |
| arbox/nlp-with-ruby | 1,002 | 0 | 0 | almost 3 years ago | 0 | | 5 | cc0-1.0 | Ruby | Curated List: Practical Natural Language Processing done in Ruby |
| lovit/soynlp | 801 | 4 | 9 | over 3 years ago | 33 | August 25, 2019 | 54 | other | Python | A Python library for Korean natural language processing. Provides word extraction, tokenization, part-of-speech tagging and preprocessing. |
| cbaziotis/ekphrasis | 583 | 7 | 0 | over 3 years ago | 54 | May 17, 2022 | 18 | mit | Python | Ekphrasis is a text processing tool geared towards text from social networks, such as Twitter or Facebook. It performs tokenization, word normalization, word segmentation (for splitting hashtags) and spell correction, using word statistics from two large corpora (English Wikipedia and 330 million English tweets). |
| open-korean-text/open-korean-text | 552 | 6 | 6 | about 3 years ago | 14 | August 07, 2018 | 13 | apache-2.0 | Scala | Open Korean Text Processor - An Open-source Korean Text Processor |
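The ekphrasis entry mentions splitting hashtags using word statistics from large corpora. Below is a minimal sketch of that general idea. It is not ekphrasis's own code, data or API. `COUNTS` is a hypothetical toy unigram table, and unknown words get a common length-penalized heuristic score. Dynamic programming then picks the split with the highest total log-probability.

```python
import math
from functools import lru_cache

# Hypothetical toy unigram counts; a real segmenter would derive these from a large corpus.
COUNTS = {
    "machine": 500, "learning": 400, "learn": 300, "is": 5000,
    "fun": 800, "in": 6000, "g": 1,
}
TOTAL = sum(COUNTS.values())
MAX_WORD_LEN = 20  # assumption: no single word is longer than this


def word_logprob(word: str) -> float:
    """Log-probability of `word` under the unigram model.

    Unknown words get a probability that shrinks by 10x per character,
    so long unknown chunks are strongly discouraged (a heuristic assumption).
    """
    if word in COUNTS:
        return math.log(COUNTS[word] / TOTAL)
    return math.log(10.0 / (TOTAL * 10 ** len(word)))


def segment(text: str) -> list[str]:
    """Return the split of `text` that maximizes the summed unigram log-probability."""

    @lru_cache(maxsize=None)
    def best(start: int) -> tuple[float, tuple[str, ...]]:
        # Base case: nothing left to segment.
        if start == len(text):
            return 0.0, ()
        candidates = []
        # Try every possible first word, then reuse the memoized best split of the rest.
        for end in range(start + 1, min(len(text), start + MAX_WORD_LEN) + 1):
            word = text[start:end]
            rest_score, rest_words = best(end)
            candidates.append((word_logprob(word) + rest_score, (word,) + rest_words))
        return max(candidates)

    return list(best(0)[1])


print(segment("machinelearningisfun"))  # → ['machine', 'learning', 'is', 'fun']
```

Memoizing on the start position keeps the search polynomial in the text length rather than exponential. The quality of the split depends entirely on the word counts, which is why corpus statistics matter for this kind of tool.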