| Repository | Stars | Last Commit | Releases | Latest Release | Open Issues | License | Language | Description |
|---|---|---|---|---|---|---|---|---|
| idio/wiki2vec | 587 | over 8 years ago | 0 | | 21 | | Java | Generating Vectors for DBpedia Entities via Word2Vec and Wikipedia Dumps (usage sketch below). Questions? https://gitter.im/idio-opensource/Lobby |
| RaRe-Technologies/gensim-data | 492 | about 8 years ago | 0 | | 14 | lgpl-2.1 | Python | Data repository for pretrained NLP models and NLP corpora (download sketch below). |
| dbpedia/fact-extractor | 413 | almost 10 years ago | 0 | | 7 | | Python | Fact Extraction from Wikipedia Text |
| informagi/REL | 279 | over 2 years ago | 1 | December 12, 2022 | 12 | mit | Python | REL: Radboud Entity Linker (API sketch below). |
| markriedl/WikiPlots | 234 | over 8 years ago | 0 | | 5 | | Python | A dataset containing story plots from Wikipedia (books, movies, etc.) and the code for the extractor. |
| icoxfog417/fastTextJapaneseTutorial | 174 | over 9 years ago | 0 | | 0 | mit | Python | Tutorial to train fastText with a Japanese corpus (training sketch below). |
| yohasebe/wp2txt | 160 | almost 3 years ago | 29 | May 13, 2023 | 1 | mit | Ruby | A command-line toolkit to extract text content and category data from Wikipedia dump files (usage sketch below). |
| ogrisel/pignlproc | 160 | over 3 years ago | 0 | | 6 | | Java | Apache Pig utilities to build training corpora for machine learning / NLP out of public Wikipedia and DBpedia dumps. |
| rspeer/wiki2text | 129 | over 7 years ago | 0 | June 30, 2015 | 2 | mit | Nim | Extract a plain text corpus from MediaWiki XML dumps, such as Wikipedia. |
| scriptin/kanji-frequency | 116 | about 2 years ago | 0 | | 1 | cc-by-4.0 | Astro | Kanji usage frequency data collected from various sources |
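
For idio/wiki2vec, the published models are plain gensim Word2Vec files in which DBpedia entities appear as tokens prefixed with `DBPEDIA_ID/`. A minimal querying sketch, assuming a downloaded model at a placeholder path (models this old may only load under the gensim version they were trained with):

```python
from gensim.models import Word2Vec

# Placeholder path to a downloaded wiki2vec model; see the repo README for
# the actual download links.
model = Word2Vec.load("en_1000_no_stem/en.model")

# Entities are stored as "DBPEDIA_ID/<Entity_Name>" tokens next to plain
# words, so both can be queried (newer gensim: model.wv.most_similar).
print(model.most_similar("DBPEDIA_ID/Barack_Obama", topn=5))
print(model.most_similar("president", topn=5))
```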
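RaRe-Technologies/gensim-data is consumed through gensim's downloader module, which fetches and caches the hosted corpora and models by name. A minimal sketch (the model name is one example from the catalog):

```python
import gensim.downloader as api

# Inspect the catalog of hosted corpora and pretrained models.
info = api.info()
print(sorted(info["models"]))

# Download on first use (cached under ~/gensim-data), load as KeyedVectors.
vectors = api.load("glove-wiki-gigaword-50")
print(vectors.most_similar("wikipedia", topn=3))
```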
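informagi/REL offers entity linking both as a Python package and over HTTP; its README documents a public demo endpoint. A hedged sketch against that endpoint, assuming it is still online (the exact layout of each result row is best checked against the REL docs):

```python
import requests

API_URL = "https://rel.cs.ru.nl/api"  # public endpoint from the REL README

document = {
    "text": "Radboud University is located in Nijmegen, the Netherlands.",
    "spans": [],  # empty list: let REL's mention detection find mentions
}

results = requests.post(API_URL, json=document, timeout=30).json()
# Each row begins with (start, length, mention, linked_entity); trailing
# fields hold confidence scores and a NER tag.
for start, length, mention, entity, *rest in results:
    print(f"{mention!r} -> {entity}")
```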
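icoxfog417/fastTextJapaneseTutorial walks through tokenizing Japanese Wikipedia (e.g. with MeCab) and training skip-gram vectors with fastText. A sketch of the training step using the official `fasttext` Python bindings, assuming a pre-tokenized corpus file (the filenames are placeholders):

```python
import fasttext

# "wiki_tokenized.txt" stands in for a whitespace-tokenized Japanese
# Wikipedia corpus, e.g. produced with MeCab as in the tutorial.
model = fasttext.train_unsupervised(
    "wiki_tokenized.txt",
    model="skipgram",  # the tutorial trains skip-gram vectors
    dim=100,
)
model.save_model("ja_wiki.bin")

# Sanity check: nearest neighbors of 日本 ("Japan") in the learned space.
print(model.get_nearest_neighbors("日本"))
```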
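yohasebe/wp2txt is a Ruby CLI (`gem install wp2txt`) rather than a library, so from Python it would be driven as a subprocess. A sketch with the dump filename as a placeholder; the flag set has changed across wp2txt versions, so verify against `wp2txt --help`:

```python
import subprocess

# Convert a Wikipedia dump archive into plain-text files in the current
# directory. "-i" names the input dump per the wp2txt README; the dump
# filename below is a placeholder.
subprocess.run(
    ["wp2txt", "-i", "jawiki-latest-pages-articles.xml.bz2"],
    check=True,
)
```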