Wiki Dump Reader Alternatives

Extract corpora from Wikipedia dumps
Alternatives To CyberZHG/wiki-dump-reader
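idio/wiki2vec
Generating Vectors for DBpedia Entities via Word2Vec and Wikipedia Dumps. Questions? https://gitter.im/idio-opensource/Lobby
Stars: 587 | Repos Using This: 0 | Packages Using This: 0 | Most Recent Commit: over 8 years ago | Total Releases: 0 | Open Issues: 21 | Language: Java

RaRe-Technologies/gensim-data
Data repository for pretrained NLP models and NLP corpora.
Stars: 492 | Repos Using This: 0 | Packages Using This: 0 | Most Recent Commit: about 8 years ago | Total Releases: 0 | Open Issues: 14 | License: lgpl-2.1 | Language: Python

dbpedia/fact-extractor
Fact Extraction from Wikipedia Text
Stars: 413 | Repos Using This: 0 | Packages Using This: 0 | Most Recent Commit: almost 10 years ago | Total Releases: 0 | Open Issues: 7 | Language: Python

informagi/REL
REL: Radboud Entity Linker
Stars: 279 | Repos Using This: 0 | Packages Using This: 0 | Most Recent Commit: over 2 years ago | Total Releases: 1 | Latest Release: December 12, 2022 | Open Issues: 12 | License: mit | Language: Python

markriedl/WikiPlots
A dataset containing story plots from Wikipedia (books, movies, etc.) and the code for the extractor.
Stars: 234 | Repos Using This: 0 | Packages Using This: 0 | Most Recent Commit: over 8 years ago | Total Releases: 0 | Open Issues: 5 | Language: Python

icoxfog417/fastTextJapaneseTutorial
Tutorial to train fastText with Japanese corpus
Stars: 174 | Repos Using This: 0 | Packages Using This: 0 | Most Recent Commit: over 9 years ago | Total Releases: 0 | Open Issues: 0 | License: mit | Language: Python

yohasebe/wp2txt
A command-line toolkit to extract text content and category data from Wikipedia dump files
Stars: 160 | Repos Using This: 0 | Packages Using This: 0 | Most Recent Commit: almost 3 years ago | Total Releases: 29 | Latest Release: May 13, 2023 | Open Issues: 1 | License: mit | Language: Ruby

ogrisel/pignlproc
Apache Pig utilities to build training corpora for machine learning / NLP out of public Wikipedia and DBpedia dumps.
Stars: 160 | Repos Using This: 0 | Packages Using This: 0 | Most Recent Commit: over 3 years ago | Total Releases: 0 | Open Issues: 6 | Language: Java

rspeer/wiki2text
Extract a plain text corpus from MediaWiki XML dumps, such as Wikipedia.
Stars: 129 | Repos Using This: 0 | Packages Using This: 0 | Most Recent Commit: over 7 years ago | Total Releases: 0 | Latest Release: June 30, 2015 | Open Issues: 2 | License: mit | Language: Nim

scriptin/kanji-frequency
Kanji usage frequency data collected from various sources
Stars: 116 | Repos Using This: 0 | Packages Using This: 0 | Most Recent Commit: about 2 years ago | Total Releases: 0 | Open Issues: 1 | License: cc-by-4.0 | Language: Astro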



Downloads, Dependent Repos, Dependent Packages, Total Releases, Latest Releases data powered by Libraries.io.

Copyright 2018-2026 Awesome Open Source.  All rights reserved.