| WikiTeam/wikiteam | 661 | | 0 | 0 | over 2 years ago | 0 | | 159 | gpl-3.0 | Python |
| Tools for downloading and preserving wikis. We archive wikis, from Wikipedia to tiniest wikis. As of 2023, WikiTeam has preserved more than 350,000 wikis. |
| bwbaugh/wikipedia-extractor | 247 | | 0 | 0 | over 9 years ago | 0 | | 1 | | Python |
| This is a mirror of the script by Giuseppe Attardi, and contains history before the official repo started: https://github.com/attardi/wikiextractor --- Extracts and cleans text from Wikipedia database dump and stores output in a number of files of similar size in a given directory. |
| diegoceccarelli/json-wikipedia | 244 | | 0 | 0 | over 4 years ago | 0 | | 6 | apache-2.0 | Java |
| Json Wikipedia, contains code to convert the Wikipedia xml dump into a json/avro dump |
| spencermountain/dumpster-dive | 214 | | 1 | 2 | almost 3 years ago | 34 | July 04, 2023 | 8 | other | JavaScript |
| roll a wikipedia dump into mongo |
| dps/go-xml-parse | 117 | | 0 | 0 | almost 11 years ago | 0 | November 26, 2023 | 2 | | Go |
| Streaming XML parser example in go |
| jodaiber/Annotated-WikiExtractor | 88 | | 0 | 0 | about 15 years ago | 0 | | 0 | gpl-3.0 | Python |
| Simple Wikipedia plain text extractor with article link annotations and Hadoop support. |
| ScalaWilliam/xs4s | 50 | | 0 | 1 | over 4 years ago | 6 | July 27, 2021 | 1 | other | Scala |
| XML Streaming for Scala including FS2/cats support |
| saffsd/wikidump | 41 | | 0 | 0 | almost 13 years ago | 4 | April 10, 2013 | 4 | gpl-3.0 | Python |
| Tools to manipulate and extract data from wikipedia dumps |
| rdmpage/wikihistoryflow | 39 | | 0 | 0 | over 9 years ago | 0 | | 1 | | PHP |
| Visualise Wikipedia page edits using History Flow |
| marcusklang/wikiforia | 31 | | 0 | 0 | about 9 years ago | 0 | | 9 | gpl-2.0 | Java |
| A Utility Library for Wikipedia dumps |