| commoncrawl/commoncrawl | 466 | | 0 | 0 | over 8 years ago | 0 | | 8 | | C++ |
| Common Crawl support library to access 2008-2012 crawl archives (ARC files) |
| ogrisel/pignlproc | 160 | | 0 | 0 | over 3 years ago | 0 | | 6 | | Java |
| Apache Pig utilities to build training corpora for machine learning / NLP out of public Wikipedia and DBpedia dumps. |
| utcompling/textgrounder | 60 | | 0 | 0 | about 10 years ago | 0 | | 1 | apache-2.0 | Scala |
| A system for connecting language to space and time. |
| jhclark/bigfatlm | 30 | | 0 | 0 | almost 8 years ago | 0 | | 3 | lgpl-3.0 | Java |
| Hadoop MapReduce training of modified Kneser-Ney smoothed language models |
| peterexner/KOSHIK | 9 | | 0 | 0 | over 11 years ago | 0 | | 0 | | Java |
| An NLP framework for large scale processing using Hadoop |
| JulianEberius/dwtc-tools | 8 | | 0 | 0 | almost 11 years ago | 0 | | 1 | apache-2.0 | Java |
| Dresden Web Table Corpus Java library |
| scottcrespo/ngrams | 5 | | 0 | 0 | almost 11 years ago | 0 | | 0 | gpl-3.0 | Java |
| NGram Map Reduce Algorithms |