| commoncrawl/commoncrawl-crawler | 208 | | 0 | 0 | over 3 years ago | 0 | | 0 | gpl-3.0 | Java |
| The Common Crawl Crawler Engine and Related MapReduce code (2008-2012) |
| rossf7/elasticrawl | 50 | | 1 | 0 | about 9 years ago | 10 | February 15, 2017 | 1 | mit | Ruby |
| Launch AWS Elastic MapReduce jobs that process Common Crawl data. |
| petewarden/common_crawl_types | 28 | | 0 | 0 | about 14 years ago | 0 | | 0 | | Ruby |
| A simple Ruby example of how to process Common Crawl files using Elastic MapReduce |
| stormsinbrewing/Real_Time_Social_Media_Mining | 24 | | 0 | 0 | over 2 years ago | 0 | | 21 | mit | HTML |
| DevOps pipeline for Real Time Social/Web Mining |
| Smerity/cs205_ga | 16 | | 0 | 0 | about 12 years ago | 0 | | 0 | | Python |
| How deep does Google Analytics go? Efficiently tackling Common Crawl using AWS & MapReduce |
| ly16/GooglePlay-Web-Crawler | 15 | | 0 | 0 | about 9 years ago | 0 | | 0 | | Java |
| MapReduce project using Hadoop, Nutch, AWS EMR, Pig, Tez, and Hive |
| shlomiv/warc-mapreduce | 11 | | 0 | 0 | about 11 years ago | 0 | | 0 | | Java |
| WARC and WET support for Hadoop's MapReduce API |
| Smerity/cc-mrjob | 9 | | 0 | 0 | almost 10 years ago | 0 | | 8 | mit | Python |
| Demonstration of using Python to process the Common Crawl dataset with the mrjob framework |
| nglthu/infoRetrieval | 8 | | 0 | 0 | about 7 years ago | 0 | | 0 | mit | HTML |
| Inverted Indexer, web crawler, sort, search and poster steamer written using Python for information retrieval. |
| openvenues/common_crawl | 8 | | 0 | 0 | about 11 years ago | 0 | | 0 | mit | Python |
| Simple Python MapReduce jobs for processing the Common Crawl plus command-line utilities |