Common_crawl_types Alternatives

A simple Ruby example of how to process Common Crawl files using Elastic MapReduce
Alternatives To petewarden/common_crawl_types
| Project | Description | Stars | Downloads | Repos Using This | Packages Using This | Most Recent Commit | Total Releases | Latest Release | Open Issues | License | Language |
|---|---|---|---|---|---|---|---|---|---|---|---|
| commoncrawl/commoncrawl-crawler | The Common Crawl Crawler Engine and Related MapReduce code (2008-2012) | 208 | | 0 | 0 | over 3 years ago | 0 | | 0 | gpl-3.0 | Java |
| rossf7/elasticrawl | Launch AWS Elastic MapReduce jobs that process Common Crawl data. | 50 | | 1 | 0 | about 9 years ago | 10 | February 15, 2017 | 1 | mit | Ruby |
| petewarden/common_crawl_types | A simple Ruby example of how to process Common Crawl files using Elastic MapReduce | 28 | | 0 | 0 | about 14 years ago | 0 | | 0 | | Ruby |
| stormsinbrewing/Real_Time_Social_Media_Mining | DevOps pipeline for real-time social/web mining | 24 | | 0 | 0 | over 2 years ago | 0 | | 21 | mit | HTML |
| Smerity/cs205_ga | How deep does Google Analytics go? Efficiently tackling Common Crawl using AWS & MapReduce | 16 | | 0 | 0 | about 12 years ago | 0 | | 0 | | Python |
| ly16/GooglePlay-Web-Crawler | MapReduce project using Hadoop, Nutch, AWS EMR, Pig, Tez, and Hive | 15 | | 0 | 0 | about 9 years ago | 0 | | 0 | | Java |
| shlomiv/warc-mapreduce | WARC and WET support for Hadoop's MapReduce API | 11 | | 0 | 0 | about 11 years ago | 0 | | 0 | | Java |
| Smerity/cc-mrjob | Demonstration of using Python to process the Common Crawl dataset with the mrjob framework | 9 | | 0 | 0 | almost 10 years ago | 0 | | 8 | mit | Python |
| nglthu/infoRetrieval | Inverted indexer, web crawler, sort, search, and Porter stemmer written in Python for information retrieval | 8 | | 0 | 0 | about 7 years ago | 0 | | 0 | mit | HTML |
| openvenues/common_crawl | Simple Python MapReduce jobs for processing the Common Crawl, plus command-line utilities | 8 | | 0 | 0 | about 11 years ago | 0 | | 0 | mit | Python |

Downloads, Dependent Repos, Dependent Packages, Total Releases, Latest Releases data powered by Libraries.io.

Copyright 2018-2026 Awesome Open Source.  All rights reserved.