Chronicrawl Alternatives

Experimental continuous web crawler for web archiving
Alternatives To nla/chronicrawl
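internetarchive/heritrix3
Heritrix is the Internet Archive's open-source, extensible, web-scale, archival-quality web crawler project.
Stars: 2,579 | Repos Using This: 0 | Packages Using This: 2 | Most Recent Commit: over 2 years ago | Total Releases: 9 | Latest Release: July 27, 2022 | Open Issues: 48 | License: other | Language: Java

iipc/awesome-web-archiving
An Awesome List for getting started with web archiving
Stars: 1,669 | Repos Using This: 0 | Packages Using This: 0 | Most Recent Commit: about 2 years ago | Total Releases: 0 | Open Issues: 3 | License: cc0-1.0

ArchiveTeam/grab-site
The archivist's web crawler: WARC output, dashboard for all crawls, dynamic ignore patterns
Stars: 1,121 | Repos Using This: 0 | Packages Using This: 0 | Most Recent Commit: over 2 years ago | Total Releases: 0 | Open Issues: 92 | License: other | Language: Python

simon987/awesome-datahoarding
List of data-hoarding related tools
Stars: 892 | Repos Using This: 0 | Packages Using This: 0 | Most Recent Commit: over 2 years ago | Total Releases: 0 | Open Issues: 4

internetarchive/brozzler
brozzler - distributed browser-based web crawler
Stars: 613 | Repos Using This: 2 | Packages Using This: 0 | Most Recent Commit: about 2 years ago | Total Releases: 23 | Latest Release: January 02, 2020 | Open Issues: 40 | License: apache-2.0 | Language: Python

ArchiveTeam/ArchiveBot
ArchiveBot, an IRC bot for archiving websites
Stars: 328 | Repos Using This: 0 | Packages Using This: 0 | Most Recent Commit: over 2 years ago | Total Releases: 0 | Open Issues: 169 | License: mit | Language: Python

sparrow629/Tumblr_Crawler
This is a Multi-thread crawler for Tumblr.
Stars: 258 | Repos Using This: 0 | Packages Using This: 0 | Most Recent Commit: over 7 years ago | Total Releases: 0 | Open Issues: 2 | License: gpl-3.0 | Language: Python

icy/google-group-crawler
[Deprecated] Get (almost) original messages from google group archives. Your data is yours.
Stars: 213 | Repos Using This: 0 | Packages Using This: 0 | Most Recent Commit: about 4 years ago | Total Releases: 0 | Open Issues: 6 | Language: Shell

commoncrawl/cc-crawl-statistics
Statistics of Common Crawl monthly archives mined from URL index files
Stars: 97 | Repos Using This: 0 | Packages Using This: 0 | Most Recent Commit: over 2 years ago | Total Releases: 0 | Open Issues: 0 | License: apache-2.0 | Language: Python

ArchiveTeam/wget-lua
Wget-AT is a modern Wget with Lua hooks, Zstandard (+dictionary) WARC compression and URL-agnostic deduplication.
Stars: 72 | Repos Using This: 0 | Packages Using This: 0 | Most Recent Commit: over 2 years ago | Total Releases: 0 | Open Issues: 10 | License: gpl-3.0 | Language: C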

Downloads, Dependent Repos, Dependent Packages, Total Releases, Latest Releases data powered by Libraries.io.

Copyright 2018-2026 Awesome Open Source.  All rights reserved.