| stil/curl-easy |
282 |
|
18 |
8 |
over 6 years ago |
9 |
May 20, 2017 |
5 |
mit |
PHP |
| cURL wrapper for PHP. Supports parallel and non-blocking requests. For high-speed crawling, see stil/curl-robot |
| bitextor/bitextor |
260 |
|
0 |
0 |
over 2 years ago |
0 |
|
4 |
gpl-3.0 |
Python |
| Bitextor generates translation memories from multilingual websites |
| martymac/fpart |
208 |
|
0 |
0 |
about 2 years ago |
1 |
March 03, 2021 |
0 |
bsd-2-clause |
C |
| Sort files and pack them into partitions |
| atlonxp/recursive-goIndex-downloader |
107 |
|
0 |
0 |
over 3 years ago |
0 |
|
8 |
mit |
Jupyter Notebook |
| Simple GoIndex Downloader |
| fengzhizi715/PicCrawler |
53 |
|
0 |
0 |
over 6 years ago |
0 |
|
0 |
apache-2.0 |
Java |
| An image crawler built with RxJava2 and Java 8 features |
| cballou/caterpillar |
39 |
|
0 |
0 |
almost 10 years ago |
0 |
|
0 |
other |
PHP |
| Caterpillar is a PHP library intended for website crawling and screen scraping. It handles parallel requests using the curl_multi functions. |
| short-d/crawler |
31 |
|
0 |
0 |
over 5 years ago |
0 |
|
1 |
|
Go |
| Explore the web in parallel on thousands of machines |
| shihjyun/PTTmineR |
28 |
|
0 |
0 |
over 5 years ago |
0 |
|
0 |
other |
R |
| Parallel Searching and Crawling Data from PTT 🚀 |
| aws-samples/pywren-workshops |
23 |
|
0 |
0 |
about 6 years ago |
0 |
|
2 |
apache-2.0 |
Jupyter Notebook |
| Various workshop labs that make use of pywren to massively process data in parallel with AWS Lambda |
| wuseman/wspider |
15 |
|
0 |
0 |
over 3 years ago |
0 |
|
0 |
gpl-3.0 |
Shell |
| Probably one of the fastest crawler/spider tools around; mirrors any website faster than you thought was possible |