| USCDataScience/sparkler |
401 |
|
0 |
0 |
about 3 years ago |
0 |
|
55 |
apache-2.0 |
Java |
| Spark-Crawler: Apache Nutch-like crawler that runs on Apache Spark. |
| mirkosertic/FXDesktopSearch |
168 |
|
0 |
0 |
about 2 years ago |
0 |
|
19 |
apache-2.0 |
Java |
| A JavaFX based desktop search application. |
| xautlx/nutch-htmlunit |
122 |
|
0 |
0 |
almost 11 years ago |
0 |
|
1 |
apache-2.0 |
Java |
| An AJAX page crawling and parsing plugin, implemented as an extension based on Apache Nutch and Htmlunit |
| bejean/crawl-anywhere |
98 |
|
0 |
0 |
almost 9 years ago |
0 |
|
38 |
other |
PHP |
| Crawl-Anywhere - Web Crawler and document processing pipeline with Solr integration. |
| jculvey/roboto |
63 |
|
15 |
2 |
over 8 years ago |
42 |
August 24, 2014 |
12 |
|
JavaScript |
| A web crawler/scraper/spider for Node.js |
| ipeirotis/Mturk-Tracker |
35 |
|
0 |
0 |
about 8 years ago |
0 |
|
1 |
other |
Python |
| Deprecated - Software for gathering historical data from Amazon Mechanical Turk Service |
| laveeshr/darkWebBot |
22 |
|
0 |
0 |
over 8 years ago |
0 |
|
0 |
other |
Python |
| Dark Web Crawler for crawling the hidden onion sites and indexing them in Solr |
| emilis/PolicyFeed |
18 |
|
0 |
0 |
over 13 years ago |
0 |
|
7 |
agpl-3.0 |
JavaScript |
| Government news aggregator |
| voltek62/RsparkleR |
10 |
|
0 |
0 |
over 8 years ago |
0 |
|
0 |
|
R |
| RsparkleR provides an R interface for launching virtual machines and deploying Sparkler |
| b-cube/nutch-crawler |
9 |
|
0 |
0 |
almost 11 years ago |
0 |
|
2 |
apache-2.0 |
Java |
| Apache Nutch fork tuned for web services and data discovery. |