| AccordBox/awesome-scrapy |
450 |
|
0 |
0 |
over 3 years ago |
0 |
|
2 |
|
|
| A curated list of awesome packages, articles, and other cool resources from the Scrapy community. |
| stanzhai/Html2Article |
425 |
|
1 |
0 |
about 9 years ago |
5 |
July 11, 2013 |
6 |
other |
C# |
| HTML web page main-content extraction |
| Tjatse/node-readability |
302 |
|
10 |
4 |
over 7 years ago |
67 |
August 01, 2018 |
9 |
|
JavaScript |
| Scrape or crawl articles from any site automatically. Make any web page readable, whether it is in Chinese or English. |
| qiyaTech/javaCrawling |
252 |
|
0 |
0 |
over 8 years ago |
0 |
|
10 |
|
Java |
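| "Qiya Crawler" (奇伢爬虫) is built on Spring Boot and WebMagic to crawl articles from WeChat official accounts, news sites, CSDN, info, and other websites. Crawling rules and cleaning rules can be configured dynamically, and it can crawl articles from most websites. |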
| f111fei/article_spider |
187 |
|
0 |
0 |
about 8 years ago |
0 |
|
4 |
|
TypeScript |
| WeChat official account crawler |
| Harhao/wechatPubSpider |
107 |
|
0 |
0 |
about 4 years ago |
0 |
|
2 |
|
Python |
| WeChat spiders: a WeChat official account crawler |
| Dustyposa/goSpider |
55 |
|
0 |
0 |
over 4 years ago |
0 |
|
1 |
mit |
Jupyter Notebook |
| Some small projects and some articles |
| pmyteh/RISJbot |
50 |
|
0 |
0 |
over 4 years ago |
0 |
|
4 |
|
Python |
| A Scrapy project to extract the text and metadata of articles from news websites |
| Tjatse/spider2 |
42 |
|
0 |
1 |
over 10 years ago |
6 |
December 19, 2015 |
2 |
|
JavaScript |
| A second-generation spider that crawls any article site and automatically reads the title and article. |
| Lemon-XQ/Hexo-BaiduPushTool |
22 |
|
0 |
0 |
over 7 years ago |
0 |
|
2 |
mit |
Python |
| A Hexo tool for pushing articles to Baidu |