| scrapy/scrapy | 49,918 | | 4,185 | 445 | about 2 years ago | 96 | September 18, 2023 | 692 | bsd-3-clause | Python | Scrapy, a fast high-level web crawling & scraping framework for Python. |
| huginn/huginn | 40,328 | | 69 | 52 | about 2 years ago | 8 | September 22, 2017 | 698 | mit | Ruby | Create agents that monitor and act on your behalf. Your agents are standing by! |
| apify/crawlee | 11,229 | | 0 | 42 | about 2 years ago | 747 | December 10, 2023 | 96 | apache-2.0 | TypeScript | Crawlee: A web scraping and browser automation library for Node.js to build reliable crawlers. In JavaScript and TypeScript. Extract data for AI, LLMs, RAG, or GPTs. Download HTML, PDF, JPG, PNG, and other files from websites. Works with Puppeteer, Playwright, Cheerio, JSDOM, and raw HTTP. Both headful and headless mode. With proxy rotation. |
| lorien/awesome-web-scraping | 6,060 | | 0 | 0 | over 2 years ago | 0 | | 1 | other | Makefile | List of libraries, tools and APIs for web scraping and data processing. |
| BruceDone/awesome-crawler | 5,859 | | 0 | 0 | over 2 years ago | 0 | | 27 | mit | | A collection of awesome web crawler, spider in different languages |
| alirezamika/autoscraper | 5,159 | | 0 | 1 | almost 3 years ago | 16 | July 17, 2022 | 9 | mit | Python | A Smart, Automatic, Fast and Lightweight Web Scraper for Python |
| Evil0ctal/Douyin_TikTok_Download_API | 4,844 | | 0 | 0 | over 2 years ago | 21 | September 23, 2023 | 60 | mit | Python | 🚀 "Douyin_TikTok_Download_API" is an out-of-the-box, high-performance asynchronous data scraping tool for Douyin, Kuaishou, TikTok, and Bilibili, supporting API calls and online batch parsing and downloading. |
| go-rod/rod | 4,505 | | 0 | 140 | about 2 years ago | 406 | November 06, 2023 | 106 | mit | Go | A Devtools driver for web automation and scraping |
| rchipka/node-osmosis | 4,083 | | 218 | 58 | over 2 years ago | 27 | March 01, 2019 | 117 | | JavaScript | Web scraper for NodeJS |
| niespodd/browser-fingerprinting | 3,353 | | 0 | 0 | about 3 years ago | 0 | | 7 | | JavaScript | Analysis of Bot Protection systems with available countermeasures 🚿. How to defeat anti-bot system 👻 and get around browser fingerprinting scripts 🕵️‍♂️ when scraping the web? |