Awesome Open Source

The Top 10 Crawler Open Source Projects

Open source projects categorized as Crawler

scrapy/scrapy ⭐ 49,918

Scrapy, a fast high-level web crawling & scraping framework for Python.

dependent packages 0 total releases 0 most recent commit over 2 years ago

NaiboWang/EasySpider ⭐ 43,770

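A visual no-code web crawler/spider. EasySpider (易采集) is a visual tool for browser automation testing, data collection, and crawling that lets you design and run crawler tasks graphically without writing code. Also known as ServiceWrapper, an intelligent service-wrapping system for web applications.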

dependent packages 0 total releases 0 most recent commit 3 months ago

iawia002/lux ⭐ 30,853

👾 Fast and simple video download library and CLI tool written in Go

dependent packages 0 total releases 0 most recent commit 6 months ago

gocolly/colly ⭐ 21,443

Elegant Scraper and Crawler Framework for Golang

dependent packages 0 total releases 0 most recent commit over 2 years ago

jhao104/proxy_pool ⭐ 19,442

Python ProxyPool for web spider

dependent packages 0 total releases 0 most recent commit over 2 years ago

binux/pyspider ⭐ 15,943

A powerful spider (web crawler) system in Python.

dependent packages 0 total releases 0 most recent commit almost 3 years ago

codelucas/newspaper ⭐ 13,147

News, full-text, and article metadata extraction in Python 3. Advanced docs:

dependent packages 0 total releases 0 most recent commit over 2 years ago

shengqiangzhang/examples-of-web-crawlers ⭐ 13,142

Some interesting examples of Python crawlers that are friendly to beginners, mainly crawling sites such as Taobao, Tmall, WeChat, WeChat Read, Douban, and QQ.

dependent packages 0 total releases 0 most recent commit over 2 years ago

waditu/tushare ⭐ 12,165

TuShare is a utility for crawling historical data on Chinese stocks.

dependent packages 0 total releases 0 most recent commit about 3 years ago

apify/crawlee ⭐ 11,229

Crawlee: a web scraping and browser automation library for Node.js for building reliable crawlers in JavaScript and TypeScript. Extract data for AI, LLMs, RAG, or GPTs. Download HTML, PDF, JPG, PNG, and other files from websites. Works with Puppeteer, Playwright, Cheerio, JSDOM, and raw HTTP, in both headful and headless modes, with proxy rotation.

dependent packages 0 total releases 0 most recent commit over 2 years ago

Copyright 2018-2026 Awesome Open Source. All rights reserved.