| NikolaiT/Crawling-Infrastructure | 321 | | 0 | 0 | about 4 years ago | 0 | | 22 | agpl-3.0 | TypeScript |
| Distributed crawling infrastructure running on top of serverless computation, cloud storage (such as S3) and sophisticated queues. |
| stopstalk/stopstalk-deployment | 306 | | 0 | 0 | over 2 years ago | 0 | | 92 | mit | Python |
| Stop stalking and start StopStalking :wink: |
| commoncrawl/cc-pyspark | 280 | | 0 | 0 | about 3 years ago | 0 | | 4 | mit | Python |
| Process Common Crawl data with Python and Spark |
| intoli/intoli-article-materials | 255 | | 0 | 0 | over 3 years ago | 0 | | 85 | other | JavaScript |
| All of the supporting materials for articles from Intoli's blog. |
| trek10inc/awsets | 184 | | 0 | 1 | about 3 years ago | 35 | May 19, 2022 | 6 | mit | Go |
| A utility for crawling an AWS account and exporting all its resources for further analysis. |
| MarcelloLins/ServerlessCrawler-VancouverRealState | 66 | | 0 | 0 | over 8 years ago | 0 | | 1 | mit | Python |
| A Serverless Crawler For Real State Data in Vancouver Using AWS Lambda, Dynamo, RDS MySQL and CloudWatch |
| LeiShi1313/serverless-web-differ | 60 | | 0 | 0 | over 3 years ago | 0 | | 0 | mit | Python |
| A serverless web browser which crawls websites and compares pages by schedule. |
| mylamour/blog | 59 | | 0 | 0 | about 2 years ago | 0 | | 99 | | SCSS |
| Your internal mediocrity is the moment when you lost the faith of being excellent. Just do it. |
| rossf7/elasticrawl | 50 | | 1 | 0 | about 9 years ago | 10 | February 15, 2017 | 1 | mit | Ruby |
| Launch AWS Elastic MapReduce jobs that process Common Crawl data. |
| hfreire/browser-as-a-service | 43 | | 0 | 0 | over 3 years ago | 0 | | 30 | mit | JavaScript |
| A web browser :earth_americas: hosted as a service, to render your JavaScript web pages as HTML |