Sparkler Alternatives

Name: USCDataScience/sparkler

Spark-Crawler: Apache Nutch-like crawler that runs on Apache Spark.
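Apache Nutch crawls in repeated rounds. It injects seed URLs into a crawl database, generates a batch of URLs to fetch, fetches and parses them, and then updates the database with newly discovered outlinks. The sketch below illustrates that cycle on a single machine in plain Python.

It is not Sparkler's code or API. The following are all assumptions made for illustration:
- the names `CrawlEntry`, `generate` and `crawl`;
- the status labels;
- the "child gets half the parent's score" rule;
- the in-memory `FAKE_WEB` dictionary, which stands in for real HTTP fetches.

In a Spark-based crawler, the generate and fetch steps would run as distributed jobs rather than as a local loop.

```python
# Toy, single-machine sketch of a Nutch-style crawl cycle:
# inject -> generate -> fetch -> parse -> update.
# All names here are hypothetical; this is NOT Sparkler's or Nutch's API.
from dataclasses import dataclass
from html.parser import HTMLParser
from typing import Callable, Dict, List, Optional
from urllib.parse import urljoin


@dataclass
class CrawlEntry:
    url: str
    status: str = "UNFETCHED"  # becomes "FETCHED" or "ERROR" (assumed labels)
    score: float = 1.0
    depth: int = 0


class LinkExtractor(HTMLParser):
    """Collects the href values of <a> tags in a page."""

    def __init__(self) -> None:
        super().__init__()
        self.links: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def generate(crawldb: Dict[str, CrawlEntry], top_n: int) -> List[CrawlEntry]:
    """Pick up to top_n unfetched entries, highest score first (ties by URL)."""
    pending = [e for e in crawldb.values() if e.status == "UNFETCHED"]
    pending.sort(key=lambda e: (-e.score, e.url))
    return pending[:top_n]


def crawl(seeds: List[str], fetch: Callable[[str], Optional[str]],
          rounds: int, top_n: int, max_depth: int) -> Dict[str, CrawlEntry]:
    crawldb = {url: CrawlEntry(url) for url in seeds}  # inject
    for _ in range(rounds):
        batch = generate(crawldb, top_n)  # generate
        if not batch:
            break  # nothing left to fetch
        for entry in batch:
            page = fetch(entry.url)  # fetch
            if page is None:
                entry.status = "ERROR"
                continue
            entry.status = "FETCHED"
            if entry.depth >= max_depth:
                continue  # fetched, but outlinks are not expanded
            parser = LinkExtractor()  # parse
            parser.feed(page)
            parser.close()
            for href in parser.links:
                outlink = urljoin(entry.url, href)
                if outlink not in crawldb:  # update: record newly discovered URLs
                    # Assumed scoring rule: a child gets half its parent's score.
                    crawldb[outlink] = CrawlEntry(outlink, score=entry.score / 2,
                                                  depth=entry.depth + 1)
    return crawldb


# Usage with a hypothetical in-memory "web" standing in for HTTP fetches.
FAKE_WEB = {
    "http://example.com/": '<a href="/a">A</a> <a href="/b">B</a>',
    "http://example.com/a": '<a href="/c">C</a>',
    "http://example.com/b": "",
}
db = crawl(["http://example.com/"], FAKE_WEB.get, rounds=3, top_n=10, max_depth=2)
for url, entry in sorted(db.items()):
    print(url, entry.status, entry.depth)
# http://example.com/ FETCHED 0
# http://example.com/a FETCHED 1
# http://example.com/b FETCHED 1
# http://example.com/c ERROR 2
```

Keeping crawl state in a single keyed table (`crawldb`) is what makes each round restartable. The generate step only needs to query that table, so a distributed version can partition it across workers.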


Stars: 401
License: apache-2.0
Open Issues: 55
Most Recent Commit: over 3 years ago
Programming Language: Java
Dependent Repos: 0
Dependent Packages: 0
Total Releases: 0

Categories

Programming Languages > Java

Computer Science > Search

Web User Interface > Dashboard

Web Servers > Apache

Data Processing > Spark

Data Processing > Web Crawler

Data Processing > Big Data

Data Processing > Search Engine

Data Processing > Solr

Software Architecture > Distributed Systems

Data Processing > Lucene

Data Processing > Information Retrieval

Data Processing > Tika


Alternatives To USCDataScience/sparkler

Project Name	Stars	Repos Using This	Packages Using This	Most Recent Commit	Total Releases	Latest Release	Open Issues	License	Language	Description
vector4wang/spring-boot-quick	2,282	0	0	over 2 years ago	0		13		Java	Quick-learning examples built on Spring Boot that integrate open-source frameworks the author has come across, such as RabbitMQ (delay queues), Kafka, JPA, Redis, OAuth2, Swagger, JSP, Docker, k3s, k3d, k8s, a MyBatis encryption/decryption plugin, exception handling, logging, multi-module development, multi-environment packaging, caching, web crawling, JWT, GraphQL, Dubbo, ZooKeeper, Async, and more.
USCDataScience/sparkler	401	0	0	over 3 years ago	0		55	apache-2.0	Java	Spark-Crawler: Apache Nutch-like crawler that runs on Apache Spark.
commoncrawl/cc-pyspark	280	0	0	over 3 years ago	0		4	mit	Python	Process Common Crawl data with Python and Spark
zhangslob/docs	102	0	0	about 7 years ago	0		3			Source code for 《数据采集从入门到放弃》 ("Data Collection: From Getting Started to Giving Up"). Contents: an introduction to crawlers, the job market, and crawler-engineer interview questions; the HTTP protocol; using Requests; the XPath parser; MongoDB and MySQL; multithreaded crawlers; Scrapy; Scrapy-redis; deploying with Docker; managing a Docker cluster with Nomad; querying Docker logs with EFK.
commoncrawl/cc-index-table	78	0	0	almost 3 years ago	0		8	apache-2.0	Java	Index Common Crawl archives in tabular format
CI-Research/KeywordAnalysis	33	0	0	almost 8 years ago	0		0			Word analysis, by domain, on the Common Crawl data set for the purpose of finding industry trends
YBIGTA/EngineeringTeam	32	0	0	over 7 years ago	0		2			A place where the YBIGTA engineering team keeps its materials organized.
youhusky/Search_Ads_Web_Service	27	0	0	over 8 years ago	0		0		Java	Online search advertisement platform & Realtime Campaign Monitoring [Maybe Deprecated]
huntingzhu/Steam_Recommendation_System	25	0	0	over 8 years ago	0		0		Jupyter Notebook	Recommendation System, Collaborative Filtering, Spark, Hive, Flask, Web Crawler, AWS EC2, AWS RDS
r-spark/sparkwarc	13	0	0	over 4 years ago	4	January 11, 2022	0	apache-2.0	WebAssembly	Load WARC files into Apache Spark with sparklyr
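Several of the alternatives above, such as cc-pyspark and sparkwarc, work with web archives in the WARC format (ISO 28500). Common Crawl publishes its crawl data in this format. Each WARC record consists of:
- a version line such as `WARC/1.0`;
- a block of `Name: value` header fields, followed by a blank line;
- a content block whose size in bytes is given by `Content-Length`;
- two trailing CRLFs.

The sketch below parses that framing from uncompressed bytes in memory. The function name `iter_warc_records` is hypothetical, and the sketch deliberately skips several things a real reader must handle:
- per-record gzip compression, as used in `.warc.gz` files;
- case-insensitive field names (it looks up `Content-Length` exactly as written);
- digest verification.

```python
# Minimal reader for *uncompressed* WARC data held in memory.
# Sketch only: no gzip support, exact-case field names, no digest checks.
from typing import Dict, Iterator, Tuple

WarcRecord = Tuple[str, Dict[str, str], bytes]


def iter_warc_records(data: bytes) -> Iterator[WarcRecord]:
    """Yield (version, headers, block) for each record in data."""
    pos = 0
    while pos < len(data):
        # The header section ends at the first empty line (CRLF CRLF).
        header_end = data.find(b"\r\n\r\n", pos)
        if header_end == -1:
            raise ValueError("truncated WARC header")
        lines = data[pos:header_end].decode("utf-8").split("\r\n")
        version = lines[0]
        if not version.startswith("WARC/"):
            raise ValueError(f"not a WARC record at offset {pos}")
        headers: Dict[str, str] = {}
        for line in lines[1:]:
            # Split on the first colon only, so URI values keep their colons.
            name, _, value = line.partition(":")
            headers[name.strip()] = value.strip()
        length = int(headers["Content-Length"])
        block_start = header_end + 4
        block = data[block_start:block_start + length]
        if len(block) != length:
            raise ValueError("truncated WARC content block")
        yield version, headers, block
        # Each record ends with two CRLFs after its content block.
        pos = block_start + length + 4


# Usage: two identical hand-made records concatenated.
sample = (
    b"WARC/1.0\r\n"
    b"WARC-Type: response\r\n"
    b"WARC-Target-URI: http://example.com/\r\n"
    b"Content-Length: 5\r\n"
    b"\r\n"
    b"hello"
    b"\r\n\r\n"
)
records = list(iter_warc_records(sample * 2))
for version, headers, block in records:
    print(version, headers["WARC-Type"], headers["WARC-Target-URI"], block)
```

Because each record states its own length, a reader can skip from record to record without scanning the content. This is also what lets large archives be split and processed in parallel by Spark-based tools.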




Popular Spark Projects

apache/spark ⭐ 37,661

Apache Spark - A unified analytics engine for large-scale data processing

donnemartin/data-science-ipython-notebooks ⭐ 25,668

Data science Python notebooks: Deep learning (TensorFlow, Theano, Caffe, Keras), scikit-learn, Kaggle, big data (Spark, Hadoop MapReduce, HDFS), matplotlib, pandas, NumPy, SciPy, Python essentials, AWS, and various command lines.

getredash/redash ⭐ 24,479

Make Your Company Data Driven. Connect to any data source, easily visualize, dashboard and share your data.

yeasy/docker_practice ⭐ 23,279

Learn and understand Docker&Container technologies, with real DevOps practice!

DataTalksClub/data-engineering-zoomcamp ⭐ 19,461

Free Data Engineering course!

Popular Crawler Projects

scrapy/scrapy ⭐ 49,918

Scrapy, a fast high-level web crawling & scraping framework for Python.

NaiboWang/EasySpider ⭐ 43,770

A visual no-code/code-free web crawler/spider. 易采集 ("Easy Collect"): visual browser-automation testing, data-collection and crawler software that lets you design and run crawl tasks graphically without writing code. Also known as ServiceWrapper, an intelligent service-wrapping system for web applications.

iawia002/lux ⭐ 30,853

👾 Fast and simple video download library and CLI tool written in Go

gocolly/colly ⭐ 21,443

Elegant Scraper and Crawler Framework for Golang

jhao104/proxy_pool ⭐ 19,442

Python ProxyPool for web spider
