Tika Dockers Alternatives

Name: USCDataScience/tika-dockers
Rating: 4.43 (18 reviews)

A suite of Machine Learning / Deep Learning Dockerfiles that allow Apache Tika to extract objects from, and produce textual captions for, images and video.

License: apache-2.0

Most Recent Commit: over 3 years ago

Categories

Virtualization > Docker

Machine Learning > Deep Learning

Machine Learning > Tensorflow

Virtualization > Dockerfile

Machine Learning > Computer Vision

Web Servers > Apache

Machine Learning > Image Captioning

Data Processing > Tika

Alternatives To USCDataScience/tika-dockers

Project Name	Stars	Repos Using This	Packages Using This	Most Recent Commit	Total Releases	Latest Release	Open Issues	License	Language
apache/tika	2,007	1,687	570	over 2 years ago	66	October 17, 2023	49	apache-2.0	Java
The Apache Tika toolkit detects and extracts metadata and text from over a thousand different file types (such as PPT, XLS, and PDF).
chrismattmann/tika-python	1,316	83	54	almost 3 years ago	35	January 02, 2023	4	apache-2.0	Python
Tika-Python is a Python binding to the Apache Tika™ REST services, allowing Tika to be called natively from Python.
ICIJ/datashare	519	0	0	over 2 years ago	135	November 21, 2023	17	agpl-3.0	Java
A self-hosted search engine for documents.
USCDataScience/sparkler	401	0	0	over 3 years ago	0		55	apache-2.0	Java
Spark-Crawler: Apache Nutch-like crawler that runs on Apache Spark.
google/go-tika	171	12	13	over 3 years ago	9	April 17, 2025	9	apache-2.0	Go
Go package for using Apache Tika
LogicalSpark/docker-tikaserver	160	0	0	almost 4 years ago	0		9	apache-2.0	Dockerfile
Apache Tika Server as a Docker Image
shebinleo/pdf2html	117	7	12	over 2 years ago	33	July 13, 2025	7	apache-2.0	JavaScript
pdf2html is a module that converts PDF files to HTML pages using Apache Tika. It can also generate thumbnail images for PDF files using Apache PDFBox.
nasa-jpl-memex/memex-explorer	106	0	0	over 10 years ago	0		67	bsd-2-clause	Python
Viewers for statistics and dashboarding of Domain Search Engine data
vaites/php-apache-tika	104	2	3	almost 3 years ago	43	October 04, 2025	0	mit	PHP
Apache Tika bindings for PHP: extract text and metadata from documents, images and other formats
chrismattmann/imagecat	84	0	0	almost 8 years ago	0		0		Java
ImageCat is an Apache OODT RADIX application that uses Apache Solr, Apache Tika and Apache OODT to ingest tens of millions of files (images, but it could be extended to other files) in place, and to extract metadata and OCR information from those files using Tika and Tesseract OCR.

Alternative Project Comparisons

USCDataScience/tika-dockers vs Tika

USCDataScience/tika-dockers vs Tika Python

USCDataScience/tika-dockers vs Datashare

USCDataScience/tika-dockers vs Sparkler

USCDataScience/tika-dockers vs Go Tika

USCDataScience/tika-dockers vs Docker Tikaserver

USCDataScience/tika-dockers vs Pdf2html

USCDataScience/tika-dockers vs Memex Explorer

USCDataScience/tika-dockers vs Php Apache Tika

USCDataScience/tika-dockers vs Imagecat

Popular Tika Projects

laurilehmijoki/s3_website ⭐ 2,259

Manage an S3 website: sync, deliver via CloudFront, benefit from advanced S3 website features.

dadoonet/fscrawler ⭐ 1,279

Elasticsearch File System Crawler (FS Crawler)

pemistahl/lingua ⭐ 622

The most accurate natural language detection library for Java and the JVM, suitable for long and short text alike

pcbje/gransk ⭐ 237

Document processing for investigations

ICIJ/extract ⭐ 229

A cross-platform command line tool for parallelised content extraction and analysis.

Popular Apache Projects

apache/echarts ⭐ 57,743

Apache ECharts is a powerful, interactive charting and data visualization library for the browser

apache/superset ⭐ 56,358

Apache Superset is a Data Visualization and Data Exploration Platform

fffaraz/awesome-cpp ⭐ 53,034

A curated list of awesome C++ (or C) frameworks, libraries, resources, and shiny things. Inspired by awesome-... stuff.

wasabeef/awesome-android-ui ⭐ 47,955

A curated list of awesome Android UI/UX libraries

apache/dubbo ⭐ 39,757

The Java implementation of Apache Dubbo. An RPC and microservice framework.

Popular Data Processing Categories

Jupyter Notebook

Dataset

Sql

Validation

Pipeline

Translation

Data Science

Classification

Transaction

Scraper