Tika Dockers Alternatives

A suite of machine learning and deep learning Dockerfiles that allow Apache Tika to extract objects from, and produce textual captions for, images and video.
Alternatives To USCDataScience/tika-dockers
| Project Name | Stars | Downloads | Repos Using This | Packages Using This | Most Recent Commit | Total Releases | Latest Release | Open Issues | License | Language | Description |
|---|---|---|---|---|---|---|---|---|---|---|---|
| apache/tika | 2,007 | | 1,687 | 570 | about 2 years ago | 66 | October 17, 2023 | 49 | apache-2.0 | Java | The Apache Tika toolkit detects and extracts metadata and text from over a thousand different file types (such as PPT, XLS, and PDF). |
| chrismattmann/tika-python | 1,316 | | 83 | 54 | over 2 years ago | 35 | January 02, 2023 | 4 | apache-2.0 | Python | Tika-Python is a Python binding to the Apache Tika™ REST services allowing Tika to be called natively in the Python community. |
| ICIJ/datashare | 519 | | 0 | 0 | about 2 years ago | 135 | November 21, 2023 | 17 | agpl-3.0 | Java | A self-hosted search engine for documents. |
| USCDataScience/sparkler | 401 | | 0 | 0 | about 3 years ago | 0 | | 55 | apache-2.0 | Java | Spark-Crawler: Apache Nutch-like crawler that runs on Apache Spark. |
| google/go-tika | 171 | | 12 | 13 | about 3 years ago | 9 | April 17, 2025 | 9 | apache-2.0 | Go | Go package for using Apache Tika |
| LogicalSpark/docker-tikaserver | 160 | | 0 | 0 | over 3 years ago | 0 | | 9 | apache-2.0 | Dockerfile | Apache Tika Server as a Docker Image |
| shebinleo/pdf2html | 117 | | 2 | 6 | over 2 years ago | 26 | January 22, 2023 | 7 | apache-2.0 | JavaScript | pdf2html is a module which helps to convert PDF file to HTML pages using Apache Tika. This module also helps to generate thumbnail image for PDF file using Apache PDFBox. |
| nasa-jpl-memex/memex-explorer | 106 | | 0 | 0 | about 10 years ago | 0 | | 67 | bsd-2-clause | Python | Viewers for statistics and dashboarding of Domain Search Engine data |
| vaites/php-apache-tika | 104 | | 3 | 3 | over 2 years ago | 38 | April 14, 2023 | 0 | mit | PHP | Apache Tika bindings for PHP: extract text and metadata from documents, images and other formats |
| chrismattmann/imagecat | 84 | | 0 | 0 | over 7 years ago | 0 | | 0 | | Java | ImageCat is an Apache OODT RADIX application that uses Apache Solr, Apache Tika and Apache OODT to ingest 10s of millions of files (images, but could be extended to other files) in place, and to extract metadata and OCR information from those files/images using Tika and Tesseract OCR. |

Downloads, Dependent Repos, Dependent Packages, Total Releases, Latest Releases data powered by Libraries.io.

Copyright 2018-2026 Awesome Open Source.  All rights reserved.