| pymupdf/PyMuPDF | 3,526 | | 34 | 341 | about 2 years ago | 124 | November 30, 2023 | 13 | agpl-3.0 | Python |
| PyMuPDF is a high performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents. |
| zelon88/HRConvert2 | 746 | | 0 | 0 | over 2 years ago | 0 | | 8 | gpl-3.0 | PHP |
| A self-hosted, drag-and-drop & NoSQL file conversion server & share tool that supports 86 file formats in 13 languages. |
| abbyy/ocrsdk.com | 467 | | 0 | 0 | almost 5 years ago | 0 | | 30 | apache-2.0 | Java |
| ABBYY Cloud OCR SDK |
| UB-Mannheim/ocr-fileformat | 168 | | 0 | 0 | over 2 years ago | 0 | | 30 | mit | JavaScript |
| Validate and transform various OCR file formats (hOCR, ALTO, PAGE, FineReader) |
| pzaich/doc_ripper | 74 | | 4 | 1 | about 7 years ago | 8 | February 05, 2019 | 3 | mit | Ruby |
| Parse text contents from common file formats |
| kba/hocr-spec | 52 | | 0 | 0 | over 4 years ago | 0 | | 52 | | HTML |
| The hOCR Embedded OCR Workflow and Output Format |
| A-bone1/FSNS-tfrecord-generate | 48 | | 0 | 0 | about 8 years ago | 0 | | 5 | | Python |
| FSNS tfrecord generate |
| dmi3kno/hocr | 26 | | 0 | 0 | almost 6 years ago | 0 | | 3 | other | R |
| Text-to-tibble |
| matecat/MateCat-Win-Converter | 12 | | 0 | 0 | almost 6 years ago | 0 | | 0 | lgpl-3.0 | C# |
| Helps MateCat Filters support more formats by performing some auxiliary file conversions. |
| duncantl/Rtesseract | 11 | | 0 | 0 | about 4 years ago | 0 | | 8 | | R |
| Interface to tesseract OCR system. |