Parse PDF with images/text in Elasticsearch using Java?

wolfghost · May 5, 2017, 9:57am

How can I parse a PDF that contains both text and images and index it in Elasticsearch, using Java in Eclipse?

dadoonet · May 5, 2017, 12:15pm

You can look at the ingest-attachment plugin, which can extract text and metadata from your PDF docs.
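As a rough illustration of that suggestion, here is a minimal sketch of the two requests typically involved: one that defines an ingest pipeline with an `attachment` processor, and one that indexes a Base64-encoded PDF through it. It uses only the JDK (Java 11+). The pipeline name `attachment`, the source field `data`, the index name `my_index`, the document id `1`, and the `localhost:9200` address are all assumptions made for the example, and the `_doc` endpoint shown is the form used by recent Elasticsearch versions (older versions use a type name in the path instead).

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Base64;

class AttachmentPipelineSketch {
    // Assumed pipeline name; any name works as long as both requests agree.
    static final String PIPELINE = "attachment";

    // Body for PUT _ingest/pipeline/<name>: one attachment processor reading the "data" field.
    static String pipelineBody() {
        return "{\"description\":\"Extract attachment information\","
             + "\"processors\":[{\"attachment\":{\"field\":\"data\"}}]}";
    }

    // Body for the index request: the raw file bytes, Base64-encoded, in the "data" field.
    // Standard Base64 output contains no characters that need JSON escaping.
    static String documentBody(byte[] fileBytes) {
        String encoded = Base64.getEncoder().encodeToString(fileBytes);
        return "{\"data\":\"" + encoded + "\"}";
    }

    // Builds (but does not send) a JSON PUT request.
    static HttpRequest putJson(String url, String json) {
        return HttpRequest.newBuilder(URI.create(url))
                .header("Content-Type", "application/json")
                .PUT(HttpRequest.BodyPublishers.ofString(json, StandardCharsets.UTF_8))
                .build();
    }

    public static void main(String[] args) throws Exception {
        String base = "http://localhost:9200"; // assumed cluster address
        // Read the PDF given on the command line, or fall back to placeholder bytes.
        byte[] file = args.length > 0
                ? Files.readAllBytes(Path.of(args[0]))
                : "placeholder".getBytes(StandardCharsets.UTF_8);

        HttpRequest createPipeline =
                putJson(base + "/_ingest/pipeline/" + PIPELINE, pipelineBody());
        HttpRequest indexDocument =
                putJson(base + "/my_index/_doc/1?pipeline=" + PIPELINE, documentBody(file));

        // To actually send a request:
        // java.net.http.HttpClient.newHttpClient()
        //         .send(createPipeline, java.net.http.HttpResponse.BodyHandlers.ofString());
        System.out.println(createPipeline.method() + " " + createPipeline.uri());
        System.out.println(indexDocument.method() + " " + indexDocument.uri());
    }
}
```

The sketch only builds and prints the requests, so it can be run without a cluster. Once the plugin is installed and the requests are sent, the extracted text should appear under the `attachment.content` field of the indexed document.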

wolfghost · May 9, 2017, 9:23am

I also want to index the text that appears in images.

I tried Tesseract OCR, but it only indexes standalone image files (not images embedded in a PDF).
I tried the Tika jar, but it only indexes the PDF's text.

Is there any way to parse the images, the text, or both from a PDF?

system · June 6, 2017, 9:32am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.