Parse PDF with images/text in Elasticsearch using Java?


(Abhishek) #1

How can I parse a PDF containing text and images into Elasticsearch, using Java in Eclipse?


(David Pilato) #2

You can look at the ingest-attachment plugin, which can extract text and metadata from your PDF docs.
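
As a rough sketch of how that could look from Java: the plugin expects the file as a base64 string in a document field, and an ingest pipeline with an `attachment` processor extracts the content from that field. The class below only builds the two JSON request bodies using the JDK. The class name `AttachmentRequests`, the field name `data`, and the pipeline and index names in the comments are assumptions for illustration, and sending the requests (with whichever HTTP or Elasticsearch client you use) is left out.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical helper: builds request bodies for the ingest-attachment plugin.
final class AttachmentRequests {

    // Body for: PUT _ingest/pipeline/attachment   (pipeline name is an assumption)
    // The "attachment" processor reads base64-encoded content from sourceField.
    static String pipelineBody(String sourceField) {
        return "{\"description\":\"Extract attachment information\","
             + "\"processors\":[{\"attachment\":{\"field\":\"" + sourceField + "\"}}]}";
    }

    // Body for: PUT my-index/_doc/1?pipeline=attachment   (index name is an assumption)
    // The raw file bytes are base64-encoded, as the processor expects.
    static String documentBody(String sourceField, byte[] fileBytes) {
        String encoded = Base64.getEncoder().encodeToString(fileBytes);
        return "{\"" + sourceField + "\":\"" + encoded + "\"}";
    }

    public static void main(String[] args) {
        // Stand-in bytes; in practice something like Files.readAllBytes(pathToPdf).
        byte[] sample = "hi".getBytes(StandardCharsets.UTF_8);
        System.out.println(pipelineBody("data"));
        System.out.println(documentBody("data", sample));
    }
}
```

Once a document is indexed through the pipeline, the extracted text and metadata should appear under an `attachment` object in the stored document.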


(Abhishek) #3

I also want to index the text that appears in images.

I tried Tesseract OCR, but it only indexes standalone image files, not images embedded in a PDF.
I also tried the Tika JAR, but it only indexes the PDF's text.

Is there any way to parse a PDF's images, its text, or both?

