Parse PDF with images and text in Elasticsearch using Java?

How do I parse a PDF that contains both text and images and index it in Elasticsearch, using Java in Eclipse?

You can look at the ingest-attachment plugin, which can extract text and metadata from your PDF documents.
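As a rough illustration of that suggestion, here is a minimal Java sketch of how a PDF might be sent through an ingest pipeline that uses the `attachment` processor. It assumes the ingest-attachment plugin is installed, that Elasticsearch is reachable at `http://localhost:9200` without authentication, and that Java 11+ is available. The pipeline name `pdf-attachment`, the index name `pdf-docs`, the document id `1`, and the class name are all made up for the example. The plugin expects the file content as a Base64 string in the field the processor points at (`data` here).

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Base64;

final class AttachmentIngestSketch {
    // Assumed values for illustration only; adjust to your cluster.
    static final String ES_URL = "http://localhost:9200";
    static final String PIPELINE = "pdf-attachment";
    static final String INDEX = "pdf-docs";

    // Pipeline definition: run the attachment processor on the "data" field.
    static String pipelineJson() {
        return "{\"description\":\"Extract text and metadata from PDFs\","
             + "\"processors\":[{\"attachment\":{\"field\":\"data\"}}]}";
    }

    // Document body: the raw file bytes, Base64-encoded, in the "data" field.
    // The standard Base64 alphabet needs no JSON escaping.
    static String documentJson(byte[] fileBytes) {
        String encoded = Base64.getEncoder().encodeToString(fileBytes);
        return "{\"data\":\"" + encoded + "\"}";
    }

    // Sends a JSON body with HTTP PUT and returns the response body.
    static String put(HttpClient client, String path, String json)
            throws IOException, InterruptedException {
        HttpRequest request = HttpRequest.newBuilder(URI.create(ES_URL + path))
                .header("Content-Type", "application/json")
                .PUT(HttpRequest.BodyPublishers.ofString(json))
                .build();
        return client.send(request, HttpResponse.BodyHandlers.ofString()).body();
    }

    public static void main(String[] args) throws Exception {
        Path pdf = Path.of(args[0]); // path to the PDF, passed on the command line
        HttpClient client = HttpClient.newHttpClient();
        // 1. Create (or overwrite) the ingest pipeline.
        System.out.println(put(client, "/_ingest/pipeline/" + PIPELINE, pipelineJson()));
        // 2. Index the PDF through that pipeline.
        System.out.println(put(client, "/" + INDEX + "/_doc/1?pipeline=" + PIPELINE,
                documentJson(Files.readAllBytes(pdf))));
    }
}
```

After indexing, the extracted text should appear under `attachment.content` in the stored document. For large files or real applications, an official Elasticsearch Java client and a JSON library would likely be a better fit than hand-built strings.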

I also want to index the text that appears in images.

I tried Tesseract OCR, but it only indexes standalone image files, not the images embedded in a PDF.
I also tried the Tika JAR, but it only indexes the PDF's text.

Is there any way to parse a PDF's images, its text, or both?
