How to parse PDFs with text and images in Elasticsearch using Eclipse?
You can look at the ingest-attachment plugin, which can extract text and metadata from your PDF documents.
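For reference, here is a minimal Java sketch of the two request bodies an attachment pipeline typically needs: one that defines the pipeline and one that indexes a Base64-encoded file through it. The pipeline id `pdf_attachment`, the index name `my-index`, the field name `data`, and the class and method names are all assumptions made for illustration; adjust them to your setup. It only builds and prints the JSON, so you would still send it with whatever HTTP or Elasticsearch client you use.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical helper: builds the JSON bodies for an ingest-attachment pipeline.
class AttachmentPipelineSketch {

    // Body for: PUT _ingest/pipeline/pdf_attachment
    // sourceField is the document field that holds the Base64-encoded file.
    // (Assumes sourceField contains no characters that need JSON escaping.)
    static String pipelineBody(String sourceField) {
        return "{\"description\":\"Extract text and metadata from attachments\","
             + "\"processors\":[{\"attachment\":{\"field\":\"" + sourceField + "\"}}]}";
    }

    // Body for: PUT my-index/_doc/1?pipeline=pdf_attachment
    // Standard Base64 output needs no JSON escaping.
    static String documentBody(String sourceField, byte[] fileBytes) {
        String encoded = Base64.getEncoder().encodeToString(fileBytes);
        return "{\"" + sourceField + "\":\"" + encoded + "\"}";
    }

    public static void main(String[] args) {
        // Placeholder bytes; in practice read the PDF with Files.readAllBytes(path).
        byte[] sample = "%PDF-1.4 sample".getBytes(StandardCharsets.US_ASCII);

        System.out.println("PUT _ingest/pipeline/pdf_attachment");
        System.out.println(pipelineBody("data"));
        System.out.println("PUT my-index/_doc/1?pipeline=pdf_attachment");
        System.out.println(documentBody("data", sample));
    }
}
```

With the processor's default settings, the extracted text should end up under `attachment.content` in the indexed document.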
I also want to index the text that appears in images.
I used Tesseract OCR, but it only indexes standalone image files (not images embedded in PDFs).
I used the Tika jar, but it only indexes the PDF text.
Is there any way I can parse the images in a PDF, the text, or both?