Index image files with OCR

Hi,

I want to OCR image files and index them in Elasticsearch, and I want to be able to highlight the sections of the images that correspond to search results. Is this possible with the ingest plugin? I know that Apache Tika is used, but I wasn't sure whether OCR of images is supported.

Thanks

I don't know if Tika can do that, but you'd need to use something like it.

OCR works in Tika when Tesseract is available, but I don't think ingest-attachment works with Tesseract.

You can look at the FSCrawler project, which is supposed to work with Tesseract, although I know there is an open issue about this.
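
On highlighting regions of the image: Elasticsearch highlighting returns text fragments, not image coordinates, so the word positions have to be kept alongside the text and mapped back on the client side. Here is a rough sketch of one way to do that, assuming you run Tesseract yourself outside Elasticsearch (for example `tesseract scan.png out tsv`, which writes word-level bounding boxes to `out.tsv`). The field names `filename`, `content`, and `words`, and the helpers `parse_tesseract_tsv`, `build_document`, and `boxes_for_term`, are made up for illustration. This is not something ingest-attachment or FSCrawler does for you.

```python
import csv
import io


def parse_tesseract_tsv(tsv_text):
    """Return (full_text, words) from Tesseract TSV output."""
    reader = csv.DictReader(
        io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE
    )
    words = []
    for row in reader:
        text = (row.get("text") or "").strip()
        # Level 5 rows are single words; lower levels describe pages, blocks, lines.
        if row.get("level") != "5" or not text:
            continue
        words.append({
            "text": text,
            "left": int(row["left"]),
            "top": int(row["top"]),
            "width": int(row["width"]),
            "height": int(row["height"]),
        })
    full_text = " ".join(w["text"] for w in words)
    return full_text, words


def build_document(filename, tsv_text):
    """Build the JSON body to index (hypothetical field names)."""
    content, words = parse_tesseract_tsv(tsv_text)
    return {"filename": filename, "content": content, "words": words}


def boxes_for_term(doc, term):
    """Return (left, top, width, height) for every OCR word equal to term."""
    needle = term.lower()
    return [
        (w["left"], w["top"], w["width"], w["height"])
        for w in doc["words"]
        if w["text"].lower().strip(".,;:!?") == needle
    ]


# Hand-written sample in Tesseract's TSV column layout, not real OCR output.
SAMPLE_TSV = "\n".join([
    "level\tpage_num\tblock_num\tpar_num\tline_num\tword_num"
    "\tleft\ttop\twidth\theight\tconf\ttext",
    "1\t1\t0\t0\t0\t0\t0\t0\t640\t480\t-1\t",
    "5\t1\t1\t1\t1\t1\t36\t92\t60\t20\t96\tInvoice",
    "5\t1\t1\t1\t1\t2\t104\t92\t44\t20\t95\ttotal",
])

doc = build_document("scan.png", SAMPLE_TSV)
print(doc["content"])
print(boxes_for_term(doc, "total"))
```

You would index `doc` as a normal JSON document and search on `content`. For each hit, take the matched terms and call `boxes_for_term` to get the rectangles to draw over the original image. Exact, case-insensitive matching is a simplification here: anything the analyzer does (stemming, synonyms) would need the same treatment on the client.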
