Hi there,
I'm trying to index various documents using the Ingest-Attachment plugin. Some of these documents are image-based PDF documents that require OCR processing.
According to this page on Stack Overflow, Ingest-Attachment (or rather the Tika implementation it bundles) can be configured to run Tesseract by pointing it to the directory where Tesseract is installed. In my case, I would have to add the line
tesseractPath=C:\Program Files (x86)\Tesseract-OCR
to the Tika properties file.
However, I can't find the properties file that should hold the Tesseract path. There is a properties file in the elasticsearch/plugins/ingest-attachment folder, but simply adding the line to that file does not work. With the line included, Elasticsearch fails to start and shows the following error message:
java.lang.IllegalArgumentException: Unknown properties in plugin descriptor: [tesseractPath]
The Tesseract installation works fine standalone (I have tested it). Does anyone know what I'm doing wrong and how to fix it?
Cheers!