Not able to index content of images

pyerunka · September 16, 2019, 5:22am

When I try to run an FSCrawler job for images, it shows me the error below:

07:05:40,428 DEBUG [f.p.e.c.f.t.TikaInstance] OCR is activated.
07:05:40,428 DEBUG [f.p.e.c.f.t.TikaInstance] But Tesseract is not installed so we won't run OCR.

Can you help me with this?

Regards,
Priyanka

dadoonet · September 16, 2019, 9:17am

FSCrawler is not able to find the Tesseract binary in your PATH.
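
One way to check whether an executable can be resolved from PATH is a short Python sketch like the one below. The helper name `find_tesseract` is made up for illustration, and the sketch only inspects the PATH of the window it is run from; FSCrawler's own lookup may differ in detail.

```python
import shutil


def find_tesseract(name="tesseract"):
    """Return the full path of `name` if it is found on PATH, else None.

    `find_tesseract` is a hypothetical helper for this thread; it relies only
    on the standard library function shutil.which.
    """
    return shutil.which(name)


if __name__ == "__main__":
    location = find_tesseract()
    if location is None:
        print("tesseract was not found on PATH")
    else:
        print(f"tesseract found at: {location}")
```

Run it from the same command-line window that is used to start FSCrawler, so that both see the same PATH.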

pyerunka · September 16, 2019, 9:20am

Hello @dadoonet,

Thanks for the reply!
How do I check that?
I have already set the path in the OCR section:

ocr:
  language: "eng"
  enabled: true
  pdf_strategy: "ocr_and_text"
  follow_symlinks: false
  path: "C:/Program Files/Tesseract-OCR/tesseract.exe"
  data_path: "C:/Program Files/Tesseract-OCR/tessdata"

Regards,
Priyanka

dadoonet · September 16, 2019, 9:46am

Please don't use the citation icon but the code icon </> to format your code.

Most of the time, on Windows, I have found that a space in the path can cause trouble. Program Files might be the problem here.
Could you try something like:

path: "/Progra~1/Tesseract-OCR/tesseract.exe"

If I recall the syntax correctly.
Otherwise, could you add C:/Program Files/Tesseract-OCR/ to your Windows system PATH?
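
After editing the system PATH, a quick way to confirm that the directory is really listed is to split PATH into its entries and compare them. The sketch below assumes nothing beyond the Python standard library; `directory_on_path` is an illustrative name, not part of FSCrawler.

```python
import os


def directory_on_path(directory, path_value=None):
    """Return True if `directory` is one of the entries of a PATH-style string.

    `path_value` defaults to the PATH of the current process. Entries are
    normalized so that trailing slashes (and, on Windows, letter case and
    slash direction) do not cause false negatives.
    """
    if path_value is None:
        path_value = os.environ.get("PATH", "")

    def normalize(entry):
        cleaned = entry.strip().strip('"')
        return os.path.normcase(os.path.normpath(cleaned))

    wanted = normalize(directory)
    entries = [e for e in path_value.split(os.pathsep) if e.strip()]
    return any(normalize(e) == wanted for e in entries)


if __name__ == "__main__":
    # Directory taken from this thread; adjust it to your installation.
    tesseract_dir = "C:/Program Files/Tesseract-OCR/"
    print(directory_on_path(tesseract_dir))
```

If this prints False in a freshly opened command-line window, the PATH change has probably not been saved where that window can see it.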

pyerunka · September 16, 2019, 10:11am

Hello @dadoonet,

I cannot rename the Program Files folder to Progra~1. Renaming folders is disabled on our server.
I have added the Tesseract directory to the Windows system PATH in the Environment Variables window.
How do I refer to that in the OCR path?

Regards,
Priyanka

dadoonet · September 16, 2019, 10:31am

Then you need to remove the path from the configuration file. It will be detected automatically.

Note that you need to stop FSCrawler, start a new command line window and start FSCrawler again.
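
The reason for opening a new window is that a process receives a copy of the environment when it is launched, so a window opened before the PATH change keeps the old value. The sketch below illustrates this with a made-up variable, `DEMO_VAR`, and a hypothetical helper, `value_seen_by_child`; it is only a demonstration of environment inheritance, not of FSCrawler itself.

```python
import os
import subprocess
import sys


def value_seen_by_child(env, name="DEMO_VAR"):
    """Start a child Python process with `env` and return the value it sees.

    The child only knows the environment it was started with, which is why
    a program must be restarted to pick up a changed variable.
    """
    code = "import os, sys; print(os.environ.get(sys.argv[1], ''))"
    result = subprocess.run(
        [sys.executable, "-c", code, name],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


if __name__ == "__main__":
    # Two environment snapshots: one taken before a change, one after.
    old_env = dict(os.environ, DEMO_VAR="before the change")
    new_env = dict(os.environ, DEMO_VAR="after the change")
    print(value_seen_by_child(old_env))
    print(value_seen_by_child(new_env))
```

A child started from the old snapshot still reports the old value, just as FSCrawler started from an old command-line window still sees the old PATH.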

pyerunka · September 16, 2019, 10:53am

Hello @dadoonet,

Thanks for your help!
My problem is resolved now. Thank you so much again.

Regards,
Priyanka

system · October 14, 2019, 10:53am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
