Not able to index content of images

Hello @dadoonet,

When I run an FSCrawler job for images, it shows the messages below:

07:05:40,428 DEBUG [f.p.e.c.f.t.TikaInstance] OCR is activated.
07:05:40,428 DEBUG [f.p.e.c.f.t.TikaInstance] But Tesseract is not installed so we won't run OCR.

Can you help me with this?

Regards,
Priyanka

FSCrawler is not able to find the Tesseract binary in your PATH.

Hello @dadoonet,

Thanks for the reply!
How can I check that?
I have already set the path in the OCR section:

  ocr:
    language: "eng"
    enabled: true
    pdf_strategy: "ocr_and_text"
    follow_symlinks: false
    path: "C:/Program Files/Tesseract-OCR/tesseract.exe"
    data_path: "C:/Program Files/Tesseract-OCR/tessdata"

Regards,
Priyanka

Please don't use the citation icon but the code icon </> to format your code.

Most of the time, on Windows, I have found that a space in the path can cause trouble. Program Files might be the problem here.
Could you try something like:

path: "/Progra~1/Tesseract-OCR/tesseract.exe"

If I recall the syntax correctly.
Otherwise, could you add C:/Program Files/Tesseract-OCR/ to your Windows system PATH?

Hello @dadoonet,

I cannot rename the Program Files folder to Progra~1. Renaming folders is disabled on our server.
I have added the Tesseract folder to the Windows system PATH in the Environment Variables window.
How do I reference that in the OCR path?

Regards,
Priyanka

Then you need to remove the path setting from the OCR section of the configuration file. Tesseract will then be detected automatically from the system PATH.

Note that you need to stop FSCrawler, open a new command-line window so that the updated PATH is picked up, and start FSCrawler again.
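
To confirm from a freshly opened command-line window that Tesseract now resolves through the PATH, something like the following may help. It is only a sketch, assuming Python 3 is available on the server. `find_tesseract` is a hypothetical helper name, not part of FSCrawler, and it performs a generic PATH lookup rather than reproducing FSCrawler's own detection logic.

```python
import shutil


def find_tesseract(binary="tesseract", path=None):
    """Return the full path of the binary if it resolves on PATH, else None.

    `path=None` means "use the PATH of the current process", which is the
    PATH a program started from this same window would inherit.
    """
    return shutil.which(binary, path=path)


if __name__ == "__main__":
    location = find_tesseract()
    if location:
        print(f"tesseract found at: {location}")
    else:
        print("tesseract not found on PATH; check the environment variable "
              "and open a new command-line window")
```

If the script reports that nothing was found, the window was probably opened before the environment variable was changed, or the folder added to the PATH is not the one that contains tesseract.exe.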

Hello @dadoonet,

Thanks for your help!!!!
My problem is resolved now. Thank you so much again :smile:

Regards,
Priyanka
