Elasticsearch and FSCrawler

Rahul_Shendge · November 14, 2018, 1:10pm

I am working with Elasticsearch and FSCrawler. I am trying to index PDF, DOCX, and JPEG files into Elasticsearch using FSCrawler. On my local machine the text content of JPEG files is read successfully, but on my server it is not. My local machine runs Ubuntu and the server runs CentOS. Please help me.

dadoonet · November 14, 2018, 1:34pm

So you are indexing an image which contains text, is that right?
Did you install Tesseract for OCR?

Rahul_Shendge · November 14, 2018, 2:15pm

Yes, I am indexing an image which contains text, and I have installed Tesseract.

dadoonet · November 14, 2018, 2:28pm

Could you make sure that tesseract is available on the default PATH of your machine?
Otherwise configure https://fscrawler.readthedocs.io/en/latest/user/tips.html#ocr-path
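
One way to check the PATH question is a small Python sketch like the one below. It only uses the standard library; the helper name `find_tesseract` is made up for illustration. It should be run as the same user that runs FSCrawler, since PATH can differ between users and services.

```python
import shutil


def find_tesseract(binary="tesseract"):
    """Return the absolute path of `binary` if it is on PATH, else None.

    `find_tesseract` is a hypothetical helper name, not part of FSCrawler.
    """
    return shutil.which(binary)


if __name__ == "__main__":
    path = find_tesseract()
    if path:
        print(f"tesseract found at {path}")
    else:
        print("tesseract is not on PATH; configure the OCR path in FSCrawler instead")
```

If this prints a path on the local machine but not on the server, the server's FSCrawler process most likely cannot find the binary either.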

Rahul_Shendge · November 14, 2018, 4:16pm

Resolved the issue. The following libraries were missing:

libjpeg-dev  libpng-dev  libtiff4-dev

Thanks.
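
For anyone who runs into the same symptom, here is a rough Python sketch for spotting missing image libraries. It assumes that `tesseract --version` lists the image libraries it was built against (recent builds print names such as libjpeg, libpng, and libtiff there, but the exact output varies by version). The function names are made up for illustration.

```python
import shutil
import subprocess

# Library names to look for; taken from the packages named in this thread.
REQUIRED = ("libjpeg", "libpng", "libtiff")


def missing_image_libs(version_text, required=REQUIRED):
    """Return the names in `required` that do not appear in `version_text`."""
    lowered = version_text.lower()
    return [name for name in required if name not in lowered]


def tesseract_version_text(binary="tesseract"):
    """Return the output of `tesseract --version`, or None if it is not on PATH."""
    exe = shutil.which(binary)
    if exe is None:
        return None
    result = subprocess.run([exe, "--version"], capture_output=True, text=True)
    # Some versions write the version banner to stderr, so keep both streams.
    return result.stdout + result.stderr


if __name__ == "__main__":
    text = tesseract_version_text()
    if text is None:
        print("tesseract is not on PATH")
    else:
        missing = missing_image_libs(text)
        print("missing image libraries:", ", ".join(missing) or "none")
```

An empty result does not prove that OCR will work, and a non-empty one may only mean the version banner is formatted differently, so treat the output as a hint rather than a diagnosis.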

system · December 12, 2018, 4:16pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.