Read image text from PDF

I think you can easily add it to a GitHub issue.

OK, I added the PDF file to the GitHub issue.

@dadoonet I indexed the PDF, and the following is my content field:

content:
Trade Marks Journal No: 1792 , 10/04/2017 Class 3

2748941 03/06/2014
NIKITA RACHIT MODI
RACHIT VINODKUMAR MODI
NAYNABEN VINODKUMAR MODI

trading as ;CONNOTE HEALTHCARE
41, SWASTIK BUNGLOWS PART-1, OPP. HIGH COURT, R.C. TECHNICAL COLLEGE ROAD, GHATLODIYA, AHMEDABAD -
380 061. GUJARAT INDIA.
MANUFACTURER AND MERCHANT

Address for service in India/Agents address:
B. D. SHUKLA & COMPANY .
45-B, NARAYAN NAGAR SOCIETY, PALDI, AHMEDABAD 380 007 .
Used Since :07/04/2014

AHMEDABAD
COSMETICS, PERFUMERY, DEODORANTS, LOTIONS, CREAMS SOAPS AND SHAMPOO ALL INCLUDED IN CLASS-03

Could I split this content field into separate fields?
Is that possible?
Please reply.

No, it's not.

OK, thanks.
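
For readers who still need separate fields: since this cannot be done at indexing time, one option is to post-process the extracted text yourself. Below is a minimal Python sketch, assuming every document follows the same layout as the sample content above. The field names (journal_no, application_no, and so on) and the regular expressions are hypothetical, and reading "2748941 03/06/2014" as an application number and date is a guess from the sample, not something confirmed in this thread.

```python
import re

# Shortened sample in the same shape as the content field shown above.
SAMPLE = """Trade Marks Journal No: 1792 , 10/04/2017 Class 3

2748941 03/06/2014
NIKITA RACHIT MODI

trading as ;CONNOTE HEALTHCARE
Used Since :07/04/2014
"""

# Hypothetical field names and patterns; they only fit this sample layout.
PATTERNS = {
    "journal_no": r"Trade Marks Journal No:\s*(\d+)",
    "journal_date": r"Trade Marks Journal No:\s*\d+\s*,\s*(\d{2}/\d{2}/\d{4})",
    "trademark_class": r"Class\s+(\d+)",
    # Assumption: a line holding only a long number and a date is the
    # application number followed by the application date.
    "application_no": r"^(\d{6,})\s+\d{2}/\d{2}/\d{4}\s*$",
    "application_date": r"^\d{6,}\s+(\d{2}/\d{2}/\d{4})\s*$",
    "trading_as": r"trading as\s*;\s*(.+)",
    "used_since": r"Used Since\s*:\s*(\d{2}/\d{2}/\d{4})",
}


def split_content(content):
    """Return a dict of the fields whose pattern matched the text."""
    fields = {}
    for name, pattern in PATTERNS.items():
        match = re.search(pattern, content, flags=re.MULTILINE)
        if match:
            fields[name] = match.group(1).strip()
    return fields


if __name__ == "__main__":
    for key, value in sorted(split_content(SAMPLE).items()):
        print(key + ": " + value)
```

Fields whose pattern does not match are simply left out of the result, so documents with a different layout will give a partial dict rather than an error. The resulting dict could then be indexed as extra fields next to the original content.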

@dadoonet
I can now read image text on 64-bit Windows as well, after installing an older version of Tesseract OCR.
The only remaining problem is indexing PDFs that contain text in images.

Sorry if this looks like an ad, but we created Ambar: an integration of ES + Tika + PDFBox + Tesseract. It can parse any file and search through it. It also has a nice web UI. It's available on GitHub: https://github.com/RD17/ambar

Hello @RD17Ambar
I'm working on Windows.
Can I install Ambar on Windows?

You can spin up a VM with Ambar on Windows. All interaction with Ambar is performed through its REST API, so that would not be a problem.

OK, but are there any installation steps I need to follow on Windows?

Nope. If you have any trouble with the installation, please post an issue on our GitHub (https://github.com/RD17/ambar).

Hello @RD17Ambar
I tried to install Ambar on a VM using the following command:
wget -O ambar.py https://static.ambar.cloud/ambar.py && chmod +x ./ambar.py

It gives this error:
--2017-04-27 10:28:02-- https://static.ambar.cloud/ambar.py
Resolving static.ambar.cloud (static.ambar.cloud)... 89.207.89.82
Connecting to static.ambar.cloud (static.ambar.cloud)|89.207.89.82|:443... connected.
ERROR: no certificate subject alternative name matches
requested host name `static.ambar.cloud'.
To connect to static.ambar.cloud insecurely, use `--no-check-certificate'.

Is this the right way?

That's quite strange, since neither we (see the screenshot below) nor other users have ever experienced this kind of error.
The certificate is valid, I'm confident about that. Maybe you should try running:
wget --no-check-certificate -O ambar.py https://static.ambar.cloud/ambar.py && chmod +x ./ambar.py

Thanks, it works.

At the installation step:
sudo ./ambar.py install

It gives this error:
notroot@ubuntu:~$ sudo ./ambar.py install
/usr/bin/env: python3: No such file or directory

Hmm, strange. What version of Ubuntu do you have?

Ubuntu-12.04-amd64

Please update to 16.04.
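
For context, the "/usr/bin/env: python3: No such file or directory" message means that env could not find an executable named python3 on the PATH when it tried to start the script. A minimal sketch of the same lookup, written so it should run under either Python 2 or Python 3 (the helper name find_on_path is hypothetical, not part of Ambar):

```python
import os


def find_on_path(program):
    """Return the full path of `program` if it is on PATH, else None."""
    for directory in os.environ.get("PATH", "").split(os.pathsep):
        candidate = os.path.join(directory, program)
        # Accept only regular files that the current user may execute.
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None


if __name__ == "__main__":
    path = find_on_path("python3")
    if path is None:
        print("python3 not found on PATH")
    else:
        print("python3 found at " + path)
```

If this reports that python3 is missing, the installer script cannot start on that machine as it is.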

OK, I will try and let you know.
Thanks for the quick reply, @RD17Ambar.

We wrote a post on 'parse and search' with Ambar: https://blog.ambar.cloud/ambar-use-case-integrated-parse-and-search-solution/