Read image text from pdf


(David Pilato) #21

I think you can easily add it to a GitHub issue.


(Nikhil Chandrakant Parab) #22

OK. I added the PDF file to the GitHub issue.


(Nikhil Chandrakant Parab) #23

@dadoonet I indexed the PDF, and the following is my content field:

content:
Trade Marks Journal No: 1792 , 10/04/2017 Class 3

2748941 03/06/2014
NIKITA RACHIT MODI
RACHIT VINODKUMAR MODI
NAYNABEN VINODKUMAR MODI

trading as ;CONNOTE HEALTHCARE
41, SWASTIK BUNGLOWS PART-1, OPP. HIGH COURT, R.C. TECHNICAL COLLEGE ROAD, GHATLODIYA, AHMEDABAD -
380 061. GUJARAT INDIA.
MANUFACTURER AND MERCHANT

Address for service in India/Agents address:
B. D. SHUKLA & COMPANY .
45-B, NARAYAN NAGAR SOCIETY, PALDI, AHMEDABAD 380 007 .
Used Since :07/04/2014

AHMEDABAD
COSMETICS, PERFUMERY, DEODORANTS, LOTIONS, CREAMS SOAPS AND SHAMPOO ALL INCLUDED IN CLASS-03

Could I split this content field into separate fields?
Is this possible?
Please reply.


(David Pilato) #24

No it's not.


(Nikhil Chandrakant Parab) #25

OK, thanks.


(Nikhil Chandrakant Parab) #26

@dadoonet
I can now read image text on 64-bit Windows as well; I installed an older version of Tesseract OCR.
The only remaining problem is indexing PDFs that contain text in images.


(Ambar) #27

Sorry if this looks like an ad, but we created Ambar: an integration of ES + Tika + PDFBox + Tesseract. It can parse any file and search through it. It also has a nice web UI. It's available on GitHub: https://github.com/RD17/ambar


(Nikhil Chandrakant Parab) #28

Hello @RD17Ambar
I'm working on Windows.
Can I install Ambar on Windows?


(Ambar) #29

You can spin up a VM with Ambar on Windows. All interaction with Ambar is performed through the REST API, so it would not be a problem.


(Nikhil Chandrakant Parab) #30

OK. But are there any installation steps I need to follow on Windows?


(Ambar) #31

Nope. If you have any trouble with the installation, please post an issue on our GitHub (https://github.com/RD17/ambar).


(Nikhil Chandrakant Parab) #32

Hello @RD17Ambar
I tried to install Ambar on a VM using the following command:
wget -O ambar.py https://static.ambar.cloud/ambar.py && chmod +x ./ambar.py

It gives this error:
--2017-04-27 10:28:02-- https://static.ambar.cloud/ambar.py
Resolving static.ambar.cloud (static.ambar.cloud)... 89.207.89.82
Connecting to static.ambar.cloud (static.ambar.cloud)|89.207.89.82|:443... connected.
ERROR: no certificate subject alternative name matches
requested host name `static.ambar.cloud'.
To connect to static.ambar.cloud insecurely, use `--no-check-certificate'.

Is this the right way?


(Ambar) #33

It's actually quite strange, since neither we (see the screenshot below) nor other users have ever experienced this sort of error.
The certificate is valid, I'm confident about it. Maybe you should try running
wget --no-check-certificate -O ambar.py https://static.ambar.cloud/ambar.py && chmod +x ./ambar.py
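
Keep in mind that `--no-check-certificate` turns off certificate verification for that download. If you would rather see which host names the certificate actually lists, here is a minimal Python sketch, assuming Python 3 is available on the machine. The function names are hypothetical, and the matching rule is a simplified version of host name checking (exact match, or a `*` as the leftmost label only).

```python
import socket
import ssl


def san_matches(hostname, dns_names):
    """Return True if hostname matches any DNS name, allowing '*' as the leftmost label."""
    host_labels = hostname.lower().rstrip(".").split(".")
    for name in dns_names:
        name_labels = name.lower().rstrip(".").split(".")
        # A wildcard covers exactly one label, so the label counts must agree.
        if len(name_labels) != len(host_labels):
            continue
        if name_labels[0] == "*":
            if name_labels[1:] == host_labels[1:]:
                return True
        elif name_labels == host_labels:
            return True
    return False


def fetch_dns_sans(hostname, port=443, timeout=10):
    """Connect with SNI and return the DNS entries of the certificate's subjectAltName.

    The default context verifies the certificate, so a host name mismatch
    raises ssl.SSLCertVerificationError instead of returning a list.
    """
    context = ssl.create_default_context()
    with socket.create_connection((hostname, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=hostname) as tls:
            cert = tls.getpeercert()
    return [value for kind, value in cert.get("subjectAltName", ()) if kind == "DNS"]


# Usage (needs network access):
#   names = fetch_dns_sans("static.ambar.cloud")
#   print(names, san_matches("static.ambar.cloud", names))
```

If this check passes on a current machine but wget still fails on the VM, the client on the VM is the more likely culprit, though that is only a guess from here.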


(Nikhil Chandrakant Parab) #34

Thanks, it works.


(Nikhil Chandrakant Parab) #35

At the installation step
sudo ./ambar.py install

it gives this error:
notroot@ubuntu:~$ sudo ./ambar.py install
/usr/bin/env: python3: No such file or directory


(Ambar) #36

Hmm, strange. What version of Ubuntu do you have?


(Nikhil Chandrakant Parab) #37

Ubuntu-12.04-amd64


(Ambar) #38

Please update to 16.04.


(Nikhil Chandrakant Parab) #39

OK. I will try and let you know.
Thanks for the quick reply, @RD17Ambar


(Ambar) #40

We wrote a post on 'parse and search' with Ambar https://blog.ambar.cloud/ambar-use-case-integrated-parse-and-search-solution/