I think you can easily add it to a GitHub issue.
OK, I added the PDF file to the GitHub issue.
@dadoonet I indexed the PDF, and the following is my content field:
Trade Marks Journal No: 1792 , 10/04/2017 Class 3
NIKITA RACHIT MODI
RACHIT VINODKUMAR MODI
NAYNABEN VINODKUMAR MODI
trading as ;CONNOTE HEALTHCARE
41, SWASTIK BUNGLOWS PART-1, OPP. HIGH COURT, R.C. TECHNICAL COLLEGE ROAD, GHATLODIYA, AHMEDABAD -
380 061. GUJARAT INDIA.
MANUFACTURER AND MERCHANT
Address for service in India/Agents address:
B. D. SHUKLA & COMPANY .
45-B, NARAYAN NAGAR SOCIETY, PALDI, AHMEDABAD 380 007 .
Used Since :07/04/2014
COSMETICS, PERFUMERY, DEODORANTS, LOTIONS, CREAMS SOAPS AND SHAMPOO ALL INCLUDED IN CLASS-03
Could I separate this content field into different fields?
Is this possible?
No, it's not.
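Since the crawler leaves everything in the single content field, any split would have to happen outside it, for example in a post-processing step. Below is a minimal sketch of that idea, not a feature of the crawler. `extract_fields` is a hypothetical helper, and the field names (`journal_no`, `journal_date`, `class`, `used_since`) are made up for illustration. It assumes the header line always has the same word order as in the sample above, so it is fragile by design.

```shell
# Hypothetical helper: read extracted text on stdin and print key=value pairs.
# Assumes the layout of the sample content shown above.
extract_fields() {
  awk '
    /^Trade Marks Journal No:/ {
      # e.g. "Trade Marks Journal No: 1792 , 10/04/2017 Class 3"
      print "journal_no=" $5
      print "journal_date=" $7
      print "class=" $9
    }
    /^Used Since/ {
      # e.g. "Used Since :07/04/2014"
      sub(/^Used Since *: */, "")
      print "used_since=" $0
    }
  '
}
```

Feeding the sample content through `extract_fields` would yield the journal number, journal date, class, and "Used Since" date as separate values, which could then be indexed as their own fields.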
I can read image text on 64-bit Windows as well. I installed an old version of Tesseract OCR.
Now the only problem is indexing PDFs that contain text in images.
Sorry if this looks like an ad, but we created Ambar: an integrated ES + Tika + PDFBox + Tesseract solution. It can parse any file and search through it. It also has a nice web UI. It's available on GitHub: https://github.com/RD17/ambar
I'm working on Windows.
Can I install Ambar on Windows?
You can spin up a VM with Ambar on Windows. All interaction with Ambar is performed through a REST API, so it would not be a problem.
OK. But are there any installation steps I need to follow on Windows?
Nope. If you have any trouble with the installation, please post an issue on our GitHub (https://github.com/RD17/ambar).
It gives an error:
--2017-04-27 10:28:02-- https://static.ambar.cloud/ambar.py
Resolving static.ambar.cloud (static.ambar.cloud)... 184.108.40.206
Connecting to static.ambar.cloud (static.ambar.cloud)|220.127.116.11|:443... connected.
ERROR: no certificate subject alternative name matches
requested host name 'static.ambar.cloud'.
To connect to static.ambar.cloud insecurely, use '--no-check-certificate'.
Is this the right way?
It's actually quite strange, since neither we (see the screenshot below) nor other users have ever experienced this sort of error.
The certificate is valid, I'm confident about it. Maybe you should try running
wget --no-check-certificate -O ambar.py https://static.ambar.cloud/ambar.py && chmod +x ./ambar.py
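Because `--no-check-certificate` turns off verification for that download, it may be worth first looking at which host names the presented certificate actually covers. The sketch below uses the `openssl` command-line tool for that. `cert_text` and `san_dns_names` are hypothetical helper names, and the approach assumes `openssl` is installed and the machine can reach the server on port 443.

```shell
# Fetch the certificate a server presents and dump it as text.
# Needs network access and the openssl CLI; the host is passed as $1.
cert_text() {
  openssl s_client -connect "$1:443" -servername "$1" </dev/null 2>/dev/null |
    openssl x509 -noout -text
}

# Print the DNS names listed under "Subject Alternative Name",
# reading `openssl x509 -noout -text` output from stdin.
san_dns_names() {
  grep -A1 'Subject Alternative Name' | tail -n 1 | tr ',' '\n' | sed -n 's/^ *DNS://p'
}

# Example (not run here): cert_text static.ambar.cloud | san_dns_names
```

If the requested host name is missing from the printed list, this client was shown a different certificate than the one other users see, which would be consistent with the error appearing on only some machines.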
Thanks, it works.
In the installation step
sudo ./ambar.py install
it gives an error:
notroot@ubuntu:~$ sudo ./ambar.py install
/usr/bin/env: python3: No such file or directory
Hmm, strange. What version of Ubuntu do you have?
Please update to 16.04.
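The `/usr/bin/env: python3` message indicates that the installer is launched through `python3` and that no such command is on the PATH. A quick pre-flight check along these lines can confirm both points before retrying. This is only a sketch: `os_version_id` and `have_cmd` are hypothetical helper names, and it assumes the system provides `/etc/os-release` (running `lsb_release -rs` is an alternative where that file is missing).

```shell
# Print VERSION_ID from an os-release style file (default: /etc/os-release).
os_version_id() {
  grep '^VERSION_ID=' "${1:-/etc/os-release}" | cut -d= -f2 | tr -d '"'
}

# Succeed if the named command can be found on PATH.
have_cmd() {
  command -v "$1" >/dev/null 2>&1
}

# Example (not run here):
#   echo "Ubuntu release: $(os_version_id)"
#   have_cmd python3 || echo "python3 is not installed"
```

On a release where `python3` is only missing rather than unavailable, installing the distribution's `python3` package may be enough; upgrading to 16.04 as suggested above sidesteps the question.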
OK. Will try and let you know.
Thanks for the quick reply @RD17Ambar
We wrote a post on 'parse and search' with Ambar: https://blog.ambar.cloud/ambar-use-case-integrated-parse-and-search-solution/