OCR Plugin


(David Pilato) #1

Hi there,

Does anyone know a good Java OCR library that I could use to add the OCR
feature [1] to the attachment plugin?

Thanks

[1] https://github.com/elasticsearch/elasticsearch-mapper-attachments/issues/10

--
David Pilato
http://dev.david.pilato.fr/
Twitter : @dadoonet


(Alexander Reelsen) #2

Hi David

I searched for the same thing some time ago but didn't come up with anything
useful that is free (there seem to be some commercial ones that are OK or
good).
In the open source world there is Tesseract, possibly. However, it is a
native binary, so you would have to spawn a process.
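
For illustration, spawning Tesseract from Java could look roughly like the
sketch below. It is only a sketch under assumptions: the class and method
names are made up, the tesseract binary is assumed to be installed and on the
PATH, and the installed version is assumed to accept "stdout" as the output
base (older releases may only write to a file). It needs Java 9 or later.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the attachment plugin.
final class TesseractRunner {

    // Builds the argument list for: tesseract <image> stdout -l <language>
    static List<String> buildCommand(String imagePath, String language) {
        List<String> command = new ArrayList<>();
        command.add("tesseract"); // assumption: the binary is on the PATH
        command.add(imagePath);
        command.add("stdout");    // assumption: this version can print the text to stdout
        command.add("-l");
        command.add(language);
        return command;
    }

    // Spawns the external process and returns whatever it printed.
    static String recognize(String imagePath, String language)
            throws IOException, InterruptedException {
        Process process = new ProcessBuilder(buildCommand(imagePath, language))
                .redirectError(ProcessBuilder.Redirect.DISCARD) // drop diagnostics so the pipe cannot fill up
                .start();
        byte[] output = process.getInputStream().readAllBytes();
        int exitCode = process.waitFor();
        if (exitCode != 0) {
            throw new IOException("tesseract exited with code " + exitCode);
        }
        return new String(output, StandardCharsets.UTF_8);
    }
}
```

A caller would then use something like TesseractRunner.recognize("scan.png",
"eng") and index the returned text. Every call starts a new operating system
process, which is the cost mentioned above.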

Keep me posted on your progress in case you find something cool. I need
this for my home paperwork (I never find things when I search for them in
real life :) ).

--Alexander


(David Pilato) #3

Thanks Alexander.

So far: 0 answers from Twitter, 1 from Facebook:
http://asprise.com/home/

I will dig into it to see whether it fits my needs.

Keep in touch
David.


(Thomas Peuss) #4

Hi David!

Here is a list from Ubuntu:
https://help.ubuntu.com/community/OCR

When it comes to Java:
http://www.roncemer.com/software-development/java-ocr

The Java project is not very active though...

CU
Thomas


(Shairon Toledo) #5

Here is a good post about OCR analysis:
http://stackoverflow.com/questions/971344/java-based-ocr-sdk-api

However, I don't think that running OCR on the ES side is a good approach.
OCR engines consume a lot of resources, so you may create a bottleneck on the
ES node. Some of those engines also dig deep into embedded images, which will
probably be even more expensive for the machine.
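
One way to keep that load off the ES node is to run OCR in the client before
indexing and cap how many pages are processed at once. The sketch below is
only an illustration of that idea: OcrPreprocessor and extractAll are made-up
names, and the ocr function is a placeholder for whichever engine ends up
being used.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Function;

// Hypothetical client-side helper: OCR happens here, not inside the ES node.
final class OcrPreprocessor {

    // Runs the given OCR function over all images with at most maxParallel
    // jobs at a time and returns the texts in the same order as the input.
    static List<String> extractAll(List<String> imagePaths,
                                   Function<String, String> ocr,
                                   int maxParallel) {
        ExecutorService pool = Executors.newFixedThreadPool(maxParallel);
        try {
            List<CompletableFuture<String>> pending = new ArrayList<>();
            for (String path : imagePaths) {
                pending.add(CompletableFuture.supplyAsync(() -> ocr.apply(path), pool));
            }
            List<String> texts = new ArrayList<>();
            for (CompletableFuture<String> future : pending) {
                texts.add(future.join()); // waits for this image's text
            }
            return texts;
        } finally {
            pool.shutdown();
        }
    }
}
```

The extracted text can then be sent to ES as an ordinary string field, so the
node only has to index it.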

--

Shairon Toledo
http://hashcode.me


(David Pilato) #6

Thanks to all. I think the best answer is:
http://www.roncemer.com/software-development/java-ocr

FYI, there is also a cloud OCR service: http://ocrsdk.com/plans-and-pricing/
with a Java client: https://github.com/abbyysdk/ocrsdk.com/tree/master/Java

I agree about the bottleneck, although it could still fit some use cases.

David

