I am struggling to create the following application:
Extract specific data
from thousands of policies
Searchable PDFs - can get the full text directly
Image PDFs - using Tesseract OCR to get the full text
Feed the full policy text to ES and store the following indices
Policy # - String
Premium - $
and store them so that, at the end of the day, I have a table in, say, Oracle
like this:
Policy #    Premium
12345       $ 2314
23451       $ 4231
And so on...
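To make the target concrete, here is a minimal sketch of pulling those two fields out of the full policy text. It assumes the policies print labels such as "Policy #" and "Premium: $" near the values; the regular expressions and the function name `extract_fields` are hypothetical and would need adjusting to the real documents.

```python
import re

# Assumed label formats; real policies may word these differently.
POLICY_RE = re.compile(r"Policy\s*(?:#|No\.?|Number)\s*:?\s*(\w+)", re.IGNORECASE)
PREMIUM_RE = re.compile(r"Premium\s*:?\s*\$\s*([\d,]+(?:\.\d{2})?)", re.IGNORECASE)


def extract_fields(text):
    """Return the first policy number and premium found in text, or None for each."""
    policy = POLICY_RE.search(text)
    premium = PREMIUM_RE.search(text)
    return {
        "policy_number": policy.group(1) if policy else None,
        # Strip thousands separators before converting to a number.
        "premium": float(premium.group(1).replace(",", "")) if premium else None,
    }
```

OCR output is noisy, so a field that comes back as None probably needs manual review rather than a silent default.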
There is a lot of analytics that I can do with this table (there are more
fields I am expecting to extract, of course, ~7-10 fields in total).
We can get the full text and we can feed it to ES in one field.
We are kind of on our way to creating the indices we want.
I just don't know if there is a way to get the stored index data (label,
value) out of ES into a structured DB table.
If you have experience attempting something like this, I'd love to hear
about the feasibility/challenges of such an attempt.
It is hard to answer a question like this because you do not specify which
tools are available to you (clients, programming language, etc.), so the
answer depends on that.
If you can write programs, it is not very hard to query ES, iterate over the
result docs, and execute the appropriate SQL insert/update.
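As a rough illustration of that approach, here is a minimal Python sketch. The host, the index name `policies`, and the field names `policy_number` and `premium` are assumptions, and the standard-library `sqlite3` module stands in for Oracle so the example stays self-contained. `INSERT OR REPLACE` is SQLite syntax; on Oracle the equivalent upsert would typically be a `MERGE` statement.

```python
import json
import sqlite3
import urllib.request

# Hypothetical host and index name.
ES_URL = "http://localhost:9200/policies/_search"


def fetch_hits(url=ES_URL, size=1000):
    """POST a match_all search to ES and return the list of hits."""
    body = json.dumps({"query": {"match_all": {}}, "size": size}).encode("utf-8")
    req = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["hits"]["hits"]


def hits_to_rows(hits):
    """Turn ES hits into (policy_number, premium) tuples; field names are assumed."""
    return [
        (h["_source"]["policy_number"], float(h["_source"]["premium"]))
        for h in hits
    ]


def store_rows(conn, rows):
    """Create the table if needed and insert or update one row per policy."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS policy "
        "(policy_number TEXT PRIMARY KEY, premium REAL)"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO policy (policy_number, premium) VALUES (?, ?)",
        rows,
    )
    conn.commit()
```

A run would then look like `store_rows(sqlite3.connect("policies.db"), hits_to_rows(fetch_hits()))`. For more documents than fit in one response, the search would need to be paged, for example with the scroll API.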
If you cannot write programs and you want ES to do the SQL for you
automatically, this is not possible. The closest you can get is probably
using a plugin like the CSV format plugin,
where you can extract values from ES into tabular data from the command line.
This CSV can be saved and used for RDBMS import.
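If the plugin is not an option, the same tabular output can be produced with a few lines of code. This is only a sketch that uses Python's standard `csv` module rather than the plugin; the field names and the function name `hits_to_csv` are assumptions, and `hits` is the `hits.hits` list from an ES search response.

```python
import csv
import io


def hits_to_csv(hits, out, fields=("policy_number", "premium")):
    """Write one CSV row per ES hit, with a header row; missing fields become empty."""
    writer = csv.writer(out)
    writer.writerow(fields)
    for hit in hits:
        source = hit.get("_source", {})
        writer.writerow([source.get(f, "") for f in fields])


# Example: write to an in-memory buffer; a real run would open a file instead.
buf = io.StringIO()
hits_to_csv([{"_source": {"policy_number": "12345", "premium": 2314}}], buf)
```

The resulting file can then be loaded with whatever bulk import tool the database provides.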
Jörg
On Sat, Jul 12, 2014 at 5:09 PM, Rajesh Iyer iyer70@gmail.com wrote:
I just don't know if there is a way to get the stored index data (label,
value) out of ES into a structured DB table.
If you have experience attempting something like this, I'd love to hear
about the feasibility/challenges of such an attempt.