Extracting specific fields from long documents and moving them to a structured DB for analysis

Rajesh_Iyer · July 12, 2014, 3:09pm

I am struggling with creating the following application

Extract specific data
from 1000s of policies

Searchable PDFs - can get full text directly
Image PDFs - using Tesseract to OCR to get full text

feed full policy text to ES and store the following indices

Policy # - String
Premium - $
and store them so that, at the end of the day, I have a table in, say, Oracle
like this:

Policy # Premium
12345 $ 2314
23451 $ 4231

And so on . . ..

There is a lot of analytics that I can do with this table (there are more
fields I am expecting to extract, of course, ~ 7-10 total fields).

We can get the full text and we can feed it to ES in one field.
We are kind of on our way to creating the indices we want.
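
For illustration, pulling the two fields out of the full text before (or instead of) indexing could be sketched as below. This is only a sketch: the regular expressions, the label wording they expect, the function name extract_fields and the sample text are all assumptions, and real policies will need patterns tuned to their layout and to OCR noise.

```python
import re
from decimal import Decimal

# Hypothetical patterns; adjust to the labels that actually appear in the policies.
POLICY_RE = re.compile(r"Policy\s*(?:#|No\.?|Number)\s*:?\s*(\d+)", re.IGNORECASE)
PREMIUM_RE = re.compile(r"Premium\s*:?\s*\$\s*([\d,]+(?:\.\d{2})?)", re.IGNORECASE)


def extract_fields(full_text):
    """Return policy_number and premium; a field that is not found is None."""
    policy = POLICY_RE.search(full_text)
    premium = PREMIUM_RE.search(full_text)
    return {
        "policy_number": policy.group(1) if policy else None,
        # Strip thousands separators before converting to an exact decimal.
        "premium": Decimal(premium.group(1).replace(",", "")) if premium else None,
    }


# Made-up sample text standing in for one policy's full text.
sample = "Policy # 12345\nInsured: ...\nAnnual Premium: $ 2,314.00\n"
fields = extract_fields(sample)
```

Returning None for a missing field keeps failed extractions visible, so they can be reviewed rather than silently loaded as wrong values.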

I just don't know if there is a way to get the stored index data (label,
value) out of ES into a structured DB table.
If you have experience attempting something like this, I'd love to hear
about the feasibility/challenges of such an attempt.

Regards,

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/91dce4e2-e28b-444a-aef8-1a48c123c740%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

AsyncAwait · July 13, 2014, 7:00am

So what is your question?

The ES team has commercial support options in case you have enough bandwidth to help in overall design etc.

Disclaimer: I am an open source enthusiast.

jprante · July 18, 2014, 7:39am

It is hard to answer a question like this because you do not specify which
tools are available for you (clients, programming language, etc.) so the
answer depends.

If you can write programs it is not very hard to query ES, look up the
result docs, and execute appropriate SQL insert/update.
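
A minimal sketch of that approach in Python, under some loud assumptions: the `response` dict is hand-written in the shape of an Elasticsearch search response body (`hits.hits[]._source`) rather than fetched from a live cluster, the field names `policy_number` and `premium` and the table name `policies` are made up, and an in-memory SQLite database stands in for the real RDBMS.

```python
import sqlite3

# Hand-written stand-in for a search response body; field names are assumptions.
response = {
    "hits": {
        "hits": [
            {"_id": "a1", "_source": {"policy_number": "12345", "premium": 2314}},
            {"_id": "a2", "_source": {"policy_number": "23451", "premium": 4231}},
        ]
    }
}


def load_hits(conn, response):
    """Insert or update one row per hit, keyed on the policy number."""
    rows = [
        (hit["_source"]["policy_number"], hit["_source"]["premium"])
        for hit in response["hits"]["hits"]
    ]
    # INSERT OR REPLACE is SQLite syntax; other databases spell upserts differently.
    conn.executemany(
        "INSERT OR REPLACE INTO policies (policy_number, premium) VALUES (?, ?)",
        rows,
    )
    conn.commit()
    return len(rows)


conn = sqlite3.connect(":memory:")  # stand-in for the real database
conn.execute(
    "CREATE TABLE policies (policy_number TEXT PRIMARY KEY, premium NUMERIC)"
)
loaded = load_hits(conn, response)
table = conn.execute(
    "SELECT policy_number, premium FROM policies ORDER BY policy_number"
).fetchall()
```

Keying the table on the policy number makes the load repeatable: running it again after re-indexing updates rows instead of duplicating them. Against Oracle the same idea would typically use a MERGE statement and an Oracle driver in place of sqlite3.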

If you cannot write programs and you want ES to do the SQL for you
automatically, this is not possible. The closest you can get is probably
using a plugin like the CSV format plugin

where you can extract values from ES into tabular data from the command line.
This CSV can be saved and used for RDBMS import.
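
To show what such tabular output looks like, here is a small Python sketch that flattens search hits into CSV. It is not the plugin mentioned above and does not reproduce its options; the function name hits_to_csv, the column names and the sample hits are assumptions for illustration only.

```python
import csv
import io


def hits_to_csv(hits, columns):
    """Render the _source of each hit as one CSV row under a header line."""
    buf = io.StringIO()
    writer = csv.DictWriter(
        buf,
        fieldnames=columns,
        extrasaction="ignore",  # skip fields not listed in columns, e.g. the full text
        lineterminator="\n",
    )
    writer.writeheader()
    for hit in hits:
        writer.writerow(hit["_source"])
    return buf.getvalue()


# Made-up hits in the shape of hits.hits[] from a search response.
hits = [
    {"_source": {"policy_number": "12345", "premium": 2314, "full_text": "..."}},
    {"_source": {"policy_number": "23451", "premium": 4231}},
]
csv_text = hits_to_csv(hits, ["policy_number", "premium"])
```

The resulting text can be written to a file and handed to the database's bulk import tool.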

Jörg

On Sat, Jul 12, 2014 at 5:09 PM, Rajesh Iyer iyer70@gmail.com wrote:

I just done know if there is a way to get the stored index data (label,
value) out of ES into a structured DB table.
If you have experience attempting something like this, Id love to hear
about the feasibility/challenges of such an attempt.
