Using Python/Java code for preprocessing documents

This seems like a fairly easy question, but I still can't work out how to
solve it. I have an Elasticsearch cluster that uses the Twitter river to
download tweets. I would like to implement a sentiment analysis module that
takes each tweet and computes a score (positive/negative). The score should
be computed for the existing tweets as well as for new ones, and then
visualized in Kibana.

However, I am not sure where to place the call to this sentiment analysis
module in the Elasticsearch pipeline.

I have considered modifying the Twitter river plugin, but that would not
work retroactively for tweets that are already indexed.

Essentially, I need to answer two questions:

  1. How do I call Python/Java code while indexing a document, so that I can
    modify the JSON accordingly?

  2. How do I use the same code to modify all the existing documents in ES?

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/7fa4221c-d6f4-4f3f-a14d-0c7585305605%40googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

  1. You cannot call Python/Java code after a document has passed the ES API.
    In particular, ES does not modify the _source. So you have to do it yourself
    before the document is passed to the ES API, or write a plugin
    that extends the indexing API so that scripts can be applied
    before the document is passed on to the shards/replicas.

  2. You can modify existing documents with the update API. With
    https://github.com/elasticsearch/elasticsearch-lang-python I think it
    should be possible to use Jython for update scripts, but I have not tested
    it.
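
As a rough illustration of point 1, the enrichment can happen in client code
before the document reaches the index API. This is only a sketch under
assumptions: the word lists, the `sentiment` field name, and the `text` field
of the tweet are hypothetical, and the toy scorer stands in for a real
sentiment module. Sending the result to ES is left out.

```python
import json

# Hypothetical word lists for illustration only; a real module would
# use a trained model or a proper lexicon.
POSITIVE = {"good", "great", "love", "happy"}
NEGATIVE = {"bad", "awful", "hate", "sad"}


def sentiment_score(text):
    """Count positive words minus negative words (>0 positive, <0 negative)."""
    words = [w.strip(".,!?#@").lower() for w in text.split()]
    positives = sum(w in POSITIVE for w in words)
    negatives = sum(w in NEGATIVE for w in words)
    return positives - negatives


def enrich(tweet):
    """Return a copy of the tweet with an added 'sentiment' field (assumed name)."""
    enriched = dict(tweet)
    enriched["sentiment"] = sentiment_score(tweet.get("text", ""))
    return enriched


if __name__ == "__main__":
    tweet = {"user": "example", "text": "I love this, great work!"}
    # This JSON body would be sent to the index API instead of the raw tweet.
    print(json.dumps(enrich(tweet), sort_keys=True))
```

Keeping the scorer as a plain function means the same code can be reused
later when updating documents that are already indexed.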

Jörg


But the update API takes one document at a time. How can I run it in a
distributed and scalable way? The issue is that whenever I change my
sentiment analysis module, I will have to recompute the score for every
document.
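
For reference, one possible way around the one-document-at-a-time limit is
the bulk API, which accepts many `update` actions in a single request. The
sketch below only builds the newline-delimited request body and does not
talk to a cluster. It assumes the hits come from a search or scroll response
(each with `_id` and `_source`), that the body is posted to a `_bulk`
endpoint whose URL already names the index, and that `text` and `sentiment`
are the field names in use; `bulk_update_body` and `score_fn` are
hypothetical names.

```python
import json


def bulk_update_body(hits, score_fn, field="sentiment"):
    """Build an NDJSON bulk body with one partial-document update per hit."""
    lines = []
    for hit in hits:
        # Action line: which document to update.
        lines.append(json.dumps({"update": {"_id": hit["_id"]}}))
        # Payload line: partial document merged into the existing _source.
        text = hit["_source"].get("text", "")
        lines.append(json.dumps({"doc": {field: score_fn(text)}}))
    # The bulk format requires a trailing newline after the last line.
    return "\n".join(lines) + "\n" if lines else ""


if __name__ == "__main__":
    hits = [
        {"_id": "1", "_source": {"text": "abc"}},
        {"_id": "2", "_source": {"text": "hello"}},
    ]
    # len is a stand-in scorer; the real sentiment function would go here.
    print(bulk_update_body(hits, len), end="")
```

Because each batch is independent, several workers could process different
slices of the index in parallel whenever the scoring module changes.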

(In reply to Jörg Prante's message of Saturday, December 7, 2013 8:15:22 PM UTC+1.)


For recomputing scores, you should consider function score queries:
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-function-score-query.html

Jörg


Hi Amit,

I am trying to do something similar. Did you find a solution for running the
sentiment analysis at insertion time?

I posted my question here a few minutes ago:
https://groups.google.com/forum/#!topic/elasticsearch/Nx3YlftE1sI

Amay

(In reply to Amit Gupta's original post of Saturday, December 7, 2013 at 11:01:16 AM UTC-8.)
