Using Python/Java code for preprocessing documents

Amit_Gupta · December 7, 2013, 7:01pm

It seems like a pretty easy question, but for some reason I still can't
figure out how to solve it. I have an Elasticsearch cluster that uses the
Twitter river to download tweets. I would like to implement a sentiment
analysis module that takes each tweet and computes a score (+ve/-ve, etc.).
I would like the score to be computed for all existing tweets as well as
for new ones, and then visualize the results in Kibana.

However, I am not sure where I should place the call to this sentiment
analysis module in the Elasticsearch pipeline.

I have considered modifying the Twitter river plugin, but that will not
work retroactively for tweets that are already indexed.

Essentially, I need answers to two questions:
1) How can I call Python/Java code while indexing a document, so that I can modify the JSON accordingly?
2) How can I use the same code to modify all the existing documents in ES?

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/7fa4221c-d6f4-4f3f-a14d-0c7585305605%40googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

jprante · December 7, 2013, 7:15pm

You cannot call Python/Java code after a document has passed the ES API;
in particular, the _source is not modified. So you have to do it yourself
before the document is passed to the ES API, or write a plugin that extends
the indexing API so that scripts can be applied before the document is
passed on to the shards/replicas.
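
A minimal sketch of the first option, computing the score in your own code before the document reaches ES. It assumes a recent elasticsearch-py client (whose helpers.bulk accepts action dicts with _index, _id and _source), an index named "tweets", a tweet field named "text" and a new field named "sentiment"; sentiment_score() is a hypothetical toy word-list scorer standing in for a real sentiment module.

```python
from typing import Dict, Iterable, Iterator

# Hypothetical toy lexicon; a real sentiment module would replace sentiment_score().
POSITIVE = {"good", "great", "love", "happy"}
NEGATIVE = {"bad", "awful", "hate", "sad"}


def sentiment_score(text: str) -> int:
    """Return (#positive words - #negative words) as a crude +ve/-ve score."""
    words = [w.strip(".,!?").lower() for w in text.split()]
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)


def enrich(tweet: Dict) -> Dict:
    """Copy the tweet JSON and add the computed score before it is indexed."""
    doc = dict(tweet)
    doc["sentiment"] = sentiment_score(tweet.get("text", ""))
    return doc


def index_actions(tweets: Iterable[Dict], index: str = "tweets") -> Iterator[Dict]:
    """Yield bulk 'index' actions whose _source already contains the score."""
    for tweet in tweets:
        yield {"_index": index, "_id": tweet["id"], "_source": enrich(tweet)}


if __name__ == "__main__":
    sample = [{"id": "1", "text": "I love this, great day!"}]
    for action in index_actions(sample):
        print(action)
    # With a running cluster (assumption: elasticsearch-py is installed):
    # from elasticsearch import Elasticsearch, helpers
    # helpers.bulk(Elasticsearch("http://localhost:9200"), index_actions(sample))
```

Because the score is part of _source from the start, Kibana sees it like any other field; the trade-off is that the scoring runs in whatever process feeds ES, not inside the cluster.
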
You can modify existing documents with the update API. With
https://github.com/elasticsearch/elasticsearch-lang-python, I think it
should be possible to use Jython for update scripts, but I have not tested
it.
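
For the existing tweets, one way to sketch the update-API route is to scroll over the index and send partial updates ("doc") in bulk, rather than one request per document. This assumes a recent elasticsearch-py client, where helpers.scan yields search hits and helpers.bulk accepts "_op_type": "update" actions; the "text"/"sentiment" field names and the score_fn argument are illustrative assumptions. Note that this is a plain partial-document update computed client side, not the Jython scripting mentioned above.

```python
from typing import Callable, Dict, Iterable, Iterator


def update_actions(hits: Iterable[Dict], score_fn: Callable[[str], float]) -> Iterator[Dict]:
    """Turn search hits into bulk partial-update actions carrying a fresh score."""
    for hit in hits:
        text = hit["_source"].get("text", "")
        yield {
            "_op_type": "update",                  # bulk 'update' instead of 'index'
            "_index": hit["_index"],
            "_id": hit["_id"],
            "doc": {"sentiment": score_fn(text)},  # merged into the existing _source
        }


def rescore_index(es, index: str, score_fn: Callable[[str], float]):
    """Scroll through every document in `index` and re-score it with bulk updates."""
    # Imported here so update_actions() can be used without the client installed.
    from elasticsearch import helpers

    hits = helpers.scan(es, index=index, query={"query": {"match_all": {}}})
    return helpers.bulk(es, update_actions(hits, score_fn))
```
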

Jörg


Amit_Gupta · December 7, 2013, 7:25pm

But the update API takes one document at a time. How can I run it in a
distributed and scalable way? The issue is that whenever I change my
sentiment analysis module, I will have to recompute the score for every
document.
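
On the scalability point, one option in later Elasticsearch versions (sliced scroll, available from 5.0 onwards, which is an assumption relative to this 2013 thread) is to give each worker a disjoint slice of the index, so re-scoring can be spread over several processes or machines. The sketch below only builds the per-worker search bodies; the worker count is arbitrary, and a single worker gets no slice because Elasticsearch requires a slice "max" greater than 1.

```python
from typing import Dict, List, Optional


def sliced_scan_bodies(num_workers: int, query: Optional[Dict] = None) -> List[Dict]:
    """Build one search body per worker; the slices partition the documents."""
    if num_workers < 1:
        raise ValueError("num_workers must be >= 1")
    query = query or {"match_all": {}}
    if num_workers == 1:
        # A single worker simply scans everything without slicing.
        return [{"query": query}]
    return [
        {"slice": {"id": i, "max": num_workers}, "query": query}
        for i in range(num_workers)
    ]
```

Worker i would then pass its body to helpers.scan(es, index="tweets", query=bodies[i]) and bulk-update only the hits it receives, so no two workers touch the same document.
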


jprante · December 7, 2013, 11:38pm

For recomputing scores, you should consider function score queries:
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-function-score-query.html
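
As a sketch of that suggestion, a function_score query can fold a stored numeric field into the relevance score at search time, so changing the weighting only means changing the query. The field name "sentiment_positive" (assumed to be a non-negative value, since field_value_factor must not produce negative scores), the factor, the "log1p" modifier and the inner match on "text" are all assumptions for illustration.

```python
from typing import Dict


def sentiment_boost_query(text: str, factor: float = 1.0) -> Dict:
    """Build a function_score query that adds a dampened sentiment term to _score."""
    return {
        "query": {
            "function_score": {
                "query": {"match": {"text": text}},
                "field_value_factor": {
                    "field": "sentiment_positive",  # hypothetical non-negative field
                    "factor": factor,
                    "modifier": "log1p",            # log(1 + factor * value)
                    "missing": 0,                   # value used when the field is absent
                },
                "boost_mode": "sum",                # _score + function result
            }
        }
    }
```

This only changes the _score of search hits; it does not write anything back to _source, so a Kibana visualization of the sentiment value itself still needs the field stored in the documents.
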

Jörg


Ap1 · February 6, 2015, 12:02am

Hi Amit,

I am trying to do something similar. Did you find a solution for running the
sentiment analysis at insertion time?

I posted my question here a few minutes ago:
https://groups.google.com/forum/#!topic/elasticsearch/Nx3YlftE1sI

Amay


Related topics

Topic | Category | Replies | Views | Activity
How to modify field contents during indexing? | Elasticsearch | 4 | 616 | July 6, 2017
How to create new field into an elasticsearch index using logstash and python | Logstash | 8 | 4416 | March 27, 2018
Update Api in java | Elasticsearch | 4 | 373 | July 6, 2017
Plug-in Java or Python or Scala function to re-calculate document score? | Elasticsearch | 3 | 785 | July 6, 2017
Implementing a plugin to process the whole input document | Elasticsearch | 11 | 562 | July 6, 2017