This seems like a fairly easy question, but for some reason I still can't
work out how to solve it. I have an Elasticsearch cluster that uses the
Twitter river to download tweets. I would like to implement a sentiment
analysis module that takes each tweet and computes a score
(positive/negative, etc.). I would like the score to be computed for each of
the existing tweets as well as for new tweets, and then visualize the results
using Kibana. However, I am not sure where I should place the call to this
sentiment analysis module in the Elasticsearch pipeline.
I have considered modifying the Twitter river plugin, but that will not
work retrospectively.
Essentially, I need to answer two questions:
1) How do I call Python/Java code while indexing a document, so that I can
modify the JSON accordingly?
2) How do I use the same code to modify all the existing documents in ES?
You cannot call Python/Java code after a document has been passed to the ES
API. In particular, the _source is not modified. So you have to do it
yourself before the document is passed to the ES API, or write a plugin
that extends the indexing API so that scripts can be applied before the
document is passed on to the shards/replicas.
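As a rough illustration of the "do it yourself before the document is passed
to the ES API" option, here is a minimal Python sketch. Everything in it is an
assumption made for illustration: the toy word lists, the function names
score_sentiment and enrich, the field names sentiment_score and sentiment, and
the assumption that the tweet body lives in a "text" field. A real sentiment
module would replace score_sentiment.

```python
import json

# Hypothetical toy lexicon, only here to make the sketch runnable.
POSITIVE = {"good", "great", "love", "happy"}
NEGATIVE = {"bad", "awful", "hate", "sad"}


def score_sentiment(text):
    """Return (number of positive words) - (number of negative words)."""
    words = [w.strip(".,!?:;\"'").lower() for w in text.split()]
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)


def enrich(doc):
    """Return a copy of the tweet document with sentiment fields added."""
    score = score_sentiment(doc.get("text", ""))  # "text" field is assumed
    enriched = dict(doc)  # shallow copy, so the caller's document is untouched
    enriched["sentiment_score"] = score
    if score > 0:
        enriched["sentiment"] = "positive"
    elif score < 0:
        enriched["sentiment"] = "negative"
    else:
        enriched["sentiment"] = "neutral"
    return enriched


tweet = {"user": "example", "text": "I love this, it is great!"}
# JSON body that the client would then send to the ES index API.
body = json.dumps(enrich(tweet))
```

The point of the design is that the score is already part of the JSON when it
reaches ES, so _source contains it and Kibana can use it like any other field.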
But the update API takes one document at a time. How can I run it in a
distributed and scalable way? The issue is that whenever I change my
sentiment analysis module, I will have to recompute the score for each
document.
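One possible way to avoid one request per document, sketched here rather than
taken from this thread, is to read the existing tweets in batches and send the
recomputed scores as partial updates through the bulk API. The sketch below
only builds the newline-delimited request body. The index name "twitter", the
type name "status", the "text" field, the sentiment_score field and the helper
names are all hypothetical, and the "_type" entry applies only to ES versions
that still use mapping types.

```python
import json


def build_bulk_updates(hits, score, index="twitter", doc_type="status"):
    """Build a bulk request body of partial updates, one per search hit.

    hits  -- dicts shaped like search hits: {"_id": ..., "_source": {...}}
    score -- callable taking the tweet text and returning a score
    index, doc_type -- hypothetical names, adjust to the real cluster
    """
    lines = []
    for hit in hits:
        action = {"update": {"_index": index, "_type": doc_type, "_id": hit["_id"]}}
        partial = {"doc": {"sentiment_score": score(hit["_source"].get("text", ""))}}
        lines.append(json.dumps(action))
        lines.append(json.dumps(partial))
    if not lines:
        return ""
    # The bulk API expects every line, including the last, to end in a newline.
    return "\n".join(lines) + "\n"


def chunks(items, size):
    """Yield successive lists of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]
```

Each chunk of hits yields one bulk body, so several workers could process
different chunks in parallel, and the same score callable can be reused when
new tweets are indexed.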
On Saturday, December 7, 2013 8:15:22 PM UTC+1, Jörg Prante wrote:
On Saturday, December 7, 2013 at 11:01:16 AM UTC-8, Amit Gupta wrote: