Elasticsearch processing pipeline capability?


(Kevin Blaisdell) #1

Is there any facility in Elasticsearch to help with sending terms to an
external process after Lucene processing (tokenization, filters, etc.)?
The idea here is to have some external analysis / NLP code run against the
documents while keeping all the pre-processing choices consistent and in
one place (i.e. the analysis setup in the Elasticsearch index configuration).

I am not very familiar with Lucene, but I believe their update
request processor is intended for scenarios like this that need a simple
pipeline.
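
Editor's sketch, not part of the original post: one possible way to keep the analysis settings in a single place is to ask Elasticsearch to run an index's analyzer through its analyze API and pass the returned tokens to the external code. The example below is a minimal Python sketch under assumptions: the JSON request body is the form accepted by later Elasticsearch releases (releases from the time of this thread took the analyzer as a query-string parameter), and the host, index name and analyzer name are hypothetical. Note that this still analyses the text a second time, separately from indexing.

```python
import json
from urllib.request import Request, urlopen


def build_analyze_request(base_url, index, analyzer, text):
    """Build a POST request for the index-level analyze endpoint.

    Assumption: the JSON body form ({"analyzer": ..., "text": ...}) is
    supported by the target Elasticsearch version.
    """
    url = f"{base_url}/{index}/_analyze"
    body = json.dumps({"analyzer": analyzer, "text": text}).encode("utf-8")
    headers = {"Content-Type": "application/json"}
    return Request(url, data=body, headers=headers, method="POST")


def tokens_from_analyze(response):
    """Extract the token strings, in order, from a parsed analyze response."""
    return [entry["token"] for entry in response.get("tokens", [])]


def analyze(base_url, index, analyzer, text):
    """Send the text to Elasticsearch and return the analysed tokens."""
    request = build_analyze_request(base_url, index, analyzer, text)
    with urlopen(request) as resp:
        return tokens_from_analyze(json.load(resp))
```

A call such as `analyze("http://localhost:9200", "docs", "my_analyzer", "Quick foxes")` (all values hypothetical) would return the tokens produced by that index's analyzer, which the external process could then consume.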

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/6f60301e-3fe0-4c90-8645-24a18e165a46%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


(Jörg Prante) #2

If you want to retrieve the term list of an index after Lucene processing
via the REST HTTP API, you can try

https://github.com/jprante/elasticsearch-index-termlist

Jörg



(Kevin Blaisdell) #3

Jörg,

Thanks. I have actually used the term list plugin for some quick
prototypes and experiments.

I actually meant that I am not familiar with Solr; I do have some
familiarity with Lucene. In this case I really wanted to send the
analysed text on to some post-processing, either in parallel with or prior
to indexing. I can have the other process load the same analyser
configuration that ES uses with Lucene, but then I have to manage two sets
of analysis configuration (external process + ES) and I am making two
passes over the data. Or I can come back and query the index after it is
built, perhaps with the term vector API, but that is again two passes over
the data.
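
Editor's sketch, not part of the original message: the second option above (reading analysed terms back from the index) could look roughly like the following in Python. It assumes the documented term vectors response layout (`term_vectors` → field → `terms` → `term_freq`); the endpoint path shown is the 1.x-style `_termvector` form and differs in later releases, and the host, index, type and field names are hypothetical.

```python
import json
from urllib.request import urlopen


def terms_from_term_vectors(response, field):
    """Return sorted (term, term_freq) pairs for one field of a parsed
    term vectors response. Missing fields yield an empty list."""
    field_data = response.get("term_vectors", {}).get(field, {})
    terms = field_data.get("terms", {})
    return sorted((term, info.get("term_freq", 0)) for term, info in terms.items())


def fetch_term_vectors(base_url, index, doc_type, doc_id, field):
    """Fetch term vectors for one document.

    Assumption: 1.x-style path; newer versions use `_termvectors`
    and drop the type segment.
    """
    url = f"{base_url}/{index}/{doc_type}/{doc_id}/_termvector?fields={field}"
    with urlopen(url) as resp:
        return json.load(resp)
```

The terms returned this way are the ones Lucene actually indexed, so the external process does not need its own copy of the analyser configuration, at the cost of the second pass over the data mentioned above.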

From the lack of response I am guessing there isn't a facility for this. I
am surprised because I figured a lot of people would be running various
things over their text data to better analyse it, but I might also be
approaching it wrong.

Thanks again!
Kevin

On Tuesday, August 26, 2014 4:56:10 PM UTC-5, Jörg Prante wrote:

If you want to retrieve the term list of an index after Lucene processing
via REST HTTP API, you can try

https://github.com/jprante/elasticsearch-index-termlist

Jörg



(system) #4