Elasticsearch processing pipeline capability?

Is there any facility in Elasticsearch to help with sending terms to an
external process after Lucene processing (tokenization, filters, etc.)?
The idea here is to have some external analysis / NLP code run against the
documents while keeping all the pre-processing choices consistent and in
one place (i.e. the analysis setup in the Elasticsearch index configuration).

I am not very familiar with Lucene, but I believe their update request
processor is intended for scenarios like this that need a simple
pipeline.

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/6f60301e-3fe0-4c90-8645-24a18e165a46%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

If you want to retrieve the term list of an index after Lucene processing
via the REST HTTP API, you can try

https://github.com/jprante/elasticsearch-index-termlist

Jörg

On Tue, Aug 26, 2014 at 10:41 PM, Kevin B blaisdellk@gmail.com wrote:

Is there any facility in Elasticsearch to help with sending terms to an
external process after Lucene processing (tokenization, filters, etc.)?
The idea here is to have some external analysis / NLP code run against the
documents while keeping all the pre-processing choices consistent and in
one place (i.e. the analysis setup in the Elasticsearch index configuration).

I am not very familiar with Lucene, but I believe their update request
processor is intended for scenarios like this that need a simple
pipeline.


Jörg,

Thanks. I have actually used the term list plugin (thanks) for some quick
prototypes and experiments.

I actually meant that I am not familiar with Solr; I do have some
familiarity with Lucene. In this case I really wanted to be able to send the
analysed text on to some post-processing, either in parallel with or prior to
indexing. I can have the other process load the same analyser
config that ES uses with Lucene, but then I have to manage two sets of
analysis configuration (external process + ES) and I am making two passes over
the data. Or I can come back and hit the index after it is built, maybe
with the term vector API, but that is again two passes over the data.
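
For the second option, a rough sketch of reading the analysed terms back out
of the index through the term vector API might look like the following. This
is only an illustration under assumptions: the endpoint path is the one used
by recent Elasticsearch releases (1.x clusters used
/{index}/{type}/{id}/_termvector instead), and the host, index name, document
id and field name are placeholders rather than anything from this thread.

```python
import json
from urllib.request import Request, urlopen


def termvectors_url(base_url, index, doc_id):
    # Assumed path shape for recent Elasticsearch releases; older 1.x
    # clusters used /{index}/{type}/{id}/_termvector instead.
    return f"{base_url.rstrip('/')}/{index}/_termvectors/{doc_id}"


def fetch_term_vectors(base_url, index, doc_id, fields):
    """POST to the term vectors endpoint and return the decoded JSON body."""
    body = json.dumps({"fields": fields}).encode("utf-8")
    req = Request(
        termvectors_url(base_url, index, doc_id),
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urlopen(req) as resp:
        return json.load(resp)


def analysed_terms(response, field):
    """Flatten one field of a term vectors response into {term: term_freq}."""
    terms = response.get("term_vectors", {}).get(field, {}).get("terms", {})
    return {term: info["term_freq"] for term, info in terms.items()}


if __name__ == "__main__":
    # Hypothetical cluster address, index, document id and field name.
    resp = fetch_term_vectors("http://localhost:9200", "docs", "1", ["body"])
    print(analysed_terms(resp, "body"))
```

The dictionary returned by analysed_terms could then be handed to the external
analysis code. This keeps the analysis configuration in one place, but it is
still a second pass over the data, which is the drawback described above.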

From the lack of response I am guessing there isn't a facility for this. I
am surprised because I figured a lot of people would be running various
things over their text data to better analyse it, but I might also be
approaching it wrong.

Thanks again!
Kevin

On Tuesday, August 26, 2014 4:56:10 PM UTC-5, Jörg Prante wrote:

If you want to retrieve the term list of an index after Lucene processing
via REST HTTP API, you can try

https://github.com/jprante/elasticsearch-index-termlist

Jörg
