Iterating over all data


(Ben McCann) #1

Hi,

Is there a good way to iterate over all data for an ElasticSearch type? Is
there any support for doing a map reduce against a large ES deployment?

Thanks,
Ben

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.


(Drew Raines) #2

Ben McCann wrote:

Is there a good way to iterate over all data for an
ElasticSearch type?

Check out the scan search.

http://www.elasticsearch.org/guide/reference/api/search/search-type/
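
For readers following along, a minimal sketch of the scan-and-scroll loop the linked page describes. It assumes a hypothetical `send(method, path, body)` callable that performs the HTTP request and returns the parsed JSON response; `scan_all`, `myindex`, and `mytype` are illustrative names, not part of any client library. The URL parameters follow the scan search type as documented for the ES versions current at the time of this thread; as far as I know, later releases dropped `search_type=scan` in favor of a plain scroll sorted by `_doc`.

```python
import json
from typing import Callable, Dict, Iterator, Optional

# Hypothetical transport: (method, path, body) -> parsed JSON response.
Transport = Callable[[str, str, Optional[str]], Dict]


def scan_all(send: Transport, index: str, doc_type: str,
             scroll: str = "5m", size: int = 100) -> Iterator[Dict]:
    """Yield every hit for a match_all scan search by following the scroll id."""
    query = json.dumps({"query": {"match_all": {}}})
    # With search_type=scan, "size" applies per shard and the first
    # response carries only a scroll id, no hits.
    path = "/%s/%s/_search?search_type=scan&scroll=%s&size=%d" % (
        index, doc_type, scroll, size)
    scroll_id = send("GET", path, query)["_scroll_id"]

    while True:
        resp = send("GET", "/_search/scroll?scroll=%s" % scroll, scroll_id)
        hits = resp["hits"]["hits"]
        if not hits:
            break  # an empty page means the scan is exhausted
        for hit in hits:
            yield hit
        # Always continue with the most recent scroll id.
        scroll_id = resp.get("_scroll_id", scroll_id)
```

Because the function is a generator, only one page of hits is held in memory at a time, which is the point of using a scroll for a whole-index pass.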

Is there any support for doing a map reduce against a large ES
deployment?

Can you be more specific? Do you want to submit some kind of
batch job? You might look at the scripting support which can
support arbitrary operations on your data. It's used in various
places through the API.

http://www.elasticsearch.org/guide/reference/modules/scripting/

Drew



(Ben McCann-2) #3

Thanks for the pointers. Yes, I'm looking to run a batch job to reprocess
all of the records in our index. Any tips on when we should prefer scan
search over scripting, and vice versa, for this type of operation?

Thanks for the help!
-Ben

On Sat, Sep 14, 2013 at 8:54 PM, Drew Raines aaraines@gmail.com wrote:

Ben McCann wrote:

Is there a good way to iterate over all data for an ElasticSearch type?

Check out the scan search.

http://www.elasticsearch.org/guide/reference/api/search/search-type/

Is there any support for doing a map reduce against a large ES deployment?

Can you be more specific? Do you want to submit some kind of batch job?
You might look at the scripting support which can support arbitrary
operations on your data. It's used in various places through the API.

http://www.elasticsearch.org/guide/reference/modules/scripting/

Drew


--
about.me/benmccann



(Drew Raines) #4

Ben McCann wrote:

Thanks for the pointers. Yes, I'm looking to run a batch job to
reprocess all of the records in our index. Any tips on when
we should prefer scan search over scripting, and vice versa, for
this type of operation?

The best you can do inside of ES is the update API, possibly
with a script, which can save you the bandwidth of transferring
whole documents to the client and back.

http://www.elasticsearch.org/guide/reference/api/update/
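
As an illustration of what a scripted update request might look like, here is a small sketch that only builds the request and sends nothing. The helper name `scripted_update`, the index and type names, and the `views` field are all hypothetical. The path and the `script`/`params` body shape follow the update API as documented for the ES versions of this thread's era; newer versions use a different URL layout and nest the script under a `script` object, so treat this as a period-specific assumption.

```python
import json
from typing import Any, Dict, Tuple


def scripted_update(index: str, doc_type: str, doc_id: str,
                    script: str, params: Dict[str, Any]) -> Tuple[str, str, str]:
    """Return (method, path, body) for a scripted update of one document."""
    path = "/%s/%s/%s/_update" % (index, doc_type, doc_id)
    # The script runs on the node holding the document, so only this
    # small body crosses the network, not the document source.
    body = json.dumps({"script": script, "params": params}, sort_keys=True)
    return ("POST", path, body)


# Hypothetical example: increment a "views" counter on document 1.
method, path, body = scripted_update(
    "myindex", "mytype", "1", "ctx._source.views += n", {"n": 1})
```

Note that the update API addresses one document per request, so reprocessing a whole index this way still requires a list of document ids from somewhere, for instance a scan search that fetches ids only.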

If it turns out you need client-side processing, use a scan search
to retrieve the docs and the bulk API to index the newly
transformed docs. It's surprisingly efficient!

http://www.elasticsearch.org/guide/reference/api/bulk/
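
A rough sketch of the client-side half of that approach: take hits as returned by a scan search, apply a transformation, and format them as bulk API request bodies. The helper names `batches` and `bulk_body`, and the `transform` callable, are hypothetical; the newline-delimited action/source layout and the `_index`/`_type`/`_id` metadata keys follow the bulk API documentation of the time. Each resulting string would be sent as the body of a `POST /_bulk` request.

```python
import json
from typing import Any, Callable, Dict, Iterable, Iterator, List

Hit = Dict[str, Any]


def batches(items: Iterable[Hit], size: int) -> Iterator[List[Hit]]:
    """Group an iterable of hits into lists of at most `size` items."""
    batch: List[Hit] = []
    for item in items:
        batch.append(item)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def bulk_body(hits: Iterable[Hit],
              transform: Callable[[Dict[str, Any]], Dict[str, Any]]) -> str:
    """Build a bulk request body that re-indexes each hit after transform."""
    lines: List[str] = []
    for hit in hits:
        # Action line: index the document under its existing id,
        # which overwrites the old version.
        action = {"index": {"_index": hit["_index"],
                            "_type": hit["_type"],
                            "_id": hit["_id"]}}
        lines.append(json.dumps(action, sort_keys=True))
        # Source line: the transformed document.
        lines.append(json.dumps(transform(hit["_source"]), sort_keys=True))
    # The bulk format requires a trailing newline after the last line.
    return "\n".join(lines) + "\n" if lines else ""
```

Batching keeps each bulk request to a bounded size; the right batch size depends on document size and cluster capacity, so it is left as a parameter here.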

Drew


