Is there a good way to iterate over all data for an
Elasticsearch type?
Check out the scan search.
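
If it helps, here is a rough sketch using the Python client (elasticsearch-py)
and its scan helper; the index name, type name, and the process() call are
placeholders for your own setup:

    from elasticsearch import Elasticsearch, helpers

    es = Elasticsearch()

    # Stream every document of the given type without deep paging.
    for doc in helpers.scan(es, index="my_index", doc_type="my_type",
                            query={"query": {"match_all": {}}}):
        process(doc["_source"])  # process() stands in for your own logic
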
Is there any support for doing a map reduce against a large ES
deployment?
Can you be more specific? Do you want to submit some kind of
batch job? You might look at scripting, which supports arbitrary
operations on your data and is used in various places throughout
the API.
Thanks for the pointers. Yes, I'm looking to do a batch job to reprocess
all of the records in our index. Any tips on when we should prefer a scan
search over scripting, and vice versa, for this type of operation?
The best you can do inside of ES is with the update API, possibly
using a script, which can save you some bandwidth transferring
your whole documents.
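
As a rough sketch with the Python client (the field, id, and script here are
made up, and the exact script syntax depends on your ES version and whether
dynamic scripting is enabled):

    from elasticsearch import Elasticsearch

    es = Elasticsearch()

    # Rewrite a field in place; the full document never leaves the cluster.
    es.update(index="my_index", doc_type="my_type", id="1",
              body={"script": "ctx._source.counter += 1"})
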
If it turns out you need client-side processing, use a scan search
to retrieve the docs and the bulk API to index the newly
transformed docs. It's surprisingly efficient!
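
Something along these lines, again with the Python helpers (index/type names
and the transform() function are placeholders for your own reprocessing step):

    from elasticsearch import Elasticsearch, helpers

    es = Elasticsearch()

    def transformed_docs():
        # Stream every doc, transform it client-side,
        # and emit a bulk index action for it.
        for doc in helpers.scan(es, index="my_index", doc_type="my_type",
                                query={"query": {"match_all": {}}}):
            yield {
                "_op_type": "index",
                "_index": "my_index",
                "_type": "my_type",
                "_id": doc["_id"],
                "_source": transform(doc["_source"]),
            }

    # Re-index the transformed documents in batches.
    helpers.bulk(es, transformed_docs())
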