Hiya,
Is it possible (in one step) to index documents that match the
results of a given query? That is, similar to the Delete by Query API,
can one index by query?
This is something you would need to handle yourself. Depending on which
client API you use, you may already have some utilities available to
make it easy.
For instance, in the Perl API, you could do:
    $search = $es->scrolled_search(
        query       => { ... whatever query ... },
        search_type => 'scan'
    );
    $es->reindex( source => $search, dest_index => 'new_index' );
If your API doesn't have anything similar, then you can write it
yourself. All it consists of is:
- a scrolled search (with scanning)
- bulk indexing
To implement a scrolled search, run an ordinary search request, with
these parameters:
- scroll: "1m"
  this says that you want to take a "snapshot" of the current state
  of your data, and to keep that around for, e.g., one minute
- search_type: "scan"
  this search type disables sorting and is very efficient for
  retrieving large numbers of documents from Elasticsearch
  http://www.elasticsearch.org/guide/reference/api/search/search-type.html
- size: 1000
  the number of docs to return in each request. With 'scan', this
  is actually the number of docs to return from EACH shard in each
  request, e.g. with 5 shards, 5 * 1000 = a max of 5000 docs in
  each request
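As a rough illustration of the parameters above, here is a minimal Python sketch that only builds the URL and body of the initial request; it does not send anything. The function names, the host "http://localhost:9200" and the index name "old_index" are made up for the example, and passing the three parameters in the query string is an assumption about how your server version expects them.

```python
import json
from urllib.parse import urlencode


def build_scan_request(base_url, index, query, scroll="1m", size=1000):
    """Return (url, body) for the initial scan search (hypothetical helper)."""
    # The three parameters described above, sent as query-string parameters.
    params = urlencode({"search_type": "scan", "scroll": scroll, "size": size})
    url = f"{base_url}/{index}/_search?{params}"
    # The query itself goes in the JSON request body.
    body = json.dumps({"query": query}).encode("utf-8")
    return url, body


def max_docs_per_request(size, shards):
    """With 'scan', size applies to each shard, so a batch can hold size * shards docs."""
    return size * shards


# Host and index name are placeholders for illustration only.
url, body = build_scan_request(
    "http://localhost:9200", "old_index", {"match_all": {}}
)
```

With size 1000 and 5 shards, max_docs_per_request(1000, 5) gives the 5000 mentioned above.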
The above search request will return a scroll ID. You pass that scroll
ID to each subsequent "scroll" request /_search/scroll, until you get no
more hits.
http://www.elasticsearch.org/guide/reference/api/search/scroll.html
The parameters are:
- scroll: "1m"
  refresh the lock on the scroll snapshot and keep it in place for
  another minute
- scroll_id: "xxxx"
  the scroll ID returned by the original search request, or by the
  previous scroll request. You MUST update this scroll ID each time
  to the value returned by the previous request
Each time you call /_search/scroll you will get another batch of
documents.
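The scroll loop described above could be sketched in Python roughly like this. It is only a sketch: fetch_page is a hypothetical callable that you would supply to do the actual HTTP request to /_search/scroll and return the decoded JSON response, and the "_scroll_id" and "hits" field names are what I'd expect in the response, so check them against your server.

```python
from urllib.parse import urlencode


def scroll_url(base_url, scroll_id, scroll="1m"):
    """Build the URL for one scroll request (hypothetical helper)."""
    query = urlencode({"scroll": scroll, "scroll_id": scroll_id})
    return f"{base_url}/_search/scroll?{query}"


def iter_scrolled_hits(first_scroll_id, fetch_page):
    """Yield every hit, one scroll request at a time.

    fetch_page(scroll_id) is supplied by the caller and must return the
    decoded JSON response of a scroll request made with that scroll ID.
    """
    scroll_id = first_scroll_id
    while True:
        page = fetch_page(scroll_id)
        hits = page["hits"]["hits"]
        if not hits:
            # No more hits: the scroll is exhausted.
            return
        yield from hits
        # Always carry forward the scroll ID from the latest response.
        scroll_id = page["_scroll_id"]
```

Keeping the HTTP call outside the generator means you can plug in whatever client you already use, and test the loop with a fake fetch_page.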
You can reindex those (to a new index, or after making any changes)
using the "bulk" API.
http://elasticsearch.org/guide/reference/api/bulk.html
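For the bulk step, a minimal Python sketch of turning one batch of hits into a bulk request body might look like the following. The function name is made up, and it assumes each hit carries "_type", "_id" and "_source" fields; the body is newline-delimited JSON (an action line followed by the document source) that you would POST to /_bulk.

```python
import json


def bulk_index_body(hits, dest_index):
    """Turn a batch of search hits into a bulk 'index' request body.

    Hypothetical helper: assumes every hit has '_type', '_id' and '_source'.
    """
    lines = []
    for hit in hits:
        # Action line: which index/type/id the document should be written to.
        action = {
            "index": {
                "_index": dest_index,
                "_type": hit["_type"],
                "_id": hit["_id"],
            }
        }
        lines.append(json.dumps(action))
        # Source line: the document itself (modify it here if you need to).
        lines.append(json.dumps(hit["_source"]))
    if not lines:
        return ""
    # The bulk body must end with a newline.
    return "\n".join(lines) + "\n"
```

Sending one bulk request per scroll batch keeps memory use bounded by the batch size.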
clint