Index by query?

Hi,

is it possible (in one step) to index documents that match the results of
a given query? That is, similar to the Delete by Query API, can one index
by query?

Thanks,
Sylvain

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

Do you mean update by query? If so, there is an open issue to support it:
https://github.com/elasticsearch/elasticsearch/issues/1607

If you really mean index by query, you would need to provide an example,
because I don't see how something would be returned by a query unless it
already existed.

Cheers,

Ivan

On Thu, Feb 21, 2013 at 2:37 PM, sbellem sbellem@gmail.com wrote:

Hi,

is it possible (in one step) to index documents that match the results of
a given query? That is, similar to the Delete by Query API, can one index
by query?

Thanks,
Sylvain

Hiya

is it possible (in one step) to index documents that match the
results of a given query? That is, similar to the Delete by Query API,
can one index by query?

This is something you would need to handle yourself. Depending on which
client API you use, you may already have some utilities available to
make it easy.

For instance, in the Perl API, you could do:

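# Scrolled "scan" search over the documents matching the query
my $search = $es->scrolled_search(
    query       => { ... whatever query ... },
    search_type => 'scan'
);

# Bulk-index every hit from the scrolled search into the new index
$es->reindex( source => $search, dest_index => 'new_index' );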

If your API doesn't have anything similar, then you can write it
yourself. All it consists of is:

  1. a scrolled search (with scanning)
  2. bulk indexing

To implement a scrolled search, run an ordinary search request, with
these parameters:

  • scroll: "1m"
    this says that you want to take a "snapshot" of the current state
    of your data, and to keep that around for, eg, "1m"

  • search_type: "scan"
    this search type disables sorting and is very efficient for
    retrieving large numbers of documents from elasticsearch

  • size: 1000
    the number of docs to return in each request. Note that with
    'scan', this is the number of docs to return from EACH shard
    in each request, eg with 5 shards, 5 * 1000 = max of 5000 docs
    in each request
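
For anyone writing this by hand, here is a minimal sketch of how that initial request could be put together. It only builds the URL path and JSON body and does not talk to a cluster. The function name, the "old_index" index and the match_all query are placeholders I made up; the three URL parameters are the ones listed above. Note that the 'scan' search type belongs to the Elasticsearch versions of this era and was removed in later releases.

```python
import json
from urllib.parse import urlencode


def build_scan_request(index, query, keep_alive="1m", size=1000):
    """Return (path, body) for the initial scrolled search.

    The function name and its arguments are illustrative assumptions,
    not part of any client library.
    """
    params = urlencode({
        "search_type": "scan",  # no sorting, efficient for bulk retrieval
        "scroll": keep_alive,   # how long to keep the snapshot alive
        "size": size,           # with 'scan': docs per shard per request
    })
    path = "/{}/_search?{}".format(index, params)
    body = json.dumps({"query": query})
    return path, body


# Hypothetical index name and query, for illustration only.
path, body = build_scan_request("old_index", {"match_all": {}})
print(path)
print(body)
```

You would send that body to that path with whatever HTTP client you like; the response carries the scroll ID used in the next step.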

The above search request will return a scroll ID. You pass that scroll
ID to each subsequent "scroll" request /_search/scroll, until you get no
more hits.

The parameters for each scroll request are:

  • scroll: "1m"
    refresh the lock on the scroll snapshot and keep it in place for
    another minute
  • scroll_id: "xxxx"
    the scroll ID returned by the original search request, or by the
    previous scroll request. You MUST always pass the scroll ID from
    the most recent response, because it can change between requests

Each time you call /_search/scroll you will get another batch of
documents.
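
The scroll loop described above could be sketched roughly like this. To keep it runnable without a cluster, the HTTP call is abstracted into a `fetch_page` callable, which is purely my own assumption: it stands for "do one /_search/scroll request with this scroll ID and return the parsed JSON". The canned responses below only imitate the shape of a real response (`_scroll_id` plus `hits.hits`).

```python
def iter_scrolled_hits(first_scroll_id, fetch_page):
    """Yield every hit from a scrolled search, batch by batch.

    `fetch_page(scroll_id)` is a hypothetical callable that performs one
    /_search/scroll request and returns the parsed JSON response.
    """
    scroll_id = first_scroll_id
    while True:
        page = fetch_page(scroll_id)
        # Always carry forward the scroll ID from the latest response.
        scroll_id = page.get("_scroll_id", scroll_id)
        hits = page["hits"]["hits"]
        if not hits:
            break  # an empty batch means the scroll is exhausted
        for hit in hits:
            yield hit


# Stand-in for a real cluster: canned responses keyed by scroll ID.
FAKE_PAGES = {
    "id-0": {"_scroll_id": "id-1", "hits": {"hits": [{"_id": "1"}, {"_id": "2"}]}},
    "id-1": {"_scroll_id": "id-2", "hits": {"hits": [{"_id": "3"}]}},
    "id-2": {"_scroll_id": "id-2", "hits": {"hits": []}},
}

doc_ids = [hit["_id"] for hit in iter_scrolled_hits("id-0", FAKE_PAGES.__getitem__)]
print(doc_ids)
```

In real use, `fetch_page` would wrap your HTTP client and pass scroll="1m" on every call so the snapshot stays alive between batches.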

You can reindex those (to a new index, or after making any changes)
using the "bulk" API.
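
As a rough illustration of that last step, this sketch turns one batch of hits into a bulk request body: one "index" action line followed by the document source, each on its own newline-terminated line. The helper name, the "new_index" destination and the sample hits are made up for the example; the body would be POSTed to /_bulk. Any changes to the documents would be made to `_source` before serializing.

```python
import json


def build_bulk_body(hits, dest_index):
    """Build a newline-delimited bulk body that indexes `hits` into `dest_index`.

    Hypothetical helper: each hit is expected to look like a search hit,
    with "_id" and "_source" keys and optionally "_type".
    """
    lines = []
    for hit in hits:
        meta = {"_index": dest_index, "_id": hit["_id"]}
        if "_type" in hit:
            meta["_type"] = hit["_type"]  # keep the original type, if any
        lines.append(json.dumps({"index": meta}))  # action line
        lines.append(json.dumps(hit["_source"]))   # document line
    # Every line, including the last one, must end with a newline.
    return "".join(line + "\n" for line in lines)


# Sample hits, invented for illustration.
sample_hits = [
    {"_index": "old_index", "_type": "doc", "_id": "1", "_source": {"title": "a"}},
    {"_index": "old_index", "_type": "doc", "_id": "2", "_source": {"title": "b"}},
]
bulk_body = build_bulk_body(sample_hits, "new_index")
print(bulk_body)
```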

clint

Hi Ivan,

yes! I meant update by query, thanks for the link!

Best,
Sylvain

On Thursday, 21 February 2013 23:47:20 UTC+1, Ivan Brusic wrote:

Do you mean update by query? If so, there is an open issue to support it:
https://github.com/elasticsearch/elasticsearch/issues/1607

If you really mean index by query, you would need to provide an example,
because I don't see how something would be returned by a query unless it
already existed.

Cheers,

Ivan

Thanks Clint!

On Friday, 22 February 2013 11:43:05 UTC+1, Clinton Gormley wrote:
