Exporting a subset of data from a main index to a new index

Vijay_Tiwary · October 15, 2014, 5:14pm

Is there an easy way to export part of the data (based on some filters)
from one index, say a master index, to a new index? It
looks like I will have to query the data from the master
index (using some filters) and then use the bulk API to insert those documents
into the new index. Is there a better and easier way?

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/f3778121-8019-44ba-a30c-3194ae72f72f%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Alexandre_Rafalovitc · October 15, 2014, 6:22pm

Have you looked at Scroll and Scan?

This assumes your _source field has not been disabled.

Regards,
Alex.
Personal: http://www.outerthoughts.com/ and @arafalov
Solr resources and newsletter: http://www.solr-start.com/ and @solrstart
Solr popularizers community: Sign Up | LinkedIn
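
The flow discussed so far (page through the filtered documents, then write each page to the new index in one bulk step) could be sketched roughly as below. This is only an illustration under loud assumptions: plain in-memory maps stand in for the two indices, and `FilteredCopySketch`, `copyFiltered` and `flush` are hypothetical names, not Elasticsearch API calls. Since only the stored document body is copied, it mirrors the remark that `_source` must be available.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical stand-in: plain maps (id -> source) play the role of the two indices.
final class FilteredCopySketch {

    // Walks the source, keeps the documents matching the filter, and writes
    // them to the target in batches of pageSize (the "bulk" step).
    // Returns the number of batches written.
    static int copyFiltered(Map<String, String> source, Map<String, String> target,
                            Predicate<String> filter, int pageSize) {
        List<Map.Entry<String, String>> batch = new ArrayList<>();
        int batches = 0;
        for (Map.Entry<String, String> doc : source.entrySet()) {
            if (!filter.test(doc.getValue())) {
                continue; // document does not match the export filter
            }
            batch.add(doc);
            if (batch.size() == pageSize) {
                flush(batch, target);
                batches++;
            }
        }
        if (!batch.isEmpty()) {
            flush(batch, target); // last, partially filled batch
            batches++;
        }
        return batches;
    }

    // Writes one batch to the target and empties it for reuse.
    static void flush(List<Map.Entry<String, String>> batch, Map<String, String> target) {
        for (Map.Entry<String, String> doc : batch) {
            target.put(doc.getKey(), doc.getValue());
        }
        batch.clear();
    }

    public static void main(String[] args) {
        Map<String, String> master = new LinkedHashMap<>();
        for (int i = 0; i < 25; i++) {
            master.put("doc-" + i, i % 2 == 0 ? "even" : "odd");
        }
        Map<String, String> subset = new LinkedHashMap<>();
        int batches = copyFiltered(master, subset, "even"::equals, 5);
        System.out.println(subset.size() + " documents copied in " + batches + " batches");
    }
}
```

With a real cluster, the loop over `source` would be replaced by scroll requests and `flush` by a bulk request; the batching logic stays the same.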


Vijay_Tiwary · October 16, 2014, 12:36pm

Thanks Alex. I have done it like this:

    // scrollResp is assumed to hold the response of the initial scan/scroll search.
    while (true) {
        scrollResp = client.prepareSearchScroll(scrollResp.getScrollId())
                .setScroll(new TimeValue(600000)) // keep the scroll context alive for 10 minutes
                .execute().actionGet();
        SearchHits sh = scrollResp.getHits();
        SearchHit[] searchHits = sh.getHits();

        LOG.info("Hits: " + sh.getTotalHits() + ", Docs fetched: " + searchHits.length);
        if (searchHits.length == 0) {
            break; // an empty page means the scroll is exhausted
        }
    }

So, assuming I have only one shard and I am fetching a total of 100,000
documents in steps of 10,000, there will be 10 get calls, and
these calls will happen serially, one after the other. Is there any
mechanism to execute these get calls in parallel?


jprante · October 16, 2014, 11:10pm

Scan/scroll, when performed over multiple nodes/shards, is inherently
executed in parallel.

Scan/scroll depends on the scroll ID chain, so you have to execute a single
scan/scroll sequence serially. For a single shard, you can add filters to
the query to partition the search hits into disjoint subsets; your client
can then run one scan/scroll sequence per partition in parallel.

Jörg
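
The partitioning idea above could be sketched as follows: each partition pages serially through its own filtered slice, while the partitions themselves run in parallel on a client-side thread pool. This is a minimal sketch under stated assumptions, not a definitive implementation: a list of integer ids stands in for the index, a range on that id is the assumed partitioning filter, and `PartitionedScrollSketch`, `scrollPartition` and `exportInParallel` are hypothetical names rather than Elasticsearch API calls.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical stand-in: a list of integer ids plays the role of the index.
final class PartitionedScrollSketch {

    // One serial "scroll" over the documents whose id lies in [from, to).
    static List<Integer> scrollPartition(List<Integer> index, int from, int to, int pageSize) {
        // Apply the partition's range filter (the filter added to the query).
        List<Integer> matches = new ArrayList<>();
        for (int id : index) {
            if (id >= from && id < to) {
                matches.add(id);
            }
        }
        // Fetch page after page, strictly in order, as a scroll ID chain would force.
        List<Integer> fetched = new ArrayList<>();
        for (int offset = 0; offset < matches.size(); offset += pageSize) {
            int end = Math.min(offset + pageSize, matches.size());
            fetched.addAll(matches.subList(offset, end));
        }
        return fetched;
    }

    // Splits [minId, maxIdExclusive) into disjoint ranges and scrolls them in parallel.
    static List<Integer> exportInParallel(List<Integer> index, int minId, int maxIdExclusive,
                                          int partitions, int pageSize) {
        ExecutorService pool = Executors.newFixedThreadPool(partitions);
        try {
            long span = (long) maxIdExclusive - minId;
            List<Future<List<Integer>>> futures = new ArrayList<>();
            for (int p = 0; p < partitions; p++) {
                // Contiguous, non-overlapping ranges that together cover the whole span.
                final int from = minId + (int) (span * p / partitions);
                final int to = minId + (int) (span * (p + 1) / partitions);
                futures.add(pool.submit(() -> scrollPartition(index, from, to, pageSize)));
            }
            List<Integer> all = new ArrayList<>();
            for (Future<List<Integer>> future : futures) {
                all.addAll(future.get()); // wait for this partition to finish
            }
            return all;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("partitioned export failed", e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<Integer> index = new ArrayList<>();
        for (int i = 0; i < 100_000; i++) {
            index.add(i);
        }
        List<Integer> exported = exportInParallel(index, 0, 100_000, 4, 10_000);
        System.out.println("Exported " + exported.size() + " documents");
    }
}
```

The partitions must not overlap and must cover every matching document; otherwise documents are exported twice or missed. Which field is suitable for such a range filter depends on the data.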

