Resume scroll-scan query?

I'm reindexing an Elasticsearch index with 50m docs, using a scroll-scan
request to retrieve all the docs, but my "reindexer" program stopped at 30m.

Is there a way to redo the query to retrieve the remaining docs? Like using
an offset?

Would the internal order of the scan query be the same on a second
request?

I can confirm that no new docs have been indexed into the old index since
the reindexing began.

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/69ec11c9-774e-42df-be57-fd870d347743%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

The scroll stays available for the timeout value you give it. Every time
you scroll, you restart the countdown.

You could track the last scroll id you used and try it again from there?
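
As a rough illustration of tracking the last scroll id, here is a minimal Python sketch that writes the id to a checkpoint file after every processed batch. The names `fetch_next`, `handle_doc`, `drain` and the checkpoint layout are hypothetical and not part of any Elasticsearch client; `fetch_next` stands in for whatever call returns the next batch. Note that a saved id only helps while the scroll context is still alive on the server.

```python
import json
import os


def save_checkpoint(path, scroll_id, docs_done):
    """Write the checkpoint via a temp file so a crash cannot leave it half-written."""
    tmp = path + ".tmp"
    with open(tmp, "w") as fh:
        json.dump({"scroll_id": scroll_id, "docs_done": docs_done}, fh)
    os.replace(tmp, path)


def load_checkpoint(path):
    """Return the last saved checkpoint, or None if there is none yet."""
    if not os.path.exists(path):
        return None
    with open(path) as fh:
        return json.load(fh)


def drain(fetch_next, scroll_id, checkpoint_path, handle_doc, docs_done=0):
    """Scroll until an empty batch comes back, checkpointing after each batch.

    fetch_next(scroll_id) is a hypothetical callable returning
    (next_scroll_id, hits).
    """
    while True:
        scroll_id, hits = fetch_next(scroll_id)
        if not hits:
            return docs_done
        for doc in hits:
            handle_doc(doc)
        docs_done += len(hits)
        save_checkpoint(checkpoint_path, scroll_id, docs_done)
```

On restart, `load_checkpoint` gives the id and the count of docs already handled, which can be passed back into `drain`.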

(In reply to Roger de Cordova Farias, Thursday, 23 October 2014 12:47:02 UTC-4.)

Hmm, I was using a small ttl, just enough to process each scroll call, but
I could try a longer time to live and resume from the last scroll_id in
case of error.

That is a good idea, thanks.

(In reply to John Smith <java.dev.mtl@gmail.com>, 2014-10-23 17:12 GMT-02:00.)

A small ttl is OK (as long as it's adjusted properly for your process),
because every time you call scroll it resets the ttl. So you don't need to
set a 60m scroll time. It just has to be long enough for you to process one
batch and request the next one.
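
To make the keep-alive concrete, here is a small Python sketch that only builds the two request URLs. The URL shapes are an assumption based on the scan/scroll REST API as it looked around Elasticsearch 1.x; the host, the index name `old_index`, the `size` value and the function names are all hypothetical, so check the documentation for your version. The point is that every follow-up request carries its own `scroll` value, which is what renews the timeout.

```python
from urllib.parse import urlencode


def scan_start_url(base_url, index, keep_alive="1m", size=500):
    """URL that opens a scan-type scroll over one index (assumed 1.x-era shape)."""
    params = urlencode({"search_type": "scan", "scroll": keep_alive, "size": size})
    return f"{base_url}/{index}/_search?{params}"


def scroll_next_url(base_url, scroll_id, keep_alive="1m"):
    """URL that asks for the next batch and renews the keep-alive."""
    params = urlencode({"scroll": keep_alive, "scroll_id": scroll_id})
    return f"{base_url}/_search/scroll?{params}"


if __name__ == "__main__":
    print(scan_start_url("http://localhost:9200", "old_index"))
    print(scroll_next_url("http://localhost:9200", "c2Nhbjs="))
```

With a one-minute keep-alive, each batch just has to be processed and the next one requested within that minute.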

I'm curious whether you can re-use the scroll id. It's not specifically
mentioned in the docs, but I think scroll is forward only. So I'm not sure
that once you've got one scroll id you can go back to it. I guess there's
one way to find out :)

(In reply to Roger de Cordova Farias, Thursday, 23 October 2014 15:44:04 UTC-4.)

I know it resets the ttl on each scroll call, but since I don't have an
automatic resuming process, I need to manually check the last scroll_id (I
will log it to a file) and restart the reindexing program using it. That is
why I need a longer ttl.

I just tested re-using the scroll_id. It looks like, after the first
request, the same scroll_id is returned over and over, and each call with
it returns new docs.

So I can't use this approach: the batch that was being processed when the
program stopped can't be fetched again, so I will always lose that last
batch after resuming the reindexing.
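
One possible workaround, which is not something proposed in this thread, is to write each fetched batch to a local journal file before indexing it, and to replay that file on restart. The Python sketch below models this with a toy forward-only cursor. `ForwardOnlyCursor`, `process_with_journal`, `index_doc` and `fail_on` are hypothetical names, and it assumes that indexing the same doc twice is harmless (for example, because each doc is indexed under its original id).

```python
import json
import os


class ForwardOnlyCursor:
    """Toy model of the behaviour reported above: every call returns new docs."""

    def __init__(self, docs, page_size):
        self._docs, self._page_size, self._pos = docs, page_size, 0

    def next_page(self):
        page = self._docs[self._pos:self._pos + self._page_size]
        self._pos += len(page)
        return page


def process_with_journal(cursor, journal_path, index_doc, fail_on=None):
    """Journal each batch before indexing it; replay a leftover journal first."""
    if os.path.exists(journal_path):
        # A previous run stopped mid-batch: index that whole batch again.
        with open(journal_path) as fh:
            pending = json.load(fh)
        for doc in pending:
            index_doc(doc)
        os.remove(journal_path)
    while True:
        page = cursor.next_page()
        if not page:
            return
        with open(journal_path, "w") as fh:
            json.dump(page, fh)
        for doc in page:
            if doc == fail_on:  # test hook that simulates a crash
                raise RuntimeError("simulated crash")
            index_doc(doc)
        os.remove(journal_path)  # batch fully indexed, journal no longer needed
```

This still depends on the scroll context surviving until the program is restarted, so the longer ttl remains necessary.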

(In reply to John Smith <java.dev.mtl@gmail.com>, 2014-10-23 18:20 GMT-02:00.)