Is it possible to extend the timeout of a scroll after it has been sent?
Basically telling ES: "You sent me some results in the last scroll, but I'm late processing them. Can you keep the scroll context alive a little longer (than what I previously asked you to)?".
Here is my use case:
I'm using the Java API to retrieve whole indices with the scroll API.
For each doc, I run a process that is usually very short but can be (very) very long depending on the doc.
That makes it very difficult to anticipate the time needed to process a scroll and to set a proper timeout.
Sometimes I need a huge timeout to process the scroll (but I don't know that in advance), and I can't afford to set a big timeout for every scroll.
Since we have to use the latest scroll_id for every request, I wrongly assumed that they were all different. So I thought that calling the same id again when I was running out of time would "retrieve the same documents and reset the timeout". Unfortunately, making the same call twice does not return the same results.
So my question is: (how) can I keep the scroll context alive dynamically?
// Begin processing response
// See after few docs that we need more time
// run something like Client.keepScrollContextAlive(previousScrollId, newTimeout).execute().actionGet();
// Finish processing the scroll
// Call the next scroll
But you could do it the other way.
Use a very long timeout and then explicitly call clear scroll if you get an exception?
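A minimal sketch of that idea, in case it helps: ask for a generous keep-alive, process the batches, and clear the scroll in a finally block so the context is released whether processing finishes or throws. ScrollSource, Page, FakeSource and processAll are hypothetical names made up for this illustration, not Elasticsearch types; with the real Java client the three calls would presumably map onto the search, search-scroll and clear-scroll requests. The in-memory fake deliberately returns the same id on every call, and the loop just passes back whatever id came with the latest response.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.function.Consumer;

public class ScrollWithCleanup {

    /** One batch of results plus the id to send with the next request. */
    static final class Page {
        final String scrollId;
        final List<String> docs;

        Page(String scrollId, List<String> docs) {
            this.scrollId = scrollId;
            this.docs = docs;
        }
    }

    /** Hypothetical stand-in for the real client calls (not an Elasticsearch type). */
    interface ScrollSource {
        Page start(String keepAlive);
        Page next(String scrollId, String keepAlive);
        void clear(String scrollId);
    }

    /** In-memory fake that hands back the same id with every batch. */
    static final class FakeSource implements ScrollSource {
        private final List<List<String>> batches;
        private int position = 0;
        boolean cleared = false;

        FakeSource(List<List<String>> batches) {
            this.batches = batches;
        }

        public Page start(String keepAlive) {
            return next("blabla1", keepAlive);
        }

        public Page next(String scrollId, String keepAlive) {
            if (cleared) {
                throw new IllegalStateException("scroll context was cleared");
            }
            List<String> docs = position < batches.size()
                    ? batches.get(position++)
                    : Collections.<String>emptyList();
            return new Page("blabla1", docs);
        }

        public void clear(String scrollId) {
            cleared = true;
        }
    }

    /** Processes every batch and always clears the scroll at the end. */
    static int processAll(ScrollSource source, String keepAlive, Consumer<String> handler) {
        Page page = source.start(keepAlive);
        String latestId = page.scrollId;
        int processed = 0;
        try {
            while (!page.docs.isEmpty()) {
                for (String doc : page.docs) {
                    handler.accept(doc); // may be slow, may throw
                    processed++;
                }
                // Always send the id from the latest response, even if it looks unchanged.
                page = source.next(latestId, keepAlive);
                latestId = page.scrollId;
            }
        } finally {
            // Release the context early instead of waiting for the long keep-alive to expire.
            source.clear(latestId);
        }
        return processed;
    }

    public static void main(String[] args) {
        FakeSource source = new FakeSource(Arrays.asList(
                Arrays.asList("doc1", "doc2"),
                Arrays.asList("doc3")));
        int count = processAll(source, "1h", doc -> { });
        System.out.println(count + " docs processed, cleared=" + source.cleared);
    }
}
```

Clearing happens only once, after the last batch or on failure, so it never cuts off a batch that is still to be fetched.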
Yes, that might solve my problem!
But what happens if the scroll_id stays the same?
I will call the scroll "blabla1", which will return some docs and the next id which will also be blabla1.
If I call clear scroll, I won't be able to get the next batch of docs, right?
And if I wait for the id to change before clearing the scroll (let's say after 11 scrolls), will ES keep all 10 scrolls in memory while I'm processing the last one?
(That's way better than what I had, anyway.)
The next scroll ID always changes after the current one is requested.
In my own experience, my scroll ID does not always change, especially during the first calls (but calling again with the same ID, following the "use the latest ID" rule, does return the next batch). I ran SHA-1 on them and even did a character-by-character comparison to be sure that they were identical and not just similar.
Mooky (Scroll Questions, question 4) and Sujoysett (Do unique/reusable _scroll_ids exist?) seem to have run into the same situation.
I know the ID is related to shards and/or their state, but I don't know the details. But in my case, it doesn't always change.
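For anyone who wants to repeat the check, here is a rough sketch of the kind of comparison described above, using only the JDK. The class and method names (ScrollIdCompare, sha1Hex, firstDifference) are made up for this example, and "blabla1" is a placeholder rather than a real scroll id.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class ScrollIdCompare {

    /** Hex-encoded SHA-1 digest of a scroll id. */
    static String sha1Hex(String scrollId) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-1");
            byte[] hash = digest.digest(scrollId.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(hash.length * 2);
            for (byte b : hash) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // Not expected: SHA-1 is a standard JDK algorithm.
            throw new IllegalStateException(e);
        }
    }

    /** Index of the first differing character, or -1 if the two ids are identical. */
    static int firstDifference(String a, String b) {
        int common = Math.min(a.length(), b.length());
        for (int i = 0; i < common; i++) {
            if (a.charAt(i) != b.charAt(i)) {
                return i;
            }
        }
        return a.length() == b.length() ? -1 : common;
    }

    public static void main(String[] args) {
        String previousId = "blabla1"; // placeholder: id sent with the request
        String returnedId = "blabla1"; // placeholder: id found in the response
        System.out.println("same sha1: " + sha1Hex(previousId).equals(sha1Hex(returnedId)));
        System.out.println("first difference at: " + firstDifference(previousId, returnedId));
    }
}
```

If firstDifference returns -1, the two ids really are the same string, which is what I observe on the first few calls.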
1) I run the first request, asking for a new scroll context.
Result in the right panel: some docs and an ID.
2) Then I copy/paste the ID into the body of the second request and execute it.
I get some different docs, but the same ID. (Verified "manually" on https://www.diffnow.com/, and same SHA-1.)
So my questions remain :).