Can we expand scroll request timeout (Java API)?

Akita · August 10, 2016, 11:41am

Hi!

Is it possible to expand the timeout of a scroll after it has been sent?
Basically telling ES: "You sent me some results in the last scroll, but I'm running late processing them. Can you keep the scroll context alive a little longer than I previously asked?".

--

Here is my use case:
I'm using the Java API to retrieve whole indices with the scroll API.
For each doc, I run a process that is usually very short but can be (very) very long, depending on the doc.
That makes it very difficult to anticipate the time needed to process a scroll and to set a proper timeout.

Sometimes I need a huuuuge timeout to process the scroll (but I don't know it in advance), and I can't afford to set a big timeout for every scroll.

Since we need to use the last scroll_id for every request, I wrongly assumed that they were all different, so that calling the same ID again when I was running out of time would "retrieve the same documents and reset the timeout". Unfortunately, making the same call twice does not return the same results.

So my question is: (how) can I keep the scroll context alive dynamically?

`response = Client.prepareSearchScroll(previousScrollId).setScroll(scrollTimeout).execute().actionGet();
// Begin processing the response
// Notice after a few docs that we need more time
// Run something like (hypothetical call): Client.keepScrollContextAlive(previousScrollId, newTimeout).execute().actionGet();
// Finish processing the batch
// Call the next scroll
`

Thanks

dadoonet · August 10, 2016, 12:14pm

But you could do it the other way around.

Use a very looong timeout and then explicitly call clear scroll if you hit an exception?

https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-scroll.html#_clear_scroll_api
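A minimal sketch of that pattern, for illustration only: ask for a generous keep-alive on every call and release the context in a `finally` block, so it is freed whether processing finishes or throws. `ScrollSession`, `FakeScrollSession` and `LongKeepAliveScroll` are hypothetical names invented for this example so that it runs without a cluster. With the Java API, the matching calls would presumably be `prepareSearchScroll(scrollId).setScroll(...)` for `nextBatch` and `prepareClearScroll().addScrollId(scrollId)` for `clear`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical, minimal view of the scroll calls used here; not the Elasticsearch API. */
interface ScrollSession {
    /** Returns the next batch of documents, or an empty list once the scroll is exhausted. */
    List<String> nextBatch(String keepAlive);

    /** Releases the server-side search context. */
    void clear();
}

/** In-memory stand-in so the sketch runs without a cluster. */
class FakeScrollSession implements ScrollSession {
    private final List<List<String>> batches;
    private int position = 0;
    boolean cleared = false;

    FakeScrollSession(List<List<String>> batches) {
        this.batches = batches;
    }

    public List<String> nextBatch(String keepAlive) {
        if (cleared) {
            throw new IllegalStateException("search context was cleared");
        }
        return position < batches.size() ? batches.get(position++) : new ArrayList<>();
    }

    public void clear() {
        cleared = true;
    }
}

class LongKeepAliveScroll {
    /** Processes every document, then clears the scroll even if the processor throws. */
    static int processAll(ScrollSession session, String generousKeepAlive, Consumer<String> processor) {
        int processed = 0;
        try {
            List<String> batch = session.nextBatch(generousKeepAlive);
            while (!batch.isEmpty()) {
                for (String doc : batch) {
                    processor.accept(doc); // may be slow, or may throw
                    processed++;
                }
                batch = session.nextBatch(generousKeepAlive);
            }
            return processed;
        } finally {
            session.clear(); // free the context right away instead of waiting for the timeout
        }
    }
}
```

The long keep-alive is then only an upper bound: in the normal case the context lives exactly as long as the processing does.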

Akita · August 10, 2016, 12:51pm

Yes, that might solve my problem!

But what happens if the scroll_id stays the same?

I will call the scroll "blabla1", which will return some docs and the next ID, which will also be blabla1.
If I call clear scroll, I won't be able to get the next batch of docs, right?

And if I wait for the ID to change before clearing the scroll (let's say after 11 scrolls), will ES keep all 10 previous scrolls in memory while I'm processing the last one?

(That's way better than what I had, anyway.)
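One way to avoid depending on whether the ID changes, sketched under assumptions: always send back whatever ID the previous response returned, and clear only once, at the very end. As far as I understand, the ID refers to the server-side search context rather than to a single batch, so clearing it mid-way would end the scroll. `ScrollPage`, `FakeScrollServer` and `LatestScrollIdLoop` are hypothetical names for this example. The fake server deliberately hands back the same ID on every call while still advancing through the results.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical result of one scroll call: the documents plus the ID to send next. */
class ScrollPage {
    final String scrollId;
    final List<String> docs;

    ScrollPage(String scrollId, List<String> docs) {
        this.scrollId = scrollId;
        this.docs = docs;
    }
}

/** In-memory stand-in: one cursor per context, and the ID it returns never changes. */
class FakeScrollServer {
    static final String ID = "blabla1";
    private final List<List<String>> batches;
    private int cursor = 0;
    private boolean open = true;

    FakeScrollServer(List<List<String>> batches) {
        this.batches = batches;
    }

    ScrollPage scroll(String scrollId) {
        if (!open || !ID.equals(scrollId)) {
            throw new IllegalStateException("no search context for " + scrollId);
        }
        List<String> docs = cursor < batches.size() ? batches.get(cursor++) : new ArrayList<>();
        return new ScrollPage(ID, docs);
    }

    void clearScroll(String scrollId) {
        if (ID.equals(scrollId)) {
            open = false;
        }
    }

    boolean isOpen() {
        return open;
    }
}

class LatestScrollIdLoop {
    /** Reads every batch, carrying the most recent ID forward whether or not it changed. */
    static List<String> drain(FakeScrollServer server, String firstScrollId) {
        List<String> seen = new ArrayList<>();
        String scrollId = firstScrollId;
        try {
            while (true) {
                ScrollPage page = server.scroll(scrollId);
                scrollId = page.scrollId; // never compare with the previous ID, just keep the latest
                if (page.docs.isEmpty()) {
                    break; // an empty batch means the scroll is exhausted
                }
                seen.addAll(page.docs);
            }
        } finally {
            server.clearScroll(scrollId); // clear once, after the last batch or on failure
        }
        return seen;
    }
}
```

In this sketch an identical ID on two consecutive responses is harmless, because the position in the result set lives on the server side, not in the ID.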

warkolm · August 10, 2016, 8:48pm

The next scroll ID always changes after the current one is requested.

Akita · August 11, 2016, 8:17am

In my experience, the scroll ID does not always change, especially during the first calls (but calling it again, following the "use the last one" rule, does return the next batch). I ran SHA-1 on the IDs and even compared them character by character to be sure that they were identical and not just similar.
Mooky (Scroll Questions, question 4) and Sujoysett (Do unique/reusable _scroll_ids exist?) seem to have experienced the same situation.

I know the ID is related to shards and/or their state, but I don't know the details. But in my case, it doesn't always change.

Example below:
1) I run the first request, asking for a new scroll context.
Result in the right panel: some docs and an ID.

Then I copy/paste the ID into the body of the second request and execute it.
I get different docs, but the same ID. (Verified "manually" on https://www.diffnow.com/, and the SHA-1 is the same.)

[Screenshot: SS2.png]

So my questions remain :).
