From a high-level view, scrolling, which keeps resources on the nodes around
for the time limit specified in the scroll request, offers the client no way
to close() or 'release' those resources once it is finished. A careful
reading of the docs and code seems to indicate, though, that if a scrolling
client iterates over the entire collection of results, the resources for
that scroll are automatically closed once the loop exits.
Is that true?
That is correct. The "view" of the search that is created when you start
your scroll request is maintained until either (1) the scroll request
finishes (you have iterated over all the results) or (2) the scroll
times out.
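The loop described above can be sketched as follows. This is a simulation, not a real client call: `fake_scroll_pull` stands in for a real scroll request (e.g. a `GET /_search/scroll` round trip), and all names here are illustrative assumptions, not the actual Elasticsearch API.

```python
# Pretend the index holds 10 matching documents, paged 4 at a time.
ALL_HITS = list(range(10))
PAGE_SIZE = 4

def fake_scroll_pull(scroll_cursor):
    """Stand-in for one scroll request: returns (next_cursor, hits).
    An empty hits list means the scroll is exhausted."""
    hits = ALL_HITS[scroll_cursor:scroll_cursor + PAGE_SIZE]
    return scroll_cursor + PAGE_SIZE, hits

def consume_scroll():
    """Iterate until a pull returns no hits, as a real scroll loop would.
    Once this loop exits, ES releases the scroll's resources on its own."""
    collected = []
    cursor, hits = fake_scroll_pull(0)
    while hits:
        collected.extend(hits)
        cursor, hits = fake_scroll_pull(cursor)
    return collected

print(consume_scroll())
```

The key point is the loop's exit condition: a pull that returns zero hits signals the end of the scroll, and that is the moment the server-side view is released.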
Note: on the initial scroll request and on EACH SUBSEQUENT pull of
scroll results, you pass a scroll=TIME parameter. This means that the
timeout should be sufficient to process the results from a single pull,
not all the results in ES. Every time you pull another set of results
from the scroll, it extends the timeout to now() + timeout.
So let's say you are sure you can finish parsing the first X results in
20 seconds: set your scroll timeout to, e.g., '30s'. Every time you pull
the next batch of results, you pass scroll=30s, which ensures that the
scroll stays alive for another 30s from now.
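The keep-alive arithmetic above can be modeled in a few lines. This only simulates the `now() + timeout` behavior described in the answer; times are plain numbers of seconds, and nothing here touches a real Elasticsearch API.

```python
SCROLL_TIMEOUT = 30  # seconds, i.e. scroll=30s passed on every pull

def extend_expiry(now, timeout=SCROLL_TIMEOUT):
    """Each pull resets the scroll's expiry to now() + timeout."""
    return now + timeout

# Initial request at t=0: the scroll context lives until t=30.
expiry = extend_expiry(0)
# We finish processing the first batch in 20s and pull again at t=20,
# so the context now lives until t=50, not t=30.
expiry = extend_expiry(20)
print(expiry)
```

This is why the timeout only needs to cover one batch's processing time, not the whole result set: every pull pushes the expiry forward again.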
We have a case where an API-style request in our application needs to
return all results. We'd like to use ES to perform the search internally
and scroll through all the results to satisfy the API answer, but the
lack of clarity around closing/freeing the resources makes it unclear
whether this is a good idea.
Scrolling can be very expensive. To return all the results from ES, it is
better to combine it with search_type=scan, which is very efficient.
The downside is that you can't sort with scan requests.
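A minimal sketch of what the initial scan request might look like, assuming the older REST API this thread is discussing (search_type=scan was a pre-2.x Elasticsearch feature and is not valid on modern versions). The function below only assembles the query-string parameters and body; it does not send anything, and the index name and query are made up for illustration.

```python
def scan_request(index, query, scroll="30s", size=50):
    """Build the query params and body for an initial scan-style scroll.
    With search_type=scan, `size` applies per shard and hits come back
    unsorted -- that's the trade-off for the efficiency."""
    params = {
        "search_type": "scan",  # efficient, unsorted retrieval
        "scroll": scroll,       # keep-alive granted to each pull
        "size": size,
    }
    body = {"query": query}
    return params, body

# Hypothetical usage: scan over everything in a "logs" index.
params, body = scan_request("logs", {"match_all": {}})
print(params, body)
```

Subsequent pulls would then go to the scroll endpoint with the returned scroll id, passing scroll=30s each time as described above.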