Limit scroll result set by size in bytes instead of documents?

John_Freeman · November 26, 2013, 8:06pm

I'm trying to reindex documents of widely variable size (from a few bytes
to a few outliers over 200 MB). I want to keep my scroll size as high as
possible to maximize throughput, but I need to keep the returned result set
under a memory threshold, so when those outliers show up I want the scroll
size temporarily tapered. Is that possible? Could it be implemented in the
future? I discovered that a scroll cannot be "retried" if it comes back too
large - once some documents are returned, they can never be returned again
for the same scroll - but in a 2011 comment Shay hinted that it might be doable:

http://elasticsearch-users.115913.n3.nabble.com/a-couple-of-exceptions-from-stress-testing-scrolling-tp3580933p3586935.html

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/19b14ae0-4caf-46b3-a804-8b8634cea055%40googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

jprante · November 26, 2013, 10:09pm

Shay also notes it opens a can of problems. Each node would have to keep an
extended state of the scroll, with two continuations, the current one for
replay, and the next one. Also it would mean that if only a single node
can't respond successfully (for whatever reason), all other nodes would
have to replay old responses. During replay, other nodes could then also
return failed scroll responses, and the whole scan/scroll could enter a
loop if a client tries again and again.

With the current scan/scroll, each node can release the allocated scroll
response resources immediately after returning them to the client, and can
happily continue.

Jörg


John_Freeman · November 26, 2013, 10:32pm

That wasn't my question though, just a note. Maybe I shouldn't have
included it if it was going to distract. I'm asking if there is a way to
set size on scroll in bytes instead of documents.


jprante · November 26, 2013, 11:43pm

Sure, you could write an alternative implementation of
org.elasticsearch.search.fetch.FetchSearchResult that stops fetching search
results once they exceed a limit. Because ES shards do not know the byte
size of the final result the client sees, you would have to declare an
internal estimated byte limit per shard.

There is an edge case where limiting by bytes instead of docs doesn't help
much, since even a single doc could take gigabytes.

All in all, I am not sure how much benefit there is compared to a
scan/scroll over huge docs with the scroll size set to the minimum of 1.

Jörg
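
One way to picture the trade-off discussed above is a client-side regrouping step: scroll with a small size (down to 1) and pack the returned documents into batches under a byte budget before reindexing them. The following is only a minimal sketch under stated assumptions, not anything Elasticsearch provides. The names `estimate_bytes` and `batch_by_bytes` are hypothetical, the documents are assumed to arrive as plain dicts from whatever scroll loop the client already has, and the size estimate is just the length of the JSON encoding.

```python
import json
from typing import Any, Dict, Iterable, Iterator, List


def estimate_bytes(doc: Dict[str, Any]) -> int:
    """Rough size estimate (assumption): length of the UTF-8 JSON encoding."""
    return len(json.dumps(doc).encode("utf-8"))


def batch_by_bytes(
    docs: Iterable[Dict[str, Any]], max_bytes: int
) -> Iterator[List[Dict[str, Any]]]:
    """Group documents into batches whose estimated size stays under max_bytes.

    A single document larger than max_bytes is still emitted, alone in its
    own batch. This is the edge case where a byte limit cannot help.
    """
    batch: List[Dict[str, Any]] = []
    batch_bytes = 0
    for doc in docs:
        size = estimate_bytes(doc)
        # Close the current batch before it would exceed the budget.
        if batch and batch_bytes + size > max_bytes:
            yield batch
            batch, batch_bytes = [], 0
        batch.append(doc)
        batch_bytes += size
    if batch:
        yield batch


if __name__ == "__main__":
    # Hypothetical documents standing in for hits from a scroll loop.
    hits = [{"a": "x" * 10}, {"a": "y" * 10}, {"a": "z" * 100}, {"a": "w"}]
    for group in batch_by_bytes(hits, max_bytes=40):
        print(len(group), sum(estimate_bytes(d) for d in group))
```

Because the grouping happens after the documents have been fetched, this bounds the size of each outgoing batch, not the size of a single scroll response. Keeping the scroll size small is what bounds the latter.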

