Anomaly Job scroll_size parameter behaviour

marmai16 · December 4, 2023, 9:10am

Hello everyone,

a quick question regarding the scroll_size parameter of a datafeed in an anomaly job.

Does scroll_size just limit the number of results returned per query, while every document is still processed (so a smaller scroll_size means more query executions)?

Or, say we have 1000 docs in a time range equal to the bucket size of a job, and scroll_size is set to 750: does the datafeed process only 750 documents and then proceed to the next bucket, without processing the remaining 250 documents in that bucket?

Thank you in advance!

marmai16 · December 4, 2023, 2:07pm

To answer my own question, for those interested:

" scroll_size: In most cases, the type of search that the datafeed executes to Elasticsearch uses the scroll API. Scroll size defines how much the datafeed queries to Elasticsearch at a time. For example, if the datafeed is set to query for log data every 5 minutes, but in a typical 5-minute window there are 1 million events, the idea of scrolling that data means that not all 1 million events will be expected to be fetched with one giant query. Rather, it will do it with many queries in increments of scroll_size. By default, this scroll size is set conservatively to 1,000. So, to get 1 million records returned to ML, the datafeed will ask Elasticsearch for 1,000 rows, a thousand times. Increasing scroll_size to 10,000 will make the number of scrolls be reduced to a hundred. In general, beefier clusters should be able to handle a larger scroll_size and thus be more efficient in the overall process."

Source: Machine Learning with the Elastic Stack
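
To make the arithmetic from the quote concrete, here is a small Python sketch. It does not call Elasticsearch or the ML APIs; `scroll_batches` and `count_requests` are hypothetical helpers that only mimic paging, under the assumption that a scroll returns pages of at most scroll_size documents until the result set is exhausted (any final empty page is ignored). It suggests that in the 1000-doc example, a scroll_size of 750 would mean two pages (750 + 250), not 250 dropped documents.

```python
import math
from typing import Iterator, List


def scroll_batches(doc_ids: List[int], scroll_size: int) -> Iterator[List[int]]:
    """Yield successive pages of at most scroll_size documents (hypothetical helper)."""
    if scroll_size <= 0:
        raise ValueError("scroll_size must be positive")
    for start in range(0, len(doc_ids), scroll_size):
        yield doc_ids[start:start + scroll_size]


def count_requests(total_docs: int, scroll_size: int) -> int:
    """Number of non-empty pages needed to fetch total_docs documents."""
    return math.ceil(total_docs / scroll_size)


# The example from the question: 1000 docs in one bucket, scroll_size = 750.
bucket_docs = list(range(1000))
pages = list(scroll_batches(bucket_docs, 750))
page_sizes = [len(page) for page in pages]
print(page_sizes)       # sizes of each page
print(sum(page_sizes))  # total documents fetched across all pages

# The example from the book quote: 1 million events.
print(count_requests(1_000_000, 1_000))
print(count_requests(1_000_000, 10_000))
```

So a smaller scroll_size just means more round trips for the same set of documents, which matches the "1,000 rows, a thousand times" wording in the quote.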

