Scrolling performance

Josh_Holtzman · November 23, 2011, 9:47pm

I'm trying to implement an index exporter by doing a "match_all" query
and scrolling through the entire index, 100 or 1000 documents at a
time. I'm seeing a significant slowdown in scrolling over time. The
first scroll via the REST API returns in < 50ms, but once I've
scrolled through 1.5 million of the 2 million total docs, the time to
execute is > 1 second. I've set the scroll timeout to 10 seconds,
which performs better than 10 minutes, but I can't decrease the
timeout much more without risking timing out between calls.

I'm wondering if a) this dramatic slowdown is expected, and b) if
there's a better way to scroll through all documents quickly.

Thanks,
Josh

Josh_Holtzman · November 23, 2011, 9:49pm

Sorry, I missed this thread, which covers the same topic:
http://groups.google.com/group/elasticsearch/browse_thread/thread/667811af4a1f5fae

kimchy · November 24, 2011, 2:13pm

Setting a lower scroll timeout will not affect the performance of
scrolling.
It's strange that it ends up taking 1 second for the 1.5M scroll on match_all.
How many shards do you have in the index? How many nodes in the cluster?

Josh_Holtzman · November 30, 2011, 10:50pm

4 shards, all on a single node.

Thanks,
Josh

Josh_Holtzman · December 16, 2011, 12:24am

The documentation describes scrolling through a large set of data
using this URL as an example:
http://localhost:9200/twitter/tweet/_search?scroll=5m

This unfortunately omits the key parameter: search_type=scan.
Following the documentation for that search type did the trick, and
now the scrolling performance is once again constant across requests.
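
For anyone who finds this later, here is a rough sketch of the scan-and-scroll loop described above. It assumes the REST behaviour of Elasticsearch releases that still accept search_type=scan (the first response carries only a scroll id, and each later call to /_search/scroll returns the next batch). The names scan_all, http_json and the send parameter are made up for illustration and are not part of any client library; the host, index name, keep-alive and batch size are placeholders.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def http_json(method, url, body=None):
    """Send one HTTP request and decode the JSON response (hypothetical helper)."""
    data = body.encode("utf-8") if body is not None else None
    req = Request(url, data=data, method=method)
    if data is not None:
        req.add_header("Content-Type", "application/json")
    with urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))


def scan_all(base_url, index, keep_alive="10s", size=100, send=http_json):
    """Yield every hit of a match_all query using search_type=scan.

    `send` is injectable so the loop can be exercised without a cluster.
    """
    params = urlencode({"search_type": "scan", "scroll": keep_alive, "size": size})
    query = json.dumps({"query": {"match_all": {}}})
    # Assumption: the initial scan request returns a scroll id and no hits.
    page = send("POST", f"{base_url}/{index}/_search?{params}", query)
    scroll_id = page["_scroll_id"]
    while True:
        scroll_params = urlencode({"scroll": keep_alive, "scroll_id": scroll_id})
        page = send("GET", f"{base_url}/_search/scroll?{scroll_params}")
        hits = page["hits"]["hits"]
        if not hits:
            break  # an empty batch means the scroll is exhausted
        yield from hits
        # Always continue with the most recent scroll id the server returned.
        scroll_id = page.get("_scroll_id", scroll_id)


# Example use against a local node (not executed here):
#   for hit in scan_all("http://localhost:9200", "twitter"):
#       print(hit["_id"])
```

The keep-alive only needs to cover the gap between two consecutive calls, since each scroll request renews it. Note that with search_type=scan the size applies per shard, so a 4-shard index can return up to 4 x size hits per batch.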

Thanks,
Josh
