Scan over 1 million records gets slower and slower

Chris_3 · November 21, 2011, 4:06pm

Hi list,

I'm just wondering whether the scan search type is supposed to get slower
when reading something like 1000 batches of 5000 records, as this is what
I'm seeing. The time needed to get the next result set roughly doubles
after every second result set (and of course it hits the timeout before I
get all documents).

I'm running ES 0.18.4 on a single node, 2 shards, no replicas, with
around 1000 types inside a single index (each with 40 to 60 fields),
10 million rows totalling 40 GB, and I'm scanning by type.

I suspect it's my own fault, but a short yes/no (or some pointer)
would help, thanks.
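
To make the slowdown measurable, a scroll loop can record how long each batch takes. The following is only a minimal sketch: `start` and `fetch_next` are hypothetical callables that would wrap the initial `search_type=scan` request and the follow-up scroll requests, and the response shape (`_scroll_id`, `hits.hits`) is assumed to mirror the search response. The in-memory backend is a stand-in so the loop can run without a cluster.

```python
import time
from typing import Any, Callable, Dict, Iterator, List, Tuple

Page = Dict[str, Any]


def timed_scroll(
    start: Callable[[], Page],
    fetch_next: Callable[[str], Page],
) -> Iterator[Tuple[int, float, List[dict]]]:
    """Yield (batch_number, seconds, hits) until a batch comes back empty."""
    # Assumption: the initial scan request returns a scroll id and no hits.
    scroll_id = start()["_scroll_id"]
    batch = 0
    while True:
        t0 = time.perf_counter()
        page = fetch_next(scroll_id)
        elapsed = time.perf_counter() - t0
        hits = page["hits"]["hits"]
        if not hits:
            return  # an empty batch means the scroll is exhausted
        scroll_id = page["_scroll_id"]  # always continue with the newest id
        batch += 1
        yield batch, elapsed, hits


def make_fake_backend(total: int, size: int):
    """Hypothetical in-memory stand-in for the two HTTP calls."""
    state = {"pos": 0}

    def start() -> Page:
        return {"_scroll_id": "s0", "hits": {"total": total, "hits": []}}

    def fetch_next(scroll_id: str) -> Page:
        lo = state["pos"]
        hi = min(lo + size, total)
        state["pos"] = hi
        docs = [{"_id": str(i)} for i in range(lo, hi)]
        return {"_scroll_id": f"s{hi}", "hits": {"total": total, "hits": docs}}

    return start, fetch_next


if __name__ == "__main__":
    start, fetch_next = make_fake_backend(total=12, size=5)
    for batch, seconds, hits in timed_scroll(start, fetch_next):
        print(f"batch {batch}: {len(hits)} docs in {seconds:.6f}s")
```

Logging the per-batch time this way shows whether the growth is steady or whether it really doubles every second batch.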

Greets, Chris

kimchy · November 21, 2011, 6:42pm

It's not your fault; it will take longer as you scan further into the
result set.

Clinton_Gormley · November 22, 2011, 10:47am

On Mon, 2011-11-21 at 20:42 +0200, Shay Banon wrote:

It's not your fault; it will take longer as you scan further into the
result set.

For a scrolled 'scan' search? I thought the point of a scan (i.e. not
being sorted) was that it is an efficient way to retrieve all docs?

clint

kimchy · November 22, 2011, 12:12pm

It is efficient, certainly compared to a sorted search, but there is
still an overhead as you scroll "deeper".

Karussell1 · November 23, 2011, 8:04am

On 22 Nov., 13:12, Shay Banon kim...@gmail.com wrote:

It is efficient, certainly compared to a sorted search, but there is
still an overhead as you scroll "deeper".

Yes, although I didn't notice that in my case with a few million
documents.

Peter.

BTW: there is a new search option available in the upcoming Lucene release.

kimchy · November 23, 2011, 10:13am

On Wed, Nov 23, 2011 at 10:04 AM, Karussell tableyourtime@googlemail.com wrote:

Yes, although I didn't have that feeling in my case with some million
documents.

It's not that bad: it simply exits early during the collection phase
once enough docs have been "collected". A regular search always goes
through all of them (to sort things properly).
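
As a rough illustration of why an early-exit collector still gets slower with depth, here is a toy model. It is not Elasticsearch's actual collector, and it assumes a simple collector that counts past the documents already returned. `collect_page`, `sorted_page`, and the `docs_visited` counter are hypothetical names used only for this sketch.

```python
from typing import Iterable, List, Sequence, Tuple


def collect_page(doc_ids: Iterable[int], offset: int, size: int) -> Tuple[List[int], int]:
    """Unsorted scan: skip `offset` docs, collect `size`, then stop.

    Returns (page, docs_visited).
    """
    page: List[int] = []
    visited = 0
    for doc_id in doc_ids:
        visited += 1
        if visited <= offset:
            continue  # already returned by an earlier page
        page.append(doc_id)
        if len(page) == size:
            break  # early exit: enough docs have been collected
    return page, visited


def sorted_page(scored: Sequence[Tuple[float, int]], offset: int, size: int) -> Tuple[List[int], int]:
    """Sorted search: every doc is visited so the ranking is correct.

    `scored` holds (score, doc_id) pairs. Returns (page, docs_visited).
    """
    ranked = sorted(scored, key=lambda hit: (-hit[0], hit[1]))
    page = [doc_id for _, doc_id in ranked[offset:offset + size]]
    return page, len(scored)


if __name__ == "__main__":
    for offset in (0, 20, 90):
        page, visited = collect_page(range(100), offset, 5)
        print(f"offset={offset}: visited {visited} docs for page {page}")
```

In this model the unsorted scan visits `offset + size` documents per page, so the cost grows with depth but stays below the full pass that a sorted search needs for every page.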

Peter.

BTW: there is a new search option available in the upcoming lucene.

You mean the searchAfter one? It does something similar.

rmuir · November 23, 2011, 9:22pm

This is not correct. searchAfter is not an optimization, and it doesn't
exit early. It uses a fixed-size priority queue, and because of this the
100 millionth page takes the same time as the first.

But you must pass the last result (the bottom result from the previous page)
so that it knows which entries are 'too competitive' to enter the priority
queue.
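
The idea can be sketched in a few lines. This is an illustration of the technique as described here, not Lucene's implementation: hits are assumed to be `(score, doc_id)` pairs ranked by descending score with ties broken by ascending doc id, and `search_after` and `sort_key` are hypothetical names.

```python
import heapq
from typing import Iterable, List, Optional, Tuple

Hit = Tuple[float, int]  # (score, doc_id)


def sort_key(hit: Hit) -> Tuple[float, int]:
    """Smaller key = more competitive (higher score, then lower doc id)."""
    score, doc_id = hit
    return (-score, doc_id)


def search_after(hits: Iterable[Hit], after: Optional[Hit], n: int) -> List[Hit]:
    """Return the next `n` hits ranked strictly after `after`.

    The queue never holds more than `n` entries, regardless of page depth.
    """
    if n <= 0:
        return []
    bottom_key = sort_key(after) if after is not None else None
    # Min-heap of (score, -doc_id): the root is the least competitive kept hit.
    heap: List[Tuple[float, int]] = []
    for hit in hits:
        if bottom_key is not None and sort_key(hit) <= bottom_key:
            continue  # belongs to an earlier page: too competitive to enter
        entry = (hit[0], -hit[1])
        if len(heap) < n:
            heapq.heappush(heap, entry)
        elif entry > heap[0]:
            heapq.heapreplace(heap, entry)  # beats the current worst entry
    ordered = sorted(heap, reverse=True)  # most competitive first
    return [(score, -neg_id) for score, neg_id in ordered]


if __name__ == "__main__":
    hits = [(float(i % 7), i) for i in range(50)]
    after = None
    while True:
        page = search_after(hits, after, 8)
        if not page:
            break
        print(page)
        after = page[-1]  # bottom result of this page feeds the next call
```

Every page still visits all hits, but the queue size depends only on `n`, which is why the depth of the page does not change the work per call in this sketch.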

On Nov 23, 2011 5:13 AM, "Shay Banon" kimchy@gmail.com wrote:

You mean the searchAfter one? It does something similar.

kimchy · November 24, 2011, 2:04pm

I meant in terms of cost.
