Just wondering if the scan search type is supposed to get slower when
reading roughly 1000 batches of 5000 records each, because that is what
I'm seeing. The time needed to fetch the next result set roughly doubles
after every second result set (and of course hits the timeout before I
get all documents).
I'm running ES 0.18.4 on a single node, 2 shards, no replicas, with
around 1000 types inside a single index (each with 40 to 60 fields),
10 million rows totalling 40 GB, while scanning by type.
I suspect it's my own fault, but a short yes/no (or some pointer)
would help, thanks.
Greets, Chris
It is efficient, certainly compared to a sorted search, but there is
still an overhead as you scroll "deeper".
Yes, although I didn't notice that in my case with a few million
documents.
It's not that bad: it simply does an early exit during the collection
phase once enough docs have been "collected". A regular search always
goes through all of them (to sort things properly).
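To illustrate the distinction Peter is drawing (this is a simplified sketch, not ES internals): an early-exit "scan" collector stops as soon as its page is full, while a sorted search must visit every matching document before it can rank anything.

```python
def scan_page(docs, page_size):
    """Collect docs in index order, exiting early once the page is full."""
    page, visited = [], 0
    for doc in docs:
        visited += 1
        page.append(doc)
        if len(page) == page_size:
            break  # early exit: the remaining docs are never touched
    return page, visited

def sorted_page(docs, page_size, key):
    """A sorted search has to look at every doc to sort them properly."""
    visited = len(docs)  # every document is scored/compared
    return sorted(docs, key=key)[:page_size], visited

docs = list(range(10_000))
_, scan_visited = scan_page(docs, 100)
top, sort_visited = sorted_page(docs, 100, key=lambda d: -d)

print(scan_visited)   # 100   -> stopped after one page worth of docs
print(sort_visited)   # 10000 -> had to walk the whole result set
```

The per-shard work for a scan page is bounded by the page size, which is why it beats a sorted search for bulk export.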
Peter.
BTW: there is a new search option available in the upcoming Lucene.
You mean the searchAfter one? It does something similar.
This is not correct. searchAfter is not an optimization, and it doesn't
early exit. It uses a fixed-size priority queue, and because of this the
100 millionth page takes the same time as the first.
But you must pass the last result (the bottom result from the previous
page) so that it knows which entries are 'too competitive' to enter the PQ.
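A rough sketch of that idea (hypothetical, simplified to plain integer sort keys with no doc-id tie-breaking): every page still walks all matching docs, but the priority queue never grows beyond the page size, and anything sorting at or before the previous page's bottom entry is rejected outright.

```python
import heapq

def search_after(docs, sort_key, page_size, after=None):
    """Return the next page, sorted ascending by sort_key.

    `after` is the sort value of the bottom hit of the previous page;
    pass None for the first page.
    """
    # Max-heap of at most page_size entries (keys negated so the worst
    # candidate sits on top and can be evicted cheaply).
    heap = []
    for doc in docs:
        key = sort_key(doc)
        if after is not None and key <= after:
            continue  # already returned on an earlier page
        if len(heap) < page_size:
            heapq.heappush(heap, (-key, doc))
        elif -heap[0][0] > key:
            heapq.heapreplace(heap, (-key, doc))
    return sorted(doc for _, doc in heap)

docs = [5, 1, 9, 3, 7, 2, 8, 4, 6, 0]
page1 = search_after(docs, lambda d: d, 3)                  # [0, 1, 2]
page2 = search_after(docs, lambda d: d, 3, after=page1[-1])  # [3, 4, 5]
```

Note the queue is the same size for page 1 and page 1000, which is exactly why deep pages cost the same as shallow ones.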