Non-Paged Query And Performance

I understand that the default page size is 10 hits, which can be
changed via config or in a query by specifying "from" and "size". I
have a scenario where I want all hits. I can be reasonably confident
that the result set will not be too large. I tried setting size to
0, -1 and max-int (in C#). The first two returned no results. The
last seems to work, functionally speaking, but there seems to be a
price to pay: my queries are timing out even though the number of
hits is not large (maybe 100-200).

I could set a high number that is still much lower than max int. But
I wonder, does performance always deteriorate with page size, even if
there are few hits? Is there a recommended way to approach this scenario?

Hey,

If you ask for N hits, Lucene builds a priority queue of that size (N) so that it has a data structure that keeps the hits ordered. So asking for something like MAX_INT-1 is incredibly wasteful. Instead, ask for something like 500 hits (if you know you are going to have fewer).
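To make the point about the size-N priority queue concrete, here is a minimal sketch of bounded top-N collection in Python. It is not Lucene's code; the function name `top_n` and the sample data are made up for illustration. It only shows that once N exceeds the number of hits, a larger N cannot change the result, so any cost tied to N itself (such as a queue sized up front) buys nothing.

```python
import heapq

def top_n(hits, n):
    """Return up to n (score, doc_id) pairs, best score first.

    hits is an iterable of (doc_id, score) pairs.
    """
    if n <= 0:
        return []
    heap = []  # min-heap: the weakest hit kept so far sits at heap[0]
    for doc_id, score in hits:
        if len(heap) < n:
            heapq.heappush(heap, (score, doc_id))
        elif score > heap[0][0]:
            # Queue is full: evict the weakest hit in favour of this one.
            heapq.heapreplace(heap, (score, doc_id))
    return sorted(heap, reverse=True)

# 150 hypothetical hits with arbitrary scores.
hits = [(doc_id, (doc_id * 37) % 101) for doc_id in range(150)]

# With only 150 hits, asking for 500 or for max-int yields the same list.
same = top_n(hits, 500) == top_n(hits, 2**31 - 1)
```

This Python version grows its heap lazily, so a huge `n` costs nothing here. A collector that sizes its queue to N before seeing any hits pays for N regardless of how many documents match, which is the waste described above.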

The new "scan" search type in master can help in certain scenarios, because it does no sorting.
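For readers who want to see what using the scan search type might look like, here is a rough sketch. The endpoint paths, the `scroll` parameter, and the `_scroll_id` field reflect my understanding of the scan/scroll API of that era and should be checked against the documentation for your version. `send` is a hypothetical callable (not part of any client library) that performs the HTTP request and returns the parsed JSON response.

```python
def scan_all_ids(send, index, query, scroll="1m", batch=100):
    """Collect the IDs of all matching documents via scan + scroll.

    send(path, params, body) is a hypothetical transport function that
    issues the request and returns the decoded JSON response as a dict.
    """
    # Assumption: the initial scan request returns a scroll id and no hits.
    first = send(
        f"/{index}/_search",
        {"search_type": "scan", "scroll": scroll, "size": batch},
        {"query": query, "fields": []},
    )
    scroll_id = first["_scroll_id"]

    ids = []
    while True:
        # Each scroll call returns the next unsorted batch of hits.
        page = send("/_search/scroll", {"scroll": scroll}, scroll_id)
        page_hits = page["hits"]["hits"]
        if not page_hits:
            break  # an empty batch means the scan is exhausted
        ids.extend(hit["_id"] for hit in page_hits)
        scroll_id = page["_scroll_id"]
    return ids
```

Passing the transport in as a parameter keeps the sketch independent of any particular HTTP client and makes it easy to try against a stub.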

-shay.banon
(In reply to Tim Scott's message of Tuesday, March 8, 2011 at 12:18 AM.)

You know what I really need... I only need the IDs of the documents
that match the query. But I need all of them. Is there a really
efficient way to do that?

Tim

(In reply to Shay Banon's message of Mar 8, 12:45 am.)

Getting only the IDs of the matching documents is simple: specify an empty array of fields. Getting all of them requires you to set a size (and make sure you use the query_and_fetch search type).
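As an illustration of the request described above, the following sketch builds the URL and body for an ID-only search and pulls the IDs out of a response. It follows the API as described in this thread (an empty "fields" array, an explicit "size", and the query_and_fetch search type), which may differ in later versions. The host, index name, and helper names are assumptions made for the example, and nothing is actually sent over the network.

```python
import json
from urllib.parse import urlencode

def build_id_only_search(index, query, size):
    """Return (url, json_body) for a search that returns only document IDs."""
    params = urlencode({"search_type": "query_and_fetch"})
    # Assumption: a node listening on the default local HTTP port.
    url = f"http://localhost:9200/{index}/_search?{params}"
    body = {
        "query": query,
        "fields": [],   # empty array: no stored fields, just hit metadata
        "size": size,   # upper bound on hits; keep it close to what you expect
    }
    return url, json.dumps(body)

def extract_ids(response):
    """Pull the _id of every hit out of a decoded search response."""
    return [hit["_id"] for hit in response["hits"]["hits"]]

# "myindex" is a hypothetical index name; 500 follows the advice earlier
# in the thread to pick a bound modestly above the expected hit count.
url, body = build_id_only_search("myindex", {"match_all": {}}, 500)
```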

I guess it could be optimized by introducing a new search_type that returns the full set of results without any ordering. It's similar to the new scan type, though without the overhead of setting up scanning.

-shay.banon
(In reply to Tim Scott's message of Tuesday, March 8, 2011 at 8:01 PM.)