Search Performance

Hi,
I am having a question about the performance of elasticsearch
returning larger result sets.
I am searching across 32 shards and I am getting 350000 results for a
simple query.

I dont load any additional data with the search request. If I take 10
results at once, the query takes 100ms, if
I take 1000 results, the query takes 5seconds. I was assuming that
aside from the merge operation, the
operation would need the same ressources, what I don't really
understand is why the query takes 50x longer
to execute. The network load is very low and I am using a
query-then-fetch search type.

Best,
Michel

How large are the documents the you fetch in the search? You can try to fetch them with no fields (set the fields to an empty array) and see if it helps? Which search type are you using?

On Tuesday, June 28, 2011 at 2:20 PM, Michel Conrad wrote:

Hi,
I am having a question about the performance of elasticsearch
returning larger result sets.
I am searching across 32 shards and I am getting 350000 results for a
simple query.

I dont load any additional data with the search request. If I take 10
results at once, the query takes 100ms, if
I take 1000 results, the query takes 5seconds. I was assuming that
aside from the merge operation, the
operation would need the same ressources, what I don't really
understand is why the query takes 50x longer
to execute. The network load is very low and I am using a
query-then-fetch search type.

Best,
Michel

Hi, I am using a query-then-fetch search type. As a client I am using
a transport client connected to about 15 nodes.
It makes no difference which fields I load. Even if I load only the id
field, the search gets slow when fetching many
results. I don't store the source in the index. The indexed data I
store for every doc is about 2-3 KB.

On Tue, Jun 28, 2011 at 9:51 PM, Shay Banon
shay.banon@elasticsearch.com wrote:

How large are the documents the you fetch in the search? You can try to
fetch them with no fields (set the fields to an empty array) and see if it
helps? Which search type are you using?

On Tuesday, June 28, 2011 at 2:20 PM, Michel Conrad wrote:

Hi,
I am having a question about the performance of elasticsearch
returning larger result sets.
I am searching across 32 shards and I am getting 350000 results for a
simple query.

I dont load any additional data with the search request. If I take 10
results at once, the query takes 100ms, if
I take 1000 results, the query takes 5seconds. I was assuming that
aside from the merge operation, the
operation would need the same ressources, what I don't really
understand is why the query takes 50x longer
to execute. The network load is very low and I am using a
query-then-fetch search type.

Best,
Michel

Do you do anything else in the search that brings the cost up when fetching more docs? Like highlighting.

On Wednesday, June 29, 2011 at 12:56 PM, Michel Conrad wrote:

Hi, I am using a query-then-fetch search type. As a client I am using
a transport client connected to about 15 nodes.
It makes no difference which fields I load. Even if I load only the id
field, the search gets slow when fetching many
results. I don't store the source in the index. The indexed data I
store for every doc is about 2-3 KB.

On Tue, Jun 28, 2011 at 9:51 PM, Shay Banon
<shay.banon@elasticsearch.com (mailto:shay.banon@elasticsearch.com)> wrote:

How large are the documents the you fetch in the search? You can try to
fetch them with no fields (set the fields to an empty array) and see if it
helps? Which search type are you using?

On Tuesday, June 28, 2011 at 2:20 PM, Michel Conrad wrote:

Hi,
I am having a question about the performance of elasticsearch
returning larger result sets.
I am searching across 32 shards and I am getting 350000 results for a
simple query.

I dont load any additional data with the search request. If I take 10
results at once, the query takes 100ms, if
I take 1000 results, the query takes 5seconds. I was assuming that
aside from the merge operation, the
operation would need the same ressources, what I don't really
understand is why the query takes 50x longer
to execute. The network load is very low and I am using a
query-then-fetch search type.

Best,
Michel

I further investigated the issue and it occurs not at the beginning of
the paging, but only after I set from to 35000 or something. Even if I
dont really understand why, could it be that the merging takes much
longer if there are more results to return?

On Wed, Jun 29, 2011 at 2:01 PM, Shay Banon
shay.banon@elasticsearch.com wrote:

Do you do anything else in the search that brings the cost up when fetching
more docs? Like highlighting.

On Wednesday, June 29, 2011 at 12:56 PM, Michel Conrad wrote:

Hi, I am using a query-then-fetch search type. As a client I am using
a transport client connected to about 15 nodes.
It makes no difference which fields I load. Even if I load only the id
field, the search gets slow when fetching many
results. I don't store the source in the index. The indexed data I
store for every doc is about 2-3 KB.

On Tue, Jun 28, 2011 at 9:51 PM, Shay Banon
shay.banon@elasticsearch.com wrote:

How large are the documents the you fetch in the search? You can try to
fetch them with no fields (set the fields to an empty array) and see if it
helps? Which search type are you using?

On Tuesday, June 28, 2011 at 2:20 PM, Michel Conrad wrote:

Hi,
I am having a question about the performance of elasticsearch
returning larger result sets.
I am searching across 32 shards and I am getting 350000 results for a
simple query.

I dont load any additional data with the search request. If I take 10
results at once, the query takes 100ms, if
I take 1000 results, the query takes 5seconds. I was assuming that
aside from the merge operation, the
operation would need the same ressources, what I don't really
understand is why the query takes 50x longer
to execute. The network load is very low and I am using a
query-then-fetch search type.

Best,
Michel

Merging will take longer, and also, each shard will need to aggregate that many hits (to order them correctly) and then send them back (which is the more expensive aspect).

On Thursday, June 30, 2011 at 7:18 PM, Michel Conrad wrote:

I further investigated the issue and it occurs not at the beginning of
the paging, but only after I set from to 35000 or something. Even if I
dont really understand why, could it be that the merging takes much
longer if there are more results to return?

On Wed, Jun 29, 2011 at 2:01 PM, Shay Banon
<shay.banon@elasticsearch.com (mailto:shay.banon@elasticsearch.com)> wrote:

Do you do anything else in the search that brings the cost up when fetching
more docs? Like highlighting.

On Wednesday, June 29, 2011 at 12:56 PM, Michel Conrad wrote:

Hi, I am using a query-then-fetch search type. As a client I am using
a transport client connected to about 15 nodes.
It makes no difference which fields I load. Even if I load only the id
field, the search gets slow when fetching many
results. I don't store the source in the index. The indexed data I
store for every doc is about 2-3 KB.

On Tue, Jun 28, 2011 at 9:51 PM, Shay Banon
<shay.banon@elasticsearch.com (mailto:shay.banon@elasticsearch.com)> wrote:

How large are the documents the you fetch in the search? You can try to
fetch them with no fields (set the fields to an empty array) and see if it
helps? Which search type are you using?

On Tuesday, June 28, 2011 at 2:20 PM, Michel Conrad wrote:

Hi,
I am having a question about the performance of elasticsearch
returning larger result sets.
I am searching across 32 shards and I am getting 350000 results for a
simple query.

I dont load any additional data with the search request. If I take 10
results at once, the query takes 100ms, if
I take 1000 results, the query takes 5seconds. I was assuming that
aside from the merge operation, the
operation would need the same ressources, what I don't really
understand is why the query takes 50x longer
to execute. The network load is very low and I am using a
query-then-fetch search type.

Best,
Michel

On Thu, 2011-06-30 at 18:18 +0200, Michel Conrad wrote:

I further investigated the issue and it occurs not at the beginning of
the paging, but only after I set from to 35000 or something. Even if I
dont really understand why, could it be that the merging takes much
longer if there are more results to return?

I presume you're using the default of 5 shards?

If you want to start from document 35,000, then each shard needs to find
the 35,000 best results local to that shard, then 5 x 35,000 records get
returned to the requesting node, which chooses the best 35,000 of those.

You don't want to do this :slight_smile: It's the same reason google doesn't give
you more than 1,000 results for any search.

The only efficient way to get this many results out is to use a scrolled
search with search_type set to 'scan', but the results are not sorted.

clint

Hi,

Thanks for the clarifications. I think I found the primary issue of
the problem. The behaviour of elastsearch seems
correct to me. Now the searchtime is linear to the start of the
returned slice. The merging accross 20 shards only
takes 2 times as search in a single shard, which seems acceptable to
me. The problem seems that there is an issue
with different nodes in my cluster. I now switched to only connect to
a single node using the TransportClient. With some
nodes I am getting correct results as mentioned above. The problem is
when I connect to some other nodes the response
time gets bigger and fluctuates, therefore the results I posted in the
beginning of this thread were incorrect.

I pasted some results further down, in the first column is the from
setting, in the second the response time I get from server A, in the
third column the respone time I get from server B. Server B behaves
corrently, and returns in short timeframes consistently.

Server A has issues generating the response, on average 16x longer.
I query both servers in exactly the same way and they are getting data
from 19 shards.

My configuration is the following:

  • I am using 0.16.3-SNAPSHOT (revision 08648ec7)
  • maximum heapsize is 4GB
  • threadpool configuration is on default
  • swap is disabled

I already checked the following:

  • uptime: the load is under 0.5 on every server
  • top: iowait is 0%
  • restarting both node changes nothing
  • the cpu load stays low on both servers

The issue seems similar to an issue I had with locally threading many
searchrequests where they were getting timeouts. Is it possible that
some synchronisation code is blocking? It seems to me that the
inconsistent answer times when there is no load on the server might be
some kind of concurrency problem.

Best,
Michel

0 260 66
1000 436 66
2000 745 93
3000 1077 74
4000 1009 84
5000 4319 100
6000 1965 160
7000 1724 130
8000 1538 99
9000 1508 306
10000 1946 107
11000 1818 265
12000 1757 122
13000 1548 127
14000 2374 127
15000 3241 132
16000 3433 139
17000 2255 311
18000 2735 148
19000 4503 171
20000 2439 160
21000 2695 163
22000 2500 155
23000 2752 168
24000 2637 188
25000 4183 179
26000 3690 176
27000 3483 187
28000 3691 187
29000 3917 342
30000 5990 199
31000 3925 223
32000 3897 208
33000 4552 206
34000 4243 230
35000 3525 217
36000 3028 210
37000 3344 242
38000 4006 241
39000 3515 227
40000 4751 287
41000 4164 279
42000 3167 279
43000 5387 285

On Thu, Jun 30, 2011 at 6:23 PM, Clinton Gormley
clinton@iannounce.co.uk wrote:

On Thu, 2011-06-30 at 18:18 +0200, Michel Conrad wrote:

I further investigated the issue and it occurs not at the beginning of
the paging, but only after I set from to 35000 or something. Even if I
dont really understand why, could it be that the merging takes much
longer if there are more results to return?

I presume you're using the default of 5 shards?

If you want to start from document 35,000, then each shard needs to find
the 35,000 best results local to that shard, then 5 x 35,000 records get
returned to the requesting node, which chooses the best 35,000 of those.

You don't want to do this :slight_smile: It's the same reason google doesn't give
you more than 1,000 results for any search.

The only efficient way to get this many results out is to use a scrolled
search with search_type set to 'scan', but the results are not sorted.

clint

Maybe some nodes are misbehaving? A note on the TransportClient, when you send a search request, it send it to a node (round robin between the nodes it has, depends on sniffing as well), and that node will execute the distributed search across the cluster.

On Friday, July 1, 2011 at 5:46 PM, Michel Conrad wrote:

Hi,

Thanks for the clarifications. I think I found the primary issue of
the problem. The behaviour of elastsearch seems
correct to me. Now the searchtime is linear to the start of the
returned slice. The merging accross 20 shards only
takes 2 times as search in a single shard, which seems acceptable to
me. The problem seems that there is an issue
with different nodes in my cluster. I now switched to only connect to
a single node using the TransportClient. With some
nodes I am getting correct results as mentioned above. The problem is
when I connect to some other nodes the response
time gets bigger and fluctuates, therefore the results I posted in the
beginning of this thread were incorrect.

I pasted some results further down, in the first column is the from
setting, in the second the response time I get from server A, in the
third column the respone time I get from server B. Server B behaves
corrently, and returns in short timeframes consistently.

Server A has issues generating the response, on average 16x longer.
I query both servers in exactly the same way and they are getting data
from 19 shards.

My configuration is the following:

  • I am using 0.16.3-SNAPSHOT (revision 08648ec7)
  • maximum heapsize is 4GB
  • threadpool configuration is on default
  • swap is disabled

I already checked the following:

  • uptime: the load is under 0.5 on every server
  • top: iowait is 0%
  • restarting both node changes nothing
  • the cpu load stays low on both servers

The issue seems similar to an issue I had with locally threading many
searchrequests where they were getting timeouts. Is it possible that
some synchronisation code is blocking? It seems to me that the
inconsistent answer times when there is no load on the server might be
some kind of concurrency problem.

Best,
Michel

0 260 66
1000 436 66
2000 745 93
3000 1077 74
4000 1009 84
5000 4319 100
6000 1965 160
7000 1724 130
8000 1538 99
9000 1508 306
10000 1946 107
11000 1818 265
12000 1757 122
13000 1548 127
14000 2374 127
15000 3241 132
16000 3433 139
17000 2255 311
18000 2735 148
19000 4503 171
20000 2439 160
21000 2695 163
22000 2500 155
23000 2752 168
24000 2637 188
25000 4183 179
26000 3690 176
27000 3483 187
28000 3691 187
29000 3917 342
30000 5990 199
31000 3925 223
32000 3897 208
33000 4552 206
34000 4243 230
35000 3525 217
36000 3028 210
37000 3344 242
38000 4006 241
39000 3515 227
40000 4751 287
41000 4164 279
42000 3167 279
43000 5387 285

On Thu, Jun 30, 2011 at 6:23 PM, Clinton Gormley
<clinton@iannounce.co.uk (mailto:clinton@iannounce.co.uk)> wrote:

On Thu, 2011-06-30 at 18:18 +0200, Michel Conrad wrote:

I further investigated the issue and it occurs not at the beginning of
the paging, but only after I set from to 35000 or something. Even if I
dont really understand why, could it be that the merging takes much
longer if there are more results to return?

I presume you're using the default of 5 shards?

If you want to start from document 35,000, then each shard needs to find
the 35,000 best results local to that shard, then 5 x 35,000 records get
returned to the requesting node, which chooses the best 35,000 of those.

You don't want to do this :slight_smile: It's the same reason google doesn't give
you more than 1,000 results for any search.

The only efficient way to get this many results out is to use a scrolled
search with search_type set to 'scan', but the results are not sorted.

clint