Hi,
Thanks for the clarifications. I think I found the primary cause of the
problem. The behaviour of Elasticsearch seems correct to me: the search
time is linear in the starting offset of the returned slice, and merging
across 20 shards only takes about twice as long as searching a single
shard, which seems acceptable to me. The problem appears to be with
different nodes in my cluster. I have now switched to connecting to a
single node using the TransportClient. With some nodes I get the correct
results mentioned above; the problem is that when I connect to certain
other nodes, the response time grows and fluctuates, so the results I
posted at the beginning of this thread were incorrect.
I pasted some results further down: the first column is the from
setting, the second the response time I get from server A, and the
third the response time I get from server B. Server B behaves
correctly and returns consistently within short timeframes. Server A
has issues generating the response, taking on average 16x longer.
I query both servers in exactly the same way, and both fetch data
from 19 shards.
My configuration is the following:
- I am using 0.16.3-SNAPSHOT (revision 08648ec7)
- maximum heap size is 4GB
- thread pool configuration is at its default
- swap is disabled
I already checked the following:
- uptime: the load is under 0.5 on every server
- top: iowait is 0%
- restarting both nodes changes nothing
- the CPU load stays low on both servers
The issue seems similar to one I had when running many search requests
in local threads, where they were hitting timeouts. Is it possible that
some synchronisation code is blocking? The inconsistent response times
while there is no load on the server look to me like some kind of
concurrency problem.
Best,
Michel
from    server A    server B
0       260         66
1000 436 66
2000 745 93
3000 1077 74
4000 1009 84
5000 4319 100
6000 1965 160
7000 1724 130
8000 1538 99
9000 1508 306
10000 1946 107
11000 1818 265
12000 1757 122
13000 1548 127
14000 2374 127
15000 3241 132
16000 3433 139
17000 2255 311
18000 2735 148
19000 4503 171
20000 2439 160
21000 2695 163
22000 2500 155
23000 2752 168
24000 2637 188
25000 4183 179
26000 3690 176
27000 3483 187
28000 3691 187
29000 3917 342
30000 5990 199
31000 3925 223
32000 3897 208
33000 4552 206
34000 4243 230
35000 3525 217
36000 3028 210
37000 3344 242
38000 4006 241
39000 3515 227
40000 4751 287
41000 4164 279
42000 3167 279
43000 5387 285
On Thu, Jun 30, 2011 at 6:23 PM, Clinton Gormley
clinton@iannounce.co.uk wrote:
On Thu, 2011-06-30 at 18:18 +0200, Michel Conrad wrote:
I investigated the issue further: it occurs not at the beginning of the
paging, but only after I set from to 35000 or so. Even if I don't
really understand why, could it be that the merging takes much longer
when there are more results to return?
I presume you're using the default of 5 shards?
If you want to start from document 35,000, then each shard needs to find
the 35,000 best results local to that shard, then 5 x 35,000 records get
returned to the requesting node, which chooses the best 35,000 of those.
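The cost described above can be sketched with a toy simulation (plain
Python with made-up scores; this is an illustration of the mechanism,
not the actual Elasticsearch implementation):

```python
import heapq
import random

def deep_page(shards, frm, size):
    """Simulate fetching hits [frm, frm + size) across shards.

    Each shard must collect its own top (frm + size) docs, all of
    which travel to the coordinating node, which merges them and
    throws away everything before `frm`.
    """
    per_shard = frm + size
    candidates = []
    for shard in shards:
        # each shard sorts locally and returns its best frm+size docs
        candidates.extend(heapq.nsmallest(per_shard, shard))
    transferred = len(candidates)  # docs shipped to the coordinator
    page = sorted(candidates)[frm:frm + size]
    return page, transferred

random.seed(0)
shards = [[random.random() for _ in range(50_000)] for _ in range(5)]
page, transferred = deep_page(shards, 35_000, 10)
print(transferred)  # 5 shards x 35,010 docs shipped to return 10 hits
```

So the work grows linearly with `from` on every shard, plus a merge
over shards x (from + size) candidates on the coordinating node, which
matches the linear response times observed above.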
You don't want to do this. It's the same reason Google doesn't give
you more than 1,000 results for any search.
The only efficient way to get this many results out is to use a scrolled
search with search_type set to 'scan', but the results are not sorted.
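The scan/scroll flow boils down to a drain loop: the initial request
(with search_type=scan) returns a scroll id, and you keep requesting
the next batch until an empty one comes back. A minimal sketch in
Python, where `fetch_batch` is a hypothetical stand-in for the HTTP
call to `_search/scroll` (not a real client API):

```python
def scroll_all(fetch_batch, initial_scroll_id):
    """Drain a scrolled search until the server returns no more hits.

    `fetch_batch(scroll_id)` stands in for the real request to
    `_search/scroll`; it must return `(hits, next_scroll_id)`.
    """
    results = []
    scroll_id = initial_scroll_id
    while True:
        hits, scroll_id = fetch_batch(scroll_id)
        if not hits:
            break  # an empty batch means the scroll is exhausted
        results.extend(hits)
    return results
```

Because each batch picks up where the previous one left off, no shard
ever has to re-score or re-ship the documents before `from`, which is
why this stays cheap where deep paging does not.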
clint