Most of the time is spent on the ES cluster. I'm not sure if I'm using
keep-alive connections; I think so, as curl uses them by default.
Strangely enough, I don't think sorting is the problem.
One last thing I noticed is that performance got better over time, even
after clearing the caches. I'm guessing the JVM has something to do with
that?
As was suggested, segmenting the indices by time can help, especially
with the ability to grow the data more easily. It does require extra work
on your end: when searching, start with the latest index, and if you
don't get enough results, move on to previous date indices. The nice
thing is that this can work well usability-wise: you can already display
the (possibly partial) results to the user from the first index while you
go and fetch additional ones.
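A rough sketch of that newest-first fallback, assuming hypothetical monthly indices named tweets-YYYY.MM; the `search_index` function and its hit counts are stubs standing in for a real search against each index:

```shell
# Sketch: query time-based indices newest-first, stopping once we
# have enough hits. Index names and hit counts are hypothetical;
# search_index is a stub in place of a real ES query per index.
search_index() {
  case "$1" in
    tweets-2012.03) echo 4 ;;   # stub: newest index yields 4 hits
    tweets-2012.02) echo 8 ;;   # stub: next index yields 8 more
    *)              echo 0 ;;
  esac
}

wanted=10   # how many results the user asked for
total=0
for idx in tweets-2012.03 tweets-2012.02 tweets-2012.01; do
  hits=$(search_index "$idx")
  total=$((total + hits))
  echo "queried $idx, have $total hits"   # partial results can be shown already
  [ "$total" -ge "$wanted" ] && break     # stop once we have enough
done
```

In a real setup each iteration would be a search against that index, and the partial hits from the newest index can be rendered while the older indices are still being queried.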
But let's stick with the current config and see what we can do; I would
love to see some more data. Note that the best thing to do is to take
measurements that do not include cache warmup time, as it adds too much
noise to the system. It would be great if you could also run this on 0.19.
- Are the 20 concurrent requests running from the same box? Maybe that
box is saturated?
- Can you also measure the total_time returned in the search response?
This is the time spent in the ES cluster. I would love to know the time
without the client communication overhead (for example, are you using
persistent connections / keep-alive or not?).
- Can you execute a search without the sorting? How long does that take
(measuring both curl time and total_time, as above)? Sorting is mainly
CPU intensive; let's verify the overhead of sorting in your case.
- 20 concurrent requests on a 10-shard index will cause 200 concurrent
requests executed on the cluster of 3 nodes. It might be taxing the CPU
(we can verify with #3). You can try to bound the number of concurrent
search requests executed on a node by configuring the thread pool for
search; see the thread pool module documentation.
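One possible sketch of such a bound in elasticsearch.yml; the fixed pool type and the sizes here are illustrative assumptions, not recommendations, so tune them to your hardware:

```yaml
# Hypothetical example: cap concurrent search executions per node.
threadpool:
    search:
        type: fixed
        size: 20
        queue_size: 100
```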
On Monday, March 5, 2012 at 7:23 PM, haarts wrote:
Again with the excellent questions!
We are now running 850 requests, 20 concurrent (with Ruby Typhoeus). This
was used to warm up the cache. Before running the tests we cleared the
cache.
We measure search latency via CURLINFO_TOTAL_TIME (see libcurl's
curl_easy_getinfo()).
After warming up the cache we ran 10,000 searches. During these searches
we monitored Bigdesk and 'top'. Whenever we were not maxing out the disks
we were maxing out the CPUs. Memory seems absolutely fine. The average
response time with a hot cache was around 600ms; 300ms would make our
boss happy (and not only him).
On Monday, 5 March 2012 17:06:39 UTC+1, kimchy wrote:
Regarding warming up, just issue something like 100-1000 search requests
with the sorting, and only then start to make your measurements.
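A minimal warmup sketch along those lines, reusing the host, index, and sorted query that appear elsewhere in this thread (run it against your own cluster):

```shell
# Fire a few hundred sorted searches before measuring, so the
# created_at values are loaded and the JVM is warmed up.
for i in $(seq 1 500); do
  curl -s -XGET 'http://188.165.222.156:9200/tweets/_search' -d '
    {"query": {"match_all": {}},
     "sort": {"created_at": {"order": "desc"}}}' > /dev/null
done
```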
What is the load that you are running on the system? I mean, how do you
measure the search latency? How many concurrent searches do you execute?
When you run the load test, do you see CPU maxing out, or other resources?
On Monday, March 5, 2012 at 6:02 PM, haarts wrote:
Hi,
On Monday, 5 March 2012 16:36:06 UTC+1, kimchy wrote:
First, have you allocated enough memory to ES? When sorting, it needs to
load the created_at values into memory; I would suggest allocating
something like 12GB to it. Also, why do you need 10 shards with 3
replicas on 3 machines? You end up with 4 copies of each shard, where you
really want, I guess, 3, so that each machine has a copy? In that case,
set index.number_of_replicas to 2.
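If the index was created with the higher replica count, the setting can be changed on a live index through the index settings API; a sketch, using the cluster address from the queries later in the thread:

```shell
# Drop to 2 replicas so each of the 3 nodes holds one copy per shard.
curl -XPUT 'http://188.165.222.156:9200/tweets/_settings' -d '
  {"index": {"number_of_replicas": 2}}'
```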
I was mistaken; we actually have 2 replicas. We also allocated 13GB of
JVM heap.
Also, in this case, why do you need 10 shards, especially if you want a
copy on each node? Fewer shards mean less memory being used (right now
you have 10 allocated on each node).
We chose 10 shards because we expect to scale up to 1 billion items, and
we guessed we would need more than 3 machines for that.
Also, after you are done with bulk indexing the 150 million documents, I
suggest you optimize the index; it will reduce its memory requirements.
Optimize it down to ~5 segments (max_num_segments in the optimize API).
Note, this will be a heavy operation.
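For example, using the host and index from the thread (best run off-peak, since the merge is I/O heavy):

```shell
# Merge the index down to at most 5 segments after bulk indexing.
curl -XPOST 'http://188.165.222.156:9200/tweets/_optimize?max_num_segments=5'
```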
Excellent, we gave that a try. Performance now is around 1000ms +/- 500ms.
Last, the initial search requests will take some time since they need to
load the values of created_at into memory. Make sure when testing to warm
things up (both loading the values, but also "warming" the JVM).
What do you mean exactly by warming up the JVM? Just do 10 random
searches? Or is there some more rigorous way of doing this?
On Monday, March 5, 2012 at 4:59 PM, haarts wrote:
Hi!
We currently have 150 million documents in an index with 10 shards and 3
replicas, on 3 machines (8 cores, 24GB RAM each). Search works but is
rather slow: query times measure in seconds, often more than 5. I've been
playing with several queries:
curl -XGET 'http://188.165.222.156:9200/tweets/_search' -d '{
  "query": {"filtered": {"query": {"match_all": {}},
                         "filter": {"term": {"text": "house"}}}},
  "sort": {"created_at": {"order": "desc"}}}'
and
curl -XGET 'http://188.165.222.156:9200/tweets/_search' -d '{
  "query": {"text": {"text": "house"}},
  "sort": {"created_at": {"order": "desc"}}}'
I have had no luck making these queries performant. Also tried
search_type=query_then_fetch with mixed results.
Any tips?
With kind regards,