Slow search response time (low CPU utilization)

jwistrom · July 3, 2019, 7:38am

Hi
We are currently experiencing performance issues with our ES setup. We can see that our system load is close to max while the CPU utilization is only about 50%.
Setup:

- ES version: 6.5.3
- Nodes: 6
- 20 shards/index
- 1 replica/index
- 2 physical servers
- ~35GB used storage
- 16GB heap (50% of node memory)

From the monitoring tab of Kibana (for the ES cluster):
- Disk available: 90%
- JVM heap: 23%
- Index search rate: 40,000/s

We have changed the following default settings:
- thread_pool.search.min_queue_size: 10
- thread_pool.search.queue_size: 4000
- thread_pool.search.max_queue_size: 8000
- thread_pool.search.size: 35

We run a multi search (batches of 150) where each search is a bool query with 3 "should" clauses (1 on a text field and 2 on keyword fields) and 1 "must not" clause (on a keyword field). Any idea how to "optimize" the setup to increase CPU utilization and get more performance out of ES? We want to optimize for many concurrent requests with low response times from ES. Our client currently records response times of ~600 ms from ES.

Regards
Johan
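
For readers following along, the request shape described above can be sketched roughly as follows. This is only an illustration: the index name and the field names (`title`, `category`, `brand`, `status`) are hypothetical and do not come from the original post. It builds the newline-delimited body that the `_msearch` endpoint expects, one header line and one body line per search, without sending anything.

```python
import json


def build_bool_query(text_value, keyword_a, keyword_b, excluded):
    """Bool query with 3 "should" clauses and 1 "must_not" clause.

    Field names are hypothetical placeholders, not taken from the thread.
    """
    return {
        "bool": {
            "should": [
                {"match": {"title": text_value}},    # text field
                {"term": {"category": keyword_a}},   # keyword field
                {"term": {"brand": keyword_b}},      # keyword field
            ],
            "must_not": [
                {"term": {"status": excluded}},      # keyword field
            ],
        }
    }


def build_msearch_body(index, queries):
    """Serialize queries into the NDJSON body used by the _msearch endpoint."""
    lines = []
    for query in queries:
        lines.append(json.dumps({"index": index}))   # header line
        lines.append(json.dumps({"query": query}))   # search body line
    # The multi search body must end with a newline character.
    return "\n".join(lines) + "\n"


# A batch of 150 searches, matching the batch size mentioned in the post.
batch = [build_bool_query(f"phrase {i}", "a", "b", "deleted") for i in range(150)]
body = build_msearch_body("my-index", batch)
```

Each of the 150 searches in such a batch is executed against every shard of the target index, which is why the shard count matters so much in the replies below.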

DavidTurner · July 3, 2019, 7:45am

35GB is a very small amount of data, and 20 shards per index is a very large number of shards, appropriate for indices that are each around 1TB in size. How many indices do you have? How many shards are there in total? Assuming this wasn't a typo, it's likely that you will get better performance by vastly reducing the number of shards in your cluster.

Here is some more guidance on oversharding:

jwistrom · July 3, 2019, 8:07am

Thank you for your fast reply. We have 5 indices in total, but the query mentioned above only hits one of them. That index has 20 shards and 6GB of data. We will definitely try to reduce the number of shards. Considering the 6GB of data in the index, do you think 1 shard will suffice? Or should we have the same number of shards as nodes? @DavidTurner

Christian_Dahlqvist · July 3, 2019, 9:13am

Set 1 primary shard and adjust the number of replicas so all nodes hold a copy.
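
As a rough sketch of what that advice translates to, the helper below builds the settings body for a new index with one primary shard and enough replicas for every node to hold a copy. The function name is hypothetical. Note that `number_of_shards` is fixed when an index is created, so an existing 20-shard index would have to be reindexed (or shrunk) into a new one, while `number_of_replicas` can be changed on a live index.

```python
def single_shard_settings(node_count):
    """Index settings: 1 primary shard, one copy of the data on every node.

    With N nodes, the primary lives on one node and N - 1 replicas
    cover the remaining nodes.
    """
    if node_count < 1:
        raise ValueError("node_count must be at least 1")
    return {
        "settings": {
            "index": {
                "number_of_shards": 1,
                "number_of_replicas": node_count - 1,
            }
        }
    }


# For the 6-node cluster in this thread: 1 primary + 5 replicas.
settings = single_shard_settings(6)
```

The resulting dictionary is what would be sent as the JSON body when creating the new index.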

jwistrom · July 3, 2019, 10:22am

Thank you very much @Christian_Dahlqvist. We saw a huge performance increase after changing to 1 shard and 5 replicas (we have 6 nodes). Just out of curiosity, why is 1 shard with 5 replicas better than 3 shards with 1 replica (both add up to 6 shard copies)? Have I understood correctly that replicas are better than additional shards because ES then doesn't have to merge results from multiple nodes for a single query?

sharju · July 3, 2019, 5:22pm

My short (about a year in production) experience with ES is that you should not split an index into multiple primaries unless you have concerns about disk space or your indexing speed is capped by lousy HW (disk I/O, network interfaces etc). Replicas boost your throughput via parallel query processing and provide HA through distribution. If a single index is getting ridiculously big, you should look towards hourly/daily/weekly/monthly/whatever indices or just start rolling over with index lifecycle management on size/doc count.

For benchmarking purposes, I had a 1TB index in 3 primaries with 1 replica, and performance was absolute garbage. I tried the same data in 20 indices with only 1 primary + 1 replica each, and queries instantly went from minutes to milliseconds.

Mapping plays a big part also. Having explicit fields like
"measurement1": 123, .... "measurement100": 234
is something I often see on the forums, and to me that is an absolute waste of performance compared to
"measurement": 1, "value": 123

It's not about the amount of data, it's all about the design.
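
To make the mapping point above concrete, here is a minimal sketch of reshaping a "wide" document with numbered measurement fields into one small document per measurement. The `sensor` field and the function name are made up for illustration, and whether this layout suits a given dataset depends on how it is queried.

```python
import re

# Matches keys such as "measurement1" ... "measurement100".
_NUMBERED = re.compile(r"^measurement(\d+)$")


def to_long_documents(wide_doc):
    """Turn {"measurement1": 123, ...} into [{"measurement": 1, "value": 123}, ...].

    Fields that are not numbered measurements are copied onto every
    resulting document.
    """
    shared = {k: v for k, v in wide_doc.items() if not _NUMBERED.match(k)}
    docs = []
    for key, value in wide_doc.items():
        match = _NUMBERED.match(key)
        if match:
            docs.append({**shared, "measurement": int(match.group(1)), "value": value})
    return docs


wide = {"sensor": "s1", "measurement1": 123, "measurement100": 234}
long_docs = to_long_documents(wide)
```

The mapping then needs only two fields (`measurement` and `value`) instead of one field per measurement number.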

Christian_Dahlqvist · July 3, 2019, 6:02pm

The ideal number of primary shards and replicas depends a lot on the use case: data volume, type of data, type of queries, query latency requirements and query concurrency.

In this case your index is quite small, so having all the data in a single primary shard is unlikely to lead to slow queries (each query runs single-threaded against a shard, although multiple shards are processed concurrently). This means that a single thread can serve the query, compared with 3 in a 3-shard configuration. There is also, as you mention, less work merging results from different shards. As you have reasonably high query concurrency, you benefit from using more replicas, since more concurrent queries can be served across the cluster.
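
The arithmetic behind this can be sketched with a simplified model: a query needs one shard-level task per primary shard (served by either the primary or one of its replicas), and a multi search batch multiplies that by the batch size. The function and key names are illustrative only, and the model ignores coordination and merge overhead.

```python
def shard_layout(primaries, replicas, batch_size=1):
    """Simplified per-query work for a given shard configuration."""
    return {
        # Total shard copies spread over the cluster.
        "shard_copies": primaries * (1 + replicas),
        # One single-threaded task per primary shard's data, per query.
        "shard_tasks_per_query": primaries,
        # Shard-level tasks generated by one multi search batch.
        "shard_tasks_per_batch": primaries * batch_size,
    }


original = shard_layout(primaries=20, replicas=1, batch_size=150)
three_by_one = shard_layout(primaries=3, replicas=1, batch_size=150)
one_by_five = shard_layout(primaries=1, replicas=5, batch_size=150)
```

Under this model, the original 20-shard index turns each 150-search batch into 3000 shard-level tasks, 3 shards with 1 replica gives 450, and 1 shard with 5 replicas gives 150, even though the last two configurations both hold 6 shard copies.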

system · July 31, 2019, 6:02pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
