Elasticsearch GET query optimization for speed

sarya · October 6, 2015, 12:26pm

Hi,
We have 300 million records, each with 600 fields, in an Elasticsearch cluster of 5 nodes with 4 CPUs each and a replication factor of 1.
When we run a GET query with fuzziness 1 (or sometimes fuzziness 2) on some of the fields, latency degrades as the number of concurrent client threads increases.
With a single-threaded client sending requests sequentially, our latency is 300 milliseconds.
But with 10 threads it is 1200 milliseconds, and with 20 threads it is 2500 milliseconds.

Since it is a 5-node cluster with 4 CPUs per node, there are 20 CPUs available for concurrent processing. I would have thought that with 20 CPUs the latency should stay at 300 milliseconds.
Each machine has 32 GB of RAM.
Can anybody offer some insight into why the performance is degrading,
or give some tips on optimizing the cluster?

Thanks
Sarya

nik9000 · October 6, 2015, 12:47pm

There are lots of things that make performance non-linear with the number of threads. How many shards and how many replicas do you have, and how has Elasticsearch assigned them? There is always an interplay between disk utilization, disk cache hit rate, and CPU usage that is worth thinking about. Fuzzy is quite CPU intensive.
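To make the non-linearity concrete, here is a rough sketch that turns the latencies reported in the question into throughput using Little's law (throughput = requests in flight / latency). It assumes each client thread sends its next request as soon as the previous one returns, which the question does not state explicitly.

```python
# Latencies reported in the question: client threads -> latency in seconds.
reported = {1: 0.300, 10: 1.200, 20: 2.500}


def throughput(concurrency: int, latency_s: float) -> float:
    """Little's law for a closed loop: queries/second = in-flight queries / latency."""
    return concurrency / latency_s


for threads, latency in reported.items():
    print(f"{threads:>2} threads: {throughput(threads, latency):.2f} queries/s")
```

Under that assumption the cluster goes from about 3.3 queries/s at 1 thread to about 8.3 at 10 threads and 8.0 at 20 threads. In other words, throughput stops growing somewhere before 10 threads, and extra client threads mostly add queueing time. That fits a single query already using more than one CPU, since a search fans out to every shard of the index.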

Rather than talking about theory, I'd try to figure out what is going on directly with the hot_threads API and jstack. hot_threads works for things that are truly slow, but it is more likely to give false positives because of the way it measures. jstack is harder - you have to run it a few times and manually classify the threads.
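The classification step can be partly automated. The sketch below builds the URL for the hot threads endpoint (`GET /_nodes/hot_threads`, with its `threads` and `interval` parameters) and counts the search-pool threads in one jstack dump by state and top stack frame. The helper names, the `localhost:9200` address, and the sample dump are made up for illustration; real dumps will contain different frames.

```python
import re
from collections import Counter
from urllib.parse import urlencode


def hot_threads_url(base_url: str, threads: int = 3, interval: str = "500ms") -> str:
    """Build the URL for the nodes hot threads endpoint."""
    query = urlencode({"threads": threads, "interval": interval})
    return f"{base_url}/_nodes/hot_threads?{query}"


def classify_jstack(dump: str, name_filter: str = "[search]") -> Counter:
    """Count threads in one jstack dump by (thread state, top stack frame).

    Only threads whose name contains name_filter are counted; Elasticsearch
    names its pool threads like elasticsearch[node][search][T#1].
    """
    counts = Counter()
    # jstack separates thread entries with blank lines.
    for block in re.split(r"\n\s*\n", dump):
        name = re.search(r'^"([^"]+)"', block, re.MULTILINE)
        if not name or name_filter not in name.group(1):
            continue
        state = re.search(r"java\.lang\.Thread\.State: (\w+)", block)
        frame = re.search(r"^\s+at ([\w.$<>]+)", block, re.MULTILINE)
        key = (
            state.group(1) if state else "UNKNOWN",
            frame.group(1) if frame else "<no frames>",
        )
        counts[key] += 1
    return counts


# Invented sample in jstack's text format, only to exercise the parser.
SAMPLE = '''"elasticsearch[node-1][search][T#1]" #71 daemon prio=5 tid=0x1 nid=0x1 runnable
   java.lang.Thread.State: RUNNABLE
    at org.apache.lucene.search.FuzzyTermsEnum.next(FuzzyTermsEnum.java)
    at org.apache.lucene.search.TermCollectingRewrite.collectTerms(TermCollectingRewrite.java)

"elasticsearch[node-1][search][T#2]" #72 daemon prio=5 tid=0x2 nid=0x2 waiting on condition
   java.lang.Thread.State: WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)

"elasticsearch[node-1][management][T#1]" #80 daemon prio=5 tid=0x3 nid=0x3 runnable
   java.lang.Thread.State: RUNNABLE
    at java.lang.Thread.run(Thread.java)
'''

if __name__ == "__main__":
    print(hot_threads_url("http://localhost:9200"))
    for (state, frame), n in classify_jstack(SAMPLE).most_common():
        print(f"{n:>3}  {state:<10} {frame}")
```

Running it over several dumps taken a few seconds apart and summing the counters gives a crude picture of where the search threads spend their time. Frames that keep showing up in RUNNABLE search threads are the ones to look at first.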

softwaredoug · October 6, 2015, 6:40pm

Fuzzy queries are notably expensive. Have you looked into alternatives here? For example, many times people use fuzzy queries in place of stemming, when stemming is a better option. Tuning or moving away from fuzzy queries may have a pretty significant impact on your performance.
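As a sketch of what both options might look like, the request bodies below are built as plain Python dicts. The first bounds a fuzzy `match` query with `prefix_length` and `max_expansions`; the second defines a custom analyzer with the `porter_stem` token filter so an ordinary `match` can be used instead. The field name `last_name`, the analyzer name `stemmed_text`, and the specific values are assumptions for illustration, not taken from the question.

```python
import json

FIELD = "last_name"  # hypothetical field name


def bounded_fuzzy_match(field, text, fuzziness=1, prefix_length=2, max_expansions=20):
    """Fuzzy match query with the term expansion kept small.

    prefix_length: leading characters that must match exactly.
    max_expansions: cap on how many terms the fuzzy term expands to.
    The values here are illustrative, not recommendations.
    """
    return {
        "query": {
            "match": {
                field: {
                    "query": text,
                    "fuzziness": fuzziness,
                    "prefix_length": prefix_length,
                    "max_expansions": max_expansions,
                }
            }
        }
    }


# Index settings for a custom analyzer that stems at index and search time.
# A field would need to be mapped with this analyzer and the data reindexed.
STEMMED_INDEX_SETTINGS = {
    "settings": {
        "analysis": {
            "analyzer": {
                "stemmed_text": {
                    "type": "custom",
                    "tokenizer": "standard",
                    "filter": ["lowercase", "porter_stem"],
                }
            }
        }
    }
}


def stemmed_match(field, text):
    """Plain match query; the stemming happens in the field's analyzer."""
    return {"query": {"match": {field: text}}}


if __name__ == "__main__":
    print(json.dumps(bounded_fuzzy_match(FIELD, "jonson"), indent=2))
    print(json.dumps(STEMMED_INDEX_SETTINGS, indent=2))
    print(json.dumps(stemmed_match(FIELD, "running"), indent=2))
```

Stemming only helps when the fuzziness was there to catch word variants such as plurals or verb forms. If the goal is typo tolerance, the bounded fuzzy query is the closer fit.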
