Elasticsearch - High CPU usage for simple query while load testing

jobini · March 19, 2019, 1:00pm

I have an Elasticsearch index with about 93 million documents (68.8 GB) on 3 shards. While load testing one particular query (300 concurrent requests), CPU usage shoots up to 100%. The query is fairly simple:

{
    "query": {
        "bool": {
            "must": [
                {"match": {"release_title": "Spider - Singles Collection 1976-86"}},
                {"match": {"track_title": "TALKIN BOUT ROCK N ROLL (New Version)"}}
            ]
        }
    }
}

However, if I run the same kind of query with different strings, CPU usage stays normal.

Here is the output of the hot threads API (I'm a beginner with ES and not sure how to interpret it): https://pastebin.com/XTs6vvUF

The system has 14 cores and 40GB of RAM. I've set the JVM heap size to 16GB. Could someone help me figure out what's going wrong?

spinscale · March 20, 2019, 8:19am

Is this query hitting a lot of documents? Is it different from the other queries you are using? How long does it take to execute? You can also use the Profile API to see where the time is spent.
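A minimal sketch of the Profile API suggestion, assuming you send request bodies as plain dicts through whatever client you use: profiling is switched on by adding `"profile": true` to the search body, and each shard's result then lists per-query timings under `profile.shards[].searches[].query[]`. The helper names `with_profiling` and `slowest_query_nodes` are hypothetical, and the response below is a trimmed, made-up example of that shape rather than real output.

```python
from typing import Any


def with_profiling(body: dict[str, Any]) -> dict[str, Any]:
    """Return a shallow copy of a search request body with profiling switched on."""
    profiled = dict(body)
    profiled["profile"] = True
    return profiled


def slowest_query_nodes(response: dict[str, Any], top: int = 3) -> list[tuple[str, str, int]]:
    """Return (type, description, time_in_nanos) for the slowest top-level
    query nodes across all shards of a profiled search response."""
    rows: list[tuple[str, str, int]] = []
    for shard in response.get("profile", {}).get("shards", []):
        for search in shard.get("searches", []):
            for node in search.get("query", []):
                rows.append((node["type"], node["description"], node["time_in_nanos"]))
    rows.sort(key=lambda row: row[2], reverse=True)
    return rows[:top]


body = {"query": {"match": {"release_title": "Spider - Singles Collection 1976-86"}}}
# Made-up response, shaped like a Profile API result from two shards.
fake_response = {
    "profile": {
        "shards": [
            {"searches": [{"query": [
                {"type": "BooleanQuery", "description": "release_title:spider ...", "time_in_nanos": 41_000_000},
            ]}]},
            {"searches": [{"query": [
                {"type": "BooleanQuery", "description": "release_title:spider ...", "time_in_nanos": 38_000_000},
            ]}]},
        ]
    }
}
print(with_profiling(body))
print(slowest_query_nodes(fake_response, top=1))
```

Each query node also carries a `breakdown` and `children`, so you can drill down from the slowest top-level clause to the individual term queries.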

Note that by default a match query combines its terms with OR, so you may simply be hitting a lot of documents that all need to be scored.
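To illustrate why OR matching can touch so many more documents, here is a toy model of the match query's `operator` setting. It assumes a very rough stand-in for the standard analyzer (lowercase, split on non-alphanumerics), and the titles are invented examples; real scoring and analysis are more involved.

```python
import re


def analyze(text: str) -> set[str]:
    """Very rough stand-in for the standard analyzer: lowercase the text and
    split on anything that is not a letter or digit."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def would_match(query: str, field_value: str, operator: str = "or") -> bool:
    """Mimic whether a match query would hit a field value.
    "or" (the default): any single query term is enough.
    "and": every query term must be present."""
    query_terms = analyze(query)
    doc_terms = analyze(field_value)
    if operator == "and":
        return query_terms <= doc_terms
    return bool(query_terms & doc_terms)


query = "Spider - Singles Collection 1976-86"
titles = [
    "Spider - Singles Collection 1976-86",
    "The Singles Collection",
    "Live 1976",
    "Greatest Hits",
]
print([t for t in titles if would_match(query, t)])                  # OR semantics
print([t for t in titles if would_match(query, t, operator="and")])  # AND semantics
```

With OR, a common term such as "singles" or a year is enough to pull a document into the candidate set, and every candidate has to be scored; with AND, only documents containing all terms qualify.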

--Alex

jobini · March 20, 2019, 11:17am

Yes, it's hitting a lot of documents (about 1.9 million).

The structure of the query is identical to the other queries I'm using; only the query string values change. But I've noticed that the good queries (the ones that don't cause the CPU spike) hit far fewer documents (<0.1 million).

On its own, the query can take as little as 40ms to execute, but under load testing (300 concurrent requests with the same query) it takes 2s on average and up to 10s.
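A back-of-the-envelope way to see how a 40ms query turns into seconds under load is Little's law for a closed, CPU-bound system with no think time. This is a rough sketch under loudly stated assumptions, not a model of Elasticsearch's thread pools: it assumes each of the 3 shard searches keeps one core busy for the full 40ms, that all 300 requests are always in flight, and that the 14 cores are the only bottleneck.

```python
def saturated_latency_s(concurrency: int, cpu_s_per_request: float, cores: int) -> float:
    """Rough response time of a closed, CPU-bound system with no think time.
    At saturation throughput X = cores / D, so by Little's law R = N / X = N * D / cores."""
    throughput = cores / cpu_s_per_request  # requests/second with every core busy
    return concurrency / throughput


SHARDS = 3
SINGLE_QUERY_S = 0.040  # latency measured for one query in isolation
# Assumption: each shard search keeps one core busy for the whole 40 ms.
cpu_per_request = SHARDS * SINGLE_QUERY_S
print(f"{saturated_latency_s(300, cpu_per_request, 14):.2f} s")
```

Under these assumptions the estimate comes out around 2.6s, in the same range as the observed ~2s average. The takeaway is that latency under saturation scales with CPU time per request, so reducing the number of documents each query has to score helps more than anything else.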

Ah, I didn't realize that was the default behavior of match. I've modified the query to use AND with fuzziness (which is the intended behavior) and will keep you updated on the new results. Thank you!

jobini · March 20, 2019, 12:58pm

@spinscale: So this is my modified query now:

{
    "query": {
        "bool": {
            "should": [
                {
                    "match": {
                        "release_title": {
                            "query": "Spider - Singles Collection 1976-86",
                            "operator": "and",
                            "fuzziness": "AUTO"
                        }
                    }
                },
                {
                    "match": {
                        "track_title": {
                            "query": "TALKIN BOUT ROCK N ROLL (New Version)",
                            "operator": "and",
                            "fuzziness": "AUTO"
                        }
                    }
                }
            ]
        }
    }
}

This gives much better results (an average of ~900ms), but I'd like to bring it down further, to ~300ms or less. If I remove the fuzziness parameter, the average drops to ~40ms. However, I do need fuzziness to account for misspellings and other slight variations in the strings. Is there anything else I can do to speed up the query?

spinscale · March 20, 2019, 1:10pm

Hey,

Fuzziness is only one solution to this problem, and it is one that works at query time (hence the slowdown). You may want to take a look at the phonetic analysis plugin.
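To see where the query-time cost of fuzziness comes from, here is a sketch of how `"fuzziness": "AUTO"` (equivalent to `AUTO:3,6`) decides the allowed edit distance per term, and how one query term then expands into several indexed terms that all have to be looked up and scored. It uses plain Levenshtein distance; by default Elasticsearch also counts a transposition of two adjacent characters as a single edit, which this sketch ignores. The term list is invented for illustration.

```python
def auto_max_edits(term: str, low: int = 3, high: int = 6) -> int:
    """Edits allowed by fuzziness "AUTO" (AUTO:3,6): terms shorter than `low`
    must match exactly, terms shorter than `high` allow one edit, longer ones two."""
    if len(term) < low:
        return 0
    if len(term) < high:
        return 1
    return 2


def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = cur
    return prev[-1]


def fuzzy_expansions(query_term: str, index_terms: list[str]) -> list[str]:
    """Indexed terms within the AUTO edit distance of the query term."""
    max_edits = auto_max_edits(query_term)
    return sorted(t for t in index_terms if levenshtein(query_term, t) <= max_edits)


index_terms = ["rock", "rack", "rocks", "sock", "roll", "role", "doll", "toll",
               "n", "no", "version", "versions", "verison"]
for term in ["n", "rock", "roll", "version"]:
    print(term, auto_max_edits(term), fuzzy_expansions(term, index_terms))
```

Every expansion becomes an extra term in the query, so a seven-term title can fan out considerably across a large index. The match query's `prefix_length` and `max_expansions` parameters limit this fan-out, at the cost of catching fewer variants.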

Another idea is to run a non-fuzzy query by default and only run the fuzzy one if the first query returns no hits (although the zero-hits case would then be even slower).
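A minimal sketch of that exact-first, fuzzy-fallback idea, reusing the shape of the modified query from this thread. `search` is a hypothetical callable that sends a request body and returns the parsed JSON response (for example, a thin wrapper around your client); checking `hits.hits` rather than `hits.total` keeps the sketch independent of how your version reports totals. The fake search function only exists to show the control flow.

```python
from typing import Any, Callable

# Hypothetical: any function that takes a search body and returns the JSON response.
SearchFn = Callable[[dict[str, Any]], dict[str, Any]]


def match_clause(field: str, text: str, fuzzy: bool) -> dict[str, Any]:
    params: dict[str, Any] = {"query": text, "operator": "and"}
    if fuzzy:
        params["fuzziness"] = "AUTO"
    return {"match": {field: params}}


def build_query(release_title: str, track_title: str, fuzzy: bool) -> dict[str, Any]:
    # Same shape as the modified query in this thread: two should clauses.
    return {"query": {"bool": {"should": [
        match_clause("release_title", release_title, fuzzy),
        match_clause("track_title", track_title, fuzzy),
    ]}}}


def search_exact_then_fuzzy(search: SearchFn, release_title: str, track_title: str) -> dict[str, Any]:
    """Run the cheap exact query first; pay for fuzziness only when it finds nothing."""
    response = search(build_query(release_title, track_title, fuzzy=False))
    if response["hits"]["hits"]:
        return response
    return search(build_query(release_title, track_title, fuzzy=True))


calls: list[dict[str, Any]] = []


def fake_search(body: dict[str, Any]) -> dict[str, Any]:
    """Pretend backend: only the fuzzy query finds the misspelled title."""
    calls.append(body)
    fuzzy = "fuzziness" in body["query"]["bool"]["should"][0]["match"]["release_title"]
    return {"hits": {"hits": [{"_id": "1"}] if fuzzy else []}}


result = search_exact_then_fuzzy(fake_search, "Spider - Singles Colection 1976-86", "TALKIN BOUT ROCK N ROLL")
print(len(calls), result["hits"]["hits"])
```

Whether this pays off depends on how often real queries are misspelled: if most of them match exactly, the expensive fuzzy query rarely runs.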

Also, maybe you don't need fuzziness at all: you may not be interested in typos in single terms, and it might be enough if only a certain percentage of the terms are found. In that case, take a look at the minimum_should_match parameter of the match query.
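A sketch of how simple `minimum_should_match` values translate into a required number of matching terms, following the documented rules: a positive percentage is rounded down, and a negative percentage gives the share of terms allowed to be missing. Combination forms such as `"3<90%"` are not handled. The `"75%"` value and the helper names are examples only, and the term count again relies on a rough stand-in for the standard analyzer.

```python
import math
import re


def analyze(text: str) -> list[str]:
    # Rough stand-in for the standard analyzer (lowercase, split on non-alphanumerics).
    return re.findall(r"[a-z0-9]+", text.lower())


def required_matches(num_clauses: int, spec: str) -> int:
    """Minimum number of optional clauses (here: query terms) that must match
    for simple minimum_should_match values such as "2", "-1", "75%", "-25%"."""
    if spec.endswith("%"):
        pct = int(spec[:-1])
        if pct >= 0:
            n = math.floor(num_clauses * pct / 100)  # share required, rounded down
        else:
            n = num_clauses - math.floor(num_clauses * -pct / 100)  # share allowed to be missing
    else:
        value = int(spec)
        n = value if value >= 0 else num_clauses + value
    # A query made only of should clauses still needs at least one to match.
    return max(n, 1)


track = "TALKIN BOUT ROCK N ROLL (New Version)"
terms = analyze(track)
# Example value only: an OR match query that still requires most of the terms.
track_clause = {"match": {"track_title": {"query": track, "minimum_should_match": "75%"}}}
print(len(terms), required_matches(len(terms), "75%"))
```

With seven terms, `"75%"` requires five of them, so one or two misspelled or missing words no longer prevent a match, without the term expansion that fuzziness causes.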

As you can see, this is a very broad question whose answer usually depends heavily on the data and the concrete use case, so it's hard to come up with the one answer, but maybe you now have a few more options to explore.

--Alex

jobini · March 22, 2019, 6:32am

It seems the minimum_should_match parameter was just what I needed! The stats are looking good now. Thank you for taking the time to help out, Alex!

system · April 19, 2019, 6:32am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.

Related topics

Topic	Category	Replies	Views	Activity
Low search query throughput and high CPU usage	Elasticsearch	2	331	July 6, 2017
ElasticSearch High CPU usage 160 queries per second doesn't make sense	Elasticsearch	1	987	July 6, 2017
High CPU consumption	Elasticsearch	8	8632	July 5, 2017
High CPU Percentage Usage	Elasticsearch	3	551	July 5, 2017
High CPU Usage Problem	Elasticsearch	1	259	July 6, 2017