Simple queries takes lots of time and uses 100% cpu

@DavidTurner: that's exactly what I'm saying - queries executed in devtools are done in seconds, while in discovery tab it takes ~10 times more.

Test done right now:

1. Query in discovery tab:
Request code: https://raw.githubusercontent.com/aimarpl/kb_queries/master/query_discovery
While executing, all CPUs at 100%:

Result after ~15 seconds (first query failed after two minutes, so I had to re-run it and then got response in 15 secs):
query_discovery_result

2. Query in devtools tab:
Request code: https://raw.githubusercontent.com/aimarpl/kb_queries/master/query_devtools

While executing & result (result provided almost instantly with almost 0 CPU load):

It's crazy...

I have even disabled highlighting in kibana settings to check on performance then, but... it has no impact at all (it was suggested in one of answers above).

I see the discover vs console examples are querying different date ranges and timestamps - to further diagnose we need to get these requests as close to each other as possible.
The other thing to ensure is that caching is not taking effect. In console you can do this:

  GET myindex/_search?request_cache=false

Another difference between Discover and Console is that Discover will route user requests to the same choice of replica using the preference parameter (you can see this value using the Browser's network tools).
In your Console test, set ?preference=xxxx with the same session id used by the Discover tab.

Good catch, indeed timestamps were a little bit different...

I just did another test applying your suggestions into it:

  1. Query from Discovery
    Request code: https://raw.githubusercontent.com/aimarpl/kb_queries/master/discovery_query2

Execution:

Result:

It took 16 seconds

  1. Query from dev tools:
    Request code: https://raw.githubusercontent.com/aimarpl/kb_queries/master/devtools_query2

Result:

It took around 1-2 seconds


about replicas used - It's a single node with 5 shards and just 1 replica so I believe it does not matter, does it?

Indexes are created daily (winlogbeat-YYYY-MM-dd) wi with 1 shard, 0 replicas.

It's a single node with 5 shards and just 1 replica

Are we talking about a different setup now?

No, yes... a little :wink: Few posts above I tried to reconfigure it from 1 shard and 0 replicas to 5 shards and 1 repl - I thought that maybe then it will run some search jobs in pararell increasing the overall performance, but... it had no impact.

If you only have a single node you'll never get any replicas (there's no added resilience or throughput benefits in allocating replica shards to the same node as the primary shard).

So I guess we can rule out any preference difference between Discover and Console.

It might be worth noting Kibana's description of timings. I wonder if there's added overhead in the unpacking of all that JSON in the 500 doc results (did we already establish performance was linear with the size setting for hits?)

ah, then, basing on last query output it looks like
ES query takes only 54ms - super fast
and then preparing all json docs takes 16460ms ...correct?

I would test that with different sizes as you proposed, but doing that in devtools/console would not give me results I'm looking for (preparing docs).
If I do that from the discovery tab in kibana I can change size in advanced settings, but.... results are cached and each subsequent query for same data is faster. How do I disable cache over there?

If the suspicion is that it's all the JSON marshalling that takes the time the elasticsearch caches may be somewhat irrelevant. That said, there's an API you can use to clear the caches

ok, good hint

first query, 500 results

  • query time: 170ms
  • request time: 27339ms

second query after clearing the cache with following:
POST /winlogbeat-2019.11.14/_cache/clear
POST /winlogbeat-2019.11.13/_cache/clear

Result:
# POST /winlogbeat-2019.11.14/_cache/clear
{
"_shards" : {
"total" : 5,
"successful" : 5,
"failed" : 0
}
}

# POST /winlogbeat-2019.11.13/_cache/clear
{
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  }
}

Query time: 129ms
Request time: 634ms

so... even after clearing cache, second query is very fast (for the same data); how is that possible?!

There's OS-level filesystem caches to consider too but that does look like a suspiciously large difference.

After analyzing the website type and traffic, we tweak the key settings in MySQL servers that helps to ensure optimal resource usage. This includes connection limits, buffer size and query cache size, and more.

@Mark_Harwood, @DavidTurner - first of all, thank for you all your hints here :slight_smile: finally some good news here :slight_smile:
Got it fixed - I have disabled filterEditor:suggestValues in Kibana settings so KQL won't suggest values in discovery search and problem is gone.

filterEditor:suggestValues enabled:
|Query time|271ms|
|Request time|53354ms|

filterEditor:suggestValues disabled:
|Query time|380ms|
|Request time|1718ms|

Finally it all works as it should, question is - shouldn't this feature get reported as a buggy? As I read in other threads it looks up all indicies instead of those specified in time range picker which causes such CPU load.

Anyway - I hope that this workaround will help others having same issue with ELK 7.

Glad to hear you identified the bottleneck. I added a note to the relevant Kibana issue.

1 Like