Kibana 6 - Query/Highlight Performance

Upgraded from 5.6.4 to 6.0.0 today, only to find queries taking 5x as long as they did previously.

These are queries where no field is specified, e.g. just a bare term.

Digging around GitHub, I couldn't find anything reported that is too similar to what I'm experiencing.

I've found two workarounds:

  • Disabling highlighting in Kibana (see the setting sketch just below)
  • Changing the default Kibana query string to use a specific field (e.g. message) instead of *
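
For the first of those, the switch is Kibana's doc_table:highlight advanced setting - name as I remember it from our 6.0 install, so treat this as a sketch rather than gospel:

# Kibana -> Management -> Advanced Settings
# Turning this off stops Kibana adding the "highlight" block to every Discover query
doc_table:highlight: false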

This problem seems to exist on all our indexes (the smallest has 400 fields, the largest 1500). They're not particularly big, < 5GB in size.

5.6.4 does not exhibit the same behavior. I also do not use the _all field so I simply can't update the default query to use that.

For the slow query, could you grab the raw query that's being sent from your browser's developer tools? You should see an _msearch request in the network tab.

I suspect this might be due to the fact that the _all field has been deprecated in 6.0 and replaced with an all_fields option in the query string query. Instead of copying data into a separate indexed field, which is what _all did, the new query iterates over all fields. This means data no longer has to be indexed more than once, but it also means many more fields have to be queried at search time. Given that you have a reasonably large number of fields, that could well be what you are seeing.
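
Roughly speaking, the difference looks like this (a sketch only - the field expansion happens inside the query_string query, not in anything you send):

# Before 6.0: an unqualified query string searched the single, pre-built _all field
GET logstash-*/_search
{
  "query": {
    "query_string": { "query": "error" }
  }
}

# In 6.0, with _all gone, the same request effectively becomes "default_field": "*",
# i.e. the query is expanded across every field in the mapping at search time
GET logstash-*/_search
{
  "query": {
    "query_string": { "query": "error", "default_field": "*" }
  }
}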

I did wonder that, but we've not been using the _all field. The other odd thing I've noticed is that queries now span far more shards than they did previously.

E.g. with one index per 24 hours (the default Logstash pattern), if I query for just the last hour, the _msearch response says it has queried all the shards of every logstash-* index we have. Again, this isn't behaviour I see on 5.6.4.

I'll link the _msearch response when I'm back in the office on Monday, the upgrade to 6.0 has had a few hiccups so far :wink:

If you use query strings and do not specify a field, _all was used behind the scenes prior to 6.0. Querying more data and more shards can naturally also affect performance. Make sure that you do not end up with a lot of small shards, as this can be inefficient.
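
If you want to check, something like this lists your Logstash shards by size (the column selection is just an example):

# Lists each shard with its size, smallest first - lots of tiny shards is a red flag
GET _cat/shards/logstash-*?v&h=index,shard,prirep,store&s=store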

Oh really? That's interesting, I'll dig through one of the 5.x.x clusters we have running.

Regarding the shard thing - I'm seeing something which seems counter-intuitive to me. If a logstash index called logstash-2017.11.18 exists today and has 2 shards, and I query for the last 30 minutes of data, I would expect the _msearch response to report a total of 2 shards queried.

What I see instead is _msearch returning a shard count that is the total number of shards across all logstash-* indexes. Why is the query now touching the shards of previous indexes?

In early 5.x versions, Kibana used the field stats API to identify exactly which indices to query. This replaced expanding date patterns.

In version 5.4, this API was deprecated, as checking this at query time had been made much more efficient and the extra calls were no longer required. I believe Kibana has since then simply queried against the index pattern, which might be why you see a larger number of shards respond than before.
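
If you want to see what happens per shard, you can run the time-filtered search by hand and look at the _shards section of the response, which reports how many shards were hit (and, in later releases, how many were skipped outright). A sketch, assuming the default Logstash @timestamp field; the pre_filter_shard_size parameter just forces the pre-filter phase on even for a small cluster:

# The _shards section of the response shows total / successful (and, where supported, skipped) counts
GET logstash-*/_search?pre_filter_shard_size=1
{
  "size": 0,
  "query": {
    "range": { "@timestamp": { "gte": "now-30m" } }
  }
}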

Thanks for the explanation, that clarifies quite a few things. I'll be sure to post the _msearch request/response on Monday.

As I said, for now, I've just set a default field for Kibana to query as opposed to it querying all fields.
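
Concretely, what I changed is the query:queryString:options advanced setting, adding a default_field so the query string no longer expands to every field. The field name here is just what made sense for our data:

# Kibana -> Management -> Advanced Settings -> query:queryString:options
{ "analyze_wildcard": true, "default_field": "message" }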

I think I have experienced this as well. If I call /_msearch with the same query Kibana uses, it takes 5000ms. If I remove the "highlight":... part of the query, it returns in 100ms or less.

I tested on Elasticsearch 6.2.3 with auditbeat 6.2.4 (which provides the index template).

Original query:

{"index":["infosec-auditbeat*"],"ignore_unavailable":true,"preference":1524525012740}
{"version":true,"size":500,"sort":[{"@timestamp":{"order":"desc","unmapped_type":"boolean"}}],"_source":{"excludes":[]},"aggs":{"2":{"date_histogram":{"field":"@timestamp","interval":"5m","time_zone":"America/Los_Angeles","min_doc_count":1}}},"stored_fields":["*"],"script_fields":{},"docvalue_fields":["@timestamp"],"query":{"bool":{"must":[{"query_string":{"query":"connect","analyze_wildcard":true,"default_field":"*"}},{"match_phrase":{"beat.hostname":{"query":"auditbeat-8mm7k"}}},{"range":{"@timestamp":{"gte":1524524346418,"lte":1524538746418,"format":"epoch_millis"}}}],"filter":[],"should":[],"must_not":[]}},"highlight":{"pre_tags":["@kibana-highlighted-field@"],"post_tags":["@/kibana-highlighted-field@"],"fields":{"*":{}},"fragment_size":2147483647}}

{"responses":[{"took":5267,...}

And removing the highlight part:

{"index":["infosec-auditbeat*"],"ignore_unavailable":true,"preference":1524525012740}
{"version":true,"size":500,"sort":[{"@timestamp":{"order":"desc","unmapped_type":"boolean"}}],"_source":{"excludes":[]},"aggs":{"2":{"date_histogram":{"field":"@timestamp","interval":"5m","time_zone":"America/Los_Angeles","min_doc_count":1}}},"stored_fields":["*"],"script_fields":{},"docvalue_fields":["@timestamp"],"query":{"bool":{"must":[{"query_string":{"query":"connect","analyze_wildcard":true,"default_field":"*"}},{"match_phrase":{"beat.hostname":{"query":"auditbeat-8mm7k"}}},{"range":{"@timestamp":{"gte":1524524346418,"lte":1524538746418,"format":"epoch_millis"}}}],"filter":[],"should":[],"must_not":[]}}}

{"responses":[{"took":25,...}

As for my specific data:

GET /_cat/indices/infosec-auditbeat*

green open infosec-auditbeat-6.2.4-2018.04.24 7rGmI3A7T9anmVCphrflFw 5 1 713229 0 638.5mb   321mb
green open infosec-auditbeat-6.2.4-2018.04.23 YmY2OHc3RIOlkR1Xh1d0eA 5 1 329153 0 372.1mb 186.5mb

My mapping is moderate in size: GET /infosec-auditbeat*/_mapping (two indices) returns a JSON object which, when pretty-printed, is 2660 lines. This is the default auditbeat index template, except that the index name has been changed.
