Kibana 6 - Query/Highlight Performance

fenneh · November 16, 2017, 11:25pm

Upgraded from 5.6.4 to 6.0.0 today to find query taking 5x the amount they have previously.

These queries are where there is no field specified, e.g. term

Digging around GitHub I couldn't find anything too similiar to what I'm experience reported.

I've found remedies in two different ways

Disabling highlighting in Kibana
Changing the default Kibana query string to use a specific field (e.g. message) as opposed to *

This problem seems to exist on all our indexes (smallest has 400 fields, largest 1500) - They're not particularly big, < 5GB in size.

5.6.4 does not exhibit the same behavior. I also do not use the _all field so I simply can't update the default query to use that.

Related (I believe) Topics:

github.com/elastic/kibana

[Discover] Discover times out when there's a lot of fields (b/c of highlight)

opened 01:07AM - 02 Jun 17 UTC

closed 10:57PM - 29 Aug 17 UTC

cwurm

bug Feature:Discover

**Kibana version**: 6.0.0-alpha1 **Elasticsearch version**: 6.0.0-alpha1 … **Server OS version**: Ubuntu 16.04.2 LTS **Browser version**: Safari 10.1.1 (12603.2.4) **Browser OS version**: macOS 10.2.5 **Original install method (e.g. download page, yum, from source, etc.)**: download **Description of the problem including expected versus actual behavior**: Discover view times out **Steps to reproduce**: Open Discover view for a relatively small index (6.7K docs, 15.6MB data), but with a lot of fields (950 in my case) - the Discover view times out. The data is from a pcap file, extracted using `tshark -T ek`. **Query executed (retrieved from slowlog)** ``` GET packets-2017-06-01/_search { "size":500, "query":{ "bool":{ "must":[ { "query_string": { "query":"*", "fields":[], "use_dis_max":true, "tie_breaker":0.0, "default_operator":"or", "auto_generate_phrase_queries":false, "max_determinized_states":10000, "enable_position_increments":true, "fuzziness":"AUTO", "fuzzy_prefix_length":0, "fuzzy_max_expansions":50, "phrase_slop":0, "analyze_wildcard":true, "escape":false, "split_on_whitespace":true, "boost":1.0 } }, { "range": { "timestamp": { "from":null, "to":null, "include_lower":true, "include_upper":true, "boost":1.0 } } } ], "adjust_pure_negative":true, "boost":1.0 } }, "version":true, "_source":{ "includes":[], "excludes":[] }, "stored_fields":"*", "docvalue_fields":["timestamp"], "script_fields":{}, "sort":[ {"timestamp": {"order":"desc", "unmapped_type":"boolean" }} ], "aggregations":{ "2":{ "date_histogram":{ "field":"timestamp", "time_zone":"America/Los_Angeles", "interval":"135ms", "offset":0, "order":{ "_key":"asc" }, "keyed":false, "min_doc_count":1 } } }, "highlight":{ "pre_tags":["@kibana-highlighted-field@"], "post_tags":["@/kibana-highlighted-field@"], "fragment_size":2147483647, "fields":{ "*":{ "highlight_query":{ "bool":{ "must":[ { "query_string":{ "query":"*", "fields":[], "use_dis_max":true, "tie_breaker":0.0, "default_operator": "or", "auto_generate_phrase_queries":false, "max_determinized_states":10000, "enable_position_increments":true, "fuzziness":"AUTO", "fuzzy_prefix_length":0, "fuzzy_max_expansions":50, "phrase_slop":0, "analyze_wildcard":true, "escape":false, "split_on_whitespace":true, "all_fields":true, "boost":1.0 } }, { "range":{ "timestamp":{ "from":1331901000000, "to":1331901006792, "include_lower":true, "include_upper":true, "format":"epoch_millis", "boost":1.0 } } } ], "adjust_pure_negative":true, "boost":1.0 } } } } } } ``` **Response when executed in Console (truncated)** ``` { "took": 100490, "timed_out": false, "_shards": { "total": 2, "successful": 2, "failed": 0 }, "hits": { "total": 6666, "max_score": null, "hits": [ ``` **Response from same query, without highlight** ``` { "took": 114, "timed_out": false, "_shards": { "total": 2, "successful": 2, "failed": 0 }, "hits": { "total": 6666, "max_score": null, "hits": [ ``` 100 seconds vs. 114 milliseconds is quite the difference on a dataset that comfortably fits into memory.

github.com/elastic/kibana

Auto expanding fields in Discovery/Query-String performance issue

opened 02:48PM - 31 May 17 UTC

closed 09:29PM - 24 Jul 17 UTC

gmoskovicz

blocker PR sent Feature:Query Bar

**Kibana version**: 5.4.x **Elasticsearch version**: 5.4.x **Server OS ver…sion**: ANY **Browser version**: ANY **Browser OS version**: ANY **Original install method (e.g. download page, yum, from source, etc.)**: ZIP file **Description of the problem including expected versus actual behavior**: Since https://github.com/elastic/elasticsearch/pull/20925 when executing a query string query in an index (index pattern) that had `_all` disabled, we will look at all the fields in the mapping that are not metafields and can be searched, and automatically expand the list of fields that are going to be queried. Basically, we will expand the query string for each specific field that you have in the index. Saved searches and discovery usually use the following query, along with other filters: ``` "query": { "query_string": { "analyze_wildcard": true, "query": "*" } }, ``` This means that for index patterns that have lots of fields, we will see the query expanded with many `ConstantScore(_field_names:<FIELD_NAME_HERE>` and this can be a performance impact compared with previous Elasticsearch/Kibana versions. This has been introduced in `ES/Kibana 5.1.1`. This can be easily tested by grabbing the network call from a discovery page. This two steps are a quick repro: ``` PUT test { "mappings": { "test_type": { "_all": { "enabled": false }, "properties": { "prop1": { "type": "text" }, "prop2": { "type": "text" }, "prop3": { "type": "text" }, "prop4": { "type": "text" } } } } } POST test/_search { "profile": true, "version": true, "size": 500, "sort": [ { "_score": { "order": "desc" } } ], "query": { "query_string": { "analyze_wildcard": true, "query": "*" } }, "_source": { "excludes": [] }, "stored_fields": [ "*" ], "script_fields": {}, "docvalue_fields": [], "highlight": { "pre_tags": [ "@kibana-highlighted-field@" ], "post_tags": [ "@/kibana-highlighted-field@" ], "fields": { "*": { "highlight_query": { "query_string": { "analyze_wildcard": true, "query": "*", "all_fields": true } } } }, "fragment_size": 2147483647 } } ``` Note that i added the `profile: true` to be able to see the lucene expressions being used. My proposal is to, when `_all` is disabled and no `fields` or `default_field` is used, add a warning that the query will auto expand to all the fields available in the index patter, and that will have an extra cost.

Bargs · November 17, 2017, 6:31pm

For the slow query, could you grab the raw query that's being sent from your browser's developer tools? You should see an _msearch request in the network tab.

Christian_Dahlqvist · November 18, 2017, 8:12am

I suspect this might be due to the fact that the _all field in version 6.0 has been deprecated and replaced with an all_fields option in the query string query. Instead of copying over data into a separate field that is indexed, which is what the _all field did, the new query iterates over all fields, meaning that data does not have to be indexed more than once but requiring more fields to be queried. It could be that you are seeing this as you have a reasonably large number of fields.

fenneh · November 18, 2017, 8:46am

I did wonder that. But we've not been using _all fields. The other odd thing I've noticed is that queries are now spanning over way more shards than previously.

E.g. if indexing every 24 hours (default logstash) I can query for the last 1 hour and the _msearch returns saying it has queried all the shards for every logstash-* index we have. Again, this isn't a behaviour I can see on 5.6.4.

I'll link the _msearch response when I'm back in the office on Monday, the upgrade to 6.0 has had a few hiccups so far

Christian_Dahlqvist · November 18, 2017, 8:54am

If you use query strings and do not specify a field, _all was used behind the scenes prior to 6.0. Querying more data and shards can naturally also affect performance. Make sure that you do not end top with a lot of small shards, as this can be inefficient.

fenneh · November 18, 2017, 1:36pm

Oh really? That's interesting, I'll dig through one of the 5.x.x clusters we have running.

Regarding the shard thing - I'm seeing something which seems counter-intuitive to me. If a logstash index called logstash-2017.11.18 exists today and has 2 shards, and I query for the last 30 minutes data - I would expect to see in _msearch a total of 2 shards queried.

What I am seeing instead is _msearch return with a shard count which is the total number of shards for all logstash-* indexes. How come the query is now checking previous indexes shards?

Christian_Dahlqvist · November 18, 2017, 1:51pm

In early 5.x versions, Kibana used the field stats API to identify exactly which indices to query. This replaced expanding date patterns.

In version 5.4, this API was deprecated as checking this at query time was made much more efficient, and these extra calls no longer required. I believe Kibana since then just queries against the index pattern, which might be why you see a larger number of shards respond than before.

fenneh · November 18, 2017, 2:30pm

Thanks for the explanation on that, clarifies quite a few things. I'll be sure on Monday to post the _msearch response/request.

As I said, for now, I've just set a default field for Kibana to query as opposed to it querying all fields.

system · December 16, 2017, 2:30pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.

jordansissel · April 24, 2018, 3:10am

I think I have experienced this as well. If I call /_msearch with the same query Kibana uses, it takes 5000ms. If I remove the "highlight":... part of the query, it returns in 100ms or less.

I tested on Elasticsearch 6.2.3, auditbeat 6.2.4 (provides the template).

Original query:

{"index":["infosec-auditbeat*"],"ignore_unavailable":true,"preference":1524525012740}
{"version":true,"size":500,"sort":[{"@timestamp":{"order":"desc","unmapped_type":"boolean"}}],"_source":{"excludes":[]},"aggs":{"2":{"date_histogram":{"field":"@timestamp","interval":"5m","time_zone":"America/Los_Angeles","min_doc_count":1}}},"stored_fields":["*"],"script_fields":{},"docvalue_fields":["@timestamp"],"query":{"bool":{"must":[{"query_string":{"query":"connect","analyze_wildcard":true,"default_field":"*"}},{"match_phrase":{"beat.hostname":{"query":"auditbeat-8mm7k"}}},{"range":{"@timestamp":{"gte":1524524346418,"lte":1524538746418,"format":"epoch_millis"}}}],"filter":[],"should":[],"must_not":[]}},"highlight":{"pre_tags":["@kibana-highlighted-field@"],"post_tags":["@/kibana-highlighted-field@"],"fields":{"*":{}},"fragment_size":2147483647}}

{"responses":[{"took":5267,...}

And removing the highlight part:

{"index":["infosec-auditbeat*"],"ignore_unavailable":true,"preference":1524525012740}
{"version":true,"size":500,"sort":[{"@timestamp":{"order":"desc","unmapped_type":"boolean"}}],"_source":{"excludes":[]},"aggs":{"2":{"date_histogram":{"field":"@timestamp","interval":"5m","time_zone":"America/Los_Angeles","min_doc_count":1}}},"stored_fields":["*"],"script_fields":{},"docvalue_fields":["@timestamp"],"query":{"bool":{"must":[{"query_string":{"query":"connect","analyze_wildcard":true,"default_field":"*"}},{"match_phrase":{"beat.hostname":{"query":"auditbeat-8mm7k"}}},{"range":{"@timestamp":{"gte":1524524346418,"lte":1524538746418,"format":"epoch_millis"}}}],"filter":[],"should":[],"must_not":[]}}}

{"responses":[{"took":25,...}

As for my specific data:

GET /_cat/indices/infosec-auditbeat*

green open infosec-auditbeat-6.2.4-2018.04.24 7rGmI3A7T9anmVCphrflFw 5 1 713229 0 638.5mb   321mb
green open infosec-auditbeat-6.2.4-2018.04.23 YmY2OHc3RIOlkR1Xh1d0eA 5 1 329153 0 372.1mb 186.5mb

My mapping is moderate in size, GET /infosec-auditbeat*/_mapping (two indices) returns a JSON object which, when pretty-printed, is 2660 lines. This is the default auditbeat index template except for the index name changed.

Topic		Replies	Views
Issue with queries using highlighting and fields * - hundreds of times slower Kibana	4	1373	September 16, 2018
Highlight query slow performance in ES 5.4.1 on specific index Elasticsearch	1	800	August 29, 2017
Kibana time out during discover request on index with large number of fields while direct elastic query is fast Kibana	6	2231	September 20, 2017
Highlighting slows down Kibana searches considerably Kibana	4	237	March 12, 2024
Discover query : timeout after a 99h treshold Elasticsearch	4	493	April 4, 2018

Kibana 6 - Query/Highlight Performance

Related topics