Long request time after upgrade to 7.1

I recently upgraded Elasticsearch and Kibana from 6.7.1 to 7.1.1. The Elasticsearch cluster has 5 hot nodes using SSDs for the most recent 7 days of data and 5 warm nodes with much larger spinning disks for the next 53 days of data. The Kibana node also runs Elasticsearch as a coordinating-only node.

Since upgrading to 7.1.1, simple searches in the Discover tab (e.g. a specific match on a keyword field like tags) sometimes time out and routinely take 30-40 seconds to come back, even when the time picker is set to 'Last 15 minutes'. The Inspect tab shows a reasonable query time, like 403ms, but a request time of something like 38000ms. If I take the same query and run it manually by itself (e.g. in the Dev Tools console), it returns very quickly. I've also noticed that whenever I search in the Discover tab, all of my 'warm' nodes, which only hold data older than 7 days, spike CPU. When the search completes, they go back to almost no CPU usage. Run another search, and all 5 warm nodes spike up to 100% again.
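To make the gap between query time and request time measurable outside Kibana, here is a minimal sketch that runs a comparable search and reports both the `took` value from the response and the wall-clock time. The helper names (`build_query`, `timed_search`), the base URL, and the index pattern in the usage comment are all hypothetical, and the query body is only an approximation of what Discover sends, not a copy of it.

```python
import json
import time
import urllib.request


def build_query(field, value, window="15m", size=500):
    """Exact match on a keyword field, limited to the last `window`.

    Assumption: this only approximates a Discover request; the real one
    also carries aggregations, highlighting, and other options.
    """
    return {
        "size": size,
        "query": {
            "bool": {
                "filter": [
                    {"term": {field: value}},
                    {"range": {"@timestamp": {"gte": f"now-{window}", "lte": "now"}}},
                ]
            }
        },
    }


def timed_search(base_url, index_pattern, body):
    """POST the search; return (took_ms reported by Elasticsearch, wall-clock ms)."""
    req = urllib.request.Request(
        f"{base_url}/{index_pattern}/_search",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    start = time.monotonic()
    with urllib.request.urlopen(req) as resp:
        result = json.load(resp)
    wall_ms = (time.monotonic() - start) * 1000
    return result["took"], wall_ms


# Example usage (hypothetical host and index pattern):
#   took, wall = timed_search("http://localhost:9200", "logstash-*",
#                             build_query("tags", "some-tag"))
#   print(f"took={took}ms wall={wall:.0f}ms")
```

A large difference between the two numbers here would point at time spent outside the query phase itself, which is what the Inspect tab figures suggest.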

It seems like something in 7.1.1 is causing searches through the Discover tab to hit ALL indices that match the index pattern rather than only the indices that contain data from the selected time range.

Any ideas on where I can look next to determine why Kibana/Elasticsearch are behaving this way and how to correct it?

darkmoonvt over on IRC had this to say:

Kibana is supposed to do a field stats query to determine which indices have fields with values in a certain range. It's supposed to do this with @timestamp when you run a query, to limit which indices it actually queries. (This ties in with lifecycle management and getting away from the dated index name convention.)

For some reason, this tends to fail and it queries all indices in the pattern. This is supposed to be fixed in 6.x, but if you're not using the default templates it still has issues.
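One way to test whether indices outside the time range are really being searched is to look at the `_shards` section of a search response, which reports how many shards were skipped. The sketch below is a rough illustration under a few assumptions: the helper names (`search_url`, `shard_summary`) are hypothetical, and whether the `pre_filter_shard_size` search parameter (which I believe defaults to 128 in this version) explains the Discover behaviour is a guess, not something confirmed here.

```python
from urllib.parse import urlencode


def search_url(base_url, index_pattern, pre_filter_shard_size=None):
    """Build a _search URL, optionally lowering pre_filter_shard_size.

    Assumption: a low value makes Elasticsearch run its pre-filter round
    even when the request targets only a small number of shards.
    """
    url = f"{base_url}/{index_pattern}/_search"
    if pre_filter_shard_size is not None:
        url += "?" + urlencode({"pre_filter_shard_size": pre_filter_shard_size})
    return url


def shard_summary(response):
    """Summarize how many shards were skipped versus actually queried."""
    shards = response["_shards"]
    total = shards["total"]
    skipped = shards.get("skipped", 0)
    return {"total": total, "skipped": skipped, "queried": total - skipped}


# Example with a made-up response fragment (not real cluster output):
sample = {"_shards": {"total": 300, "successful": 300, "skipped": 250, "failed": 0}}
print(shard_summary(sample))
```

If `skipped` stays at 0 for a 15-minute range over a 60-day index pattern, that would be consistent with every shard, including those on the warm nodes, being queried.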

I'm having trouble finding a GitHub issue to reference for this. Has anyone else run into this, and does anyone have more detail or potential workarounds?
