Dev Tools - A GET request caused Kibana to completely stop responding

Hello,

Last month I tried to run the attached query against our company's instance of Kibana, using the Dev Tools feature. (The field names and values have been replaced by placeholders in the attached query, but it is structurally identical to the query I ran.)

The query is intended to return the following documents:

  • The document must have a timestamp between 2017/09/19 15:10:00.726Z and 2017/09/19 15:11:00.726Z
  • The document must contain FieldOne (Integer field) with a value of 1200
  • The document must contain FieldTwo (String field) with an exact value of "AB1-CDE-FGHI-02"
  • The document must contain FieldThree (String field) and this field must contain the string "<ABCDEFGHIJK>"
  • In addition, FieldThree (String field) must not contain the string "<ABCDEFGHIJK>0"
  • I used a search size of 10,000 since this is the maximum allowed, but expected far fewer documents to be returned by the query (under 1,000)
  • For each matching document the query should return FieldThree and FieldFour (String field) - ("_source": ["FieldThree", "FieldFour"])

After I ran the query Kibana did not respond for about a minute, then displayed a message in the right Dev Tools window indicating that the request had failed to reach the server.

We were then unable to use Kibana company-wide for about 10-15 minutes. Going to the Discover tab of Kibana resulted in a red error bar appearing at the top of the screen, and our Dashboards, which contain many Timelion graphs, likewise showed a red error bar across the top of the screen and stopped updating. After this 10-15 minute period the issue resolved itself and we were able to continue using Kibana as normal. I ran the query on two separate days, since the first time we thought the outage might have coincided with a separate error. However, both times the query caused this behaviour in Kibana. Unfortunately I did not note down the exact error messages Kibana reported, so I cannot provide them.

We are using Kibana version 5.1.1. I also have a local install of Kibana 5.1.1 with much less data in it. I ran a similarly structured query (including the double * wildcard must_not filter) on this local install before running it against the company server, but on my local install the correct data was returned and there were no issues.

I have the following questions:

  • What is it in this query that caused Elasticsearch/Kibana to become unresponsive? I understand that double wildcard queries are very expensive, but as it is the final filter in the query it should only be applied to 1000 or so documents, which I thought we would be able to process.

  • Relating to the above question, do Elasticsearch filters logically short-circuit in the order they are written? (e.g. since the clause "FieldThree.keyword": "*<ABCDEFGHIJK>0*" is the final clause in my bool, is it only applied to documents which have not been filtered out by the preceding filters, or are all filters applied to all documents?) I think the answer to this is probably yes, but if it is not then this could explain why the query was so expensive.

  • Prior to running this query, I thought that the worst impact a GET request could have would be that it would take a long time to return (or time out entirely), and that while it was running Elasticsearch/Kibana would be slower for other users. I did not think it could make Kibana completely stop responding. Is there a way to add a safeguard to a GET request so that it times out if it takes too long to respond, rather than using Kibana's resources until it stops responding (e.g. similar to how a query in a Timelion graph that takes more than 30 seconds to complete times out and the graph is not displayed)?

  • As mentioned above, I have a local install of Kibana that I tested the query on beforehand, but it did not catch the error in this case. Are there any suggested methods on how to better test queries before running them against the company server?

Thanks in advance, any help is appreciated.

The query was:

GET logstash-*/_search?size=10000
{
	"_source": ["FieldThree", "FieldFour"],
	"query": {
		"bool": {
			"filter": {
				"range": {
					"UTC": {
						"gte": "20170919T151000.726Z",
						"lt":  "20170919T151100.726Z",
						"format": "basic_date_time"
					}
				}
			},
			"filter": {
				"term": {
					"FieldOne": 1200
				}
			},
			"filter": {
				"term": {
					"FieldTwo.keyword": "AB1-CDE-FGHI-02"
				}
			},
			"filter": {
				"match": {
					"FieldThree": "<ABCDEFGHIJK>"
				}
			},
			"must_not": {
				"wildcard": {
					"FieldThree.keyword": "*<ABCDEFGHIJK>0*"
				}
			}           
		}
	}
}
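
One structural detail in the query above: the bool object repeats the "filter" key four times. JSON objects are expected to have unique keys, and what a parser does with repeated keys varies, so it is not certain from the request alone that all four clauses were applied as intended. This is an observation about the form of the request, not a confirmed cause of the outage. A sketch of the same clauses written as a single "filter" array, which bool accepts, sent as the body of the same GET logstash-*/_search?size=10000 request:

```json
{
  "_source": ["FieldThree", "FieldFour"],
  "query": {
    "bool": {
      "filter": [
        {
          "range": {
            "UTC": {
              "gte": "20170919T151000.726Z",
              "lt": "20170919T151100.726Z",
              "format": "basic_date_time"
            }
          }
        },
        { "term": { "FieldOne": 1200 } },
        { "term": { "FieldTwo.keyword": "AB1-CDE-FGHI-02" } },
        { "match": { "FieldThree": "<ABCDEFGHIJK>" } }
      ],
      "must_not": [
        { "wildcard": { "FieldThree.keyword": "*<ABCDEFGHIJK>0*" } }
      ]
    }
  }
}
```

The clauses and values are unchanged from the original; only the grouping differs.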

The Elasticsearch forum might be able to provide more insights, as I believe Kibana was unresponsive because of the in-flight query.

My assumption is that, for whatever reason, the query that was run caused instability in the cluster. There are short-circuits available to help prevent some of these instances. However, I wouldn't expect a single query to cause this issue unless the cluster was already under-provisioned.
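
Two request-level limits of this kind are the "timeout" and "terminate_after" options in the search body. Whether these are the short-circuits meant above is an assumption, and the values below are purely illustrative. "timeout" is applied on a best-effort basis, so it may not interrupt every phase of an expensive query; "terminate_after" stops collecting on each shard once that many documents have been gathered. A minimal sketch, sent as the body of GET logstash-*/_search?size=1000 with a cut-down version of the original bool:

```json
{
  "timeout": "10s",
  "terminate_after": 1000,
  "_source": ["FieldThree", "FieldFour"],
  "query": {
    "bool": {
      "filter": [
        {
          "range": {
            "UTC": {
              "gte": "20170919T151000.726Z",
              "lt": "20170919T151100.726Z",
              "format": "basic_date_time"
            }
          }
        },
        { "term": { "FieldTwo.keyword": "AB1-CDE-FGHI-02" } }
      ]
    }
  }
}
```

If the time limit is hit, the response reports "timed_out": true and contains only the hits gathered up to that point, so the results should be treated as partial.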

Something else which would be helpful is the Profile API. This would give you a better understanding of the cost of the query.
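
A minimal sketch of enabling the Profile API: add "profile": true to the search body. To keep the test itself cheap, this example assumes a single daily index and a small result size, e.g. GET logstash-2017.09.19/_search?size=10. The index name is a guess based on the usual Logstash daily naming pattern; the actual index names are not given in this thread.

```json
{
  "profile": true,
  "_source": ["FieldThree", "FieldFour"],
  "query": {
    "bool": {
      "filter": [
        {
          "range": {
            "UTC": {
              "gte": "20170919T151000.726Z",
              "lt": "20170919T151100.726Z",
              "format": "basic_date_time"
            }
          }
        },
        { "term": { "FieldOne": 1200 } },
        { "term": { "FieldTwo.keyword": "AB1-CDE-FGHI-02" } },
        { "match": { "FieldThree": "<ABCDEFGHIJK>" } }
      ],
      "must_not": [
        { "wildcard": { "FieldThree.keyword": "*<ABCDEFGHIJK>0*" } }
      ]
    }
  }
}
```

The response then carries a "profile" section with a per-shard timing breakdown for each query component, which should show how much of the cost comes from the wildcard clause compared with the other filters.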

Hi Tyler, thanks for your help and the links. I'll have a look into the profile API and see if I can understand why the query had such a large impact.
