Entire cluster goes down from single query

Not sure how this is possible, because the same query had been working fine. We upgraded all our nodes to 6.1.2 and everything still seemed to work, but after a little while the cluster stopped responding. I couldn't reach it from the browser either, so I SSHed into the server and tried a curl request there, and that worked. I restarted the Elasticsearch services, all the nodes recovered, and the cluster health went back to green.

Okay, so I try loading our site again, which runs this query, and the cluster instantly goes offline again. Not a single response from it.

I restart all the nodes and they come back. I open Postman and try a GET request. That works. Makes sense. So I try a POST request with a simple match_all query. Comes back fine. No issues. I try a bigger, more complicated POST query. Takes a few seconds, but that's pretty fast for what I requested.

Then I try the query our site was running. Postman freezes.

I restart Postman, try it again. Freezes before I can even see the response.

I've never seen this before and I'm not sure how or why it is happening. After dissecting the POST request body, I found that this is the section that seems to cause it:

    "aggs": {
      "last_hour": {
        "date_histogram": {
          "field": "date",
          "interval": "minute",
          "format": "date_hour_minute"
        }
      }
    },

How wide of a time frame are you querying across?

:sweat_smile: 6 months?

Now that I think about it, I think I had a filter in the query section before, and I removed it thinking I didn't need it...

That's roughly 262,800 buckets (about 182.5 days × 1,440 minutes per day). That's a lot.
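
For anyone checking the arithmetic, here is a rough sketch of the estimate. It assumes "6 months" means half of a 365-day year (182.5 days), and `estimate_buckets` is just a helper name made up for this post, not anything from Elasticsearch. The real bucket count can be off by one or so depending on where the first and last documents fall.

```python
from datetime import timedelta


def estimate_buckets(span: timedelta, interval: timedelta) -> int:
    """Rough upper bound on the number of histogram buckets covering `span`."""
    # Ceiling division: a partially covered interval still needs its own bucket.
    return -(-span // interval)


# Assumption: "6 months" is half of a 365-day year.
six_months = timedelta(days=182.5)

print(estimate_buckets(six_months, timedelta(minutes=1)))        # per-minute buckets over 6 months
print(estimate_buckets(timedelta(hours=1), timedelta(minutes=1)))  # per-minute buckets over 1 hour
```

With the query limited to one hour, the same interval gives only 60 buckets.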

Elasticsearch has to compute every one of those buckets for you, so I'd imagine the load on the nodes (memory in particular) gets very high, which is why the cluster looks like it goes down.
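
Putting a time filter back in the query section keeps the histogram small. The sketch below builds such a request body in Python. The function name is hypothetical, the one-hour window is only a guess based on the aggregation being called `last_hour`, and `"size": 0` is an assumption that you only need the aggregation and not the hits. Adjust all of those to whatever the original query had.

```python
import json


def last_hour_histogram_body(field: str = "date") -> dict:
    """Build a search body whose per-minute histogram covers only the last hour."""
    return {
        # Assumption: only the aggregation is needed, so skip returning hits.
        "size": 0,
        "query": {
            "bool": {
                "filter": [
                    # Date math: restrict documents to the last hour,
                    # which caps the histogram at roughly 60 buckets.
                    {"range": {field: {"gte": "now-1h", "lte": "now"}}}
                ]
            }
        },
        "aggs": {
            "last_hour": {
                "date_histogram": {
                    "field": field,
                    "interval": "minute",
                    "format": "date_hour_minute",
                }
            }
        },
    }


# Print the JSON so it can be pasted into Postman or sent with curl.
print(json.dumps(last_hour_histogram_body(), indent=2))
```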

Makes sense to me. Thanks!

:joy: :sweat_smile: :sweat: :cry:
