Elasticsearch Client Nodes OOM Killed by Gargantuan Query

IanGabes · May 30, 2019, 5:16pm

Hi all!

I think we may have discovered an edge case or bug in Elasticsearch, and I'd like to confirm whether this is a known issue or document a new one. One of my analysts constructed a query with ~3 million terms in a terms query inside the must clause of a bool query. Almost immediately after this gargantuan query is submitted, my client nodes get OOM killed by their host system.
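For readers unfamiliar with the shape of such a request, here is a minimal Python sketch of a bool query with a terms query in its must clause, plus one possible way to split a huge value list into smaller batches. It is not the query from this thread: the field name `user_id`, the helper names, and the batch size are all assumptions for illustration. The 65,536 figure mirrors what I believe is the default of the `index.max_terms_count` setting in newer Elasticsearch versions; treat that as an assumption too.

```python
from typing import Any, Dict, Iterator, List, Sequence

# Assumed batch ceiling (hypothetical choice). It mirrors what I believe is
# the default of index.max_terms_count in newer Elasticsearch versions.
MAX_TERMS_PER_QUERY = 65_536


def chunked(values: Sequence[Any], size: int) -> Iterator[Sequence[Any]]:
    """Yield consecutive slices of `values`, each at most `size` items long."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(values), size):
        yield values[start:start + size]


def build_terms_query(field: str, values: Sequence[Any]) -> Dict[str, Any]:
    """Build a search body: a terms query inside the must clause of a bool query."""
    return {"query": {"bool": {"must": [{"terms": {field: list(values)}}]}}}


def build_batched_queries(
    field: str, values: Sequence[Any], size: int = MAX_TERMS_PER_QUERY
) -> List[Dict[str, Any]]:
    """Split one huge terms list into several smaller search bodies."""
    return [build_terms_query(field, batch) for batch in chunked(values, size)]


if __name__ == "__main__":
    # `user_id` is a hypothetical field name used only for this example.
    bodies = build_batched_queries("user_id", list(range(10)), size=4)
    print(len(bodies))
```

Each resulting body would be sent as its own search request and the results merged on the caller's side. Whether that is acceptable depends on what the analyst needs from the data.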

I am fully aware that this is a... interesting method of attempting to retrieve data. I have worked with this person to get a working query, but the interesting part to me is that the only sign of a problem (aside from the ES service on all my client nodes being dead) is the OOM killer message in the syslog. I would have hoped for a message like "jesus dude what is this query, it killed me" in my node logs, or something similarly descriptive. In fact, I have no logs at all related to this. I was only able to piece the "root cause" together from the OOM killer timestamps and requests to an alternate data store.

I am running Elasticsearch 6.5.1, and the user was directly connecting to ES. Maybe this has been fixed in a newer version?

Thanks for your time!

DavidTurner · May 30, 2019, 8:19pm

There is no way the node could log anything if it is shut down by the OOM killer. By the time you get into this state it's too late: the OS takes over and kills the process without giving it any opportunity to clean up.

However, if the OOM killer got to you before the JVM reported an OutOfMemoryError, then I suspect your heap size is set too high. It absolutely must be set to less than 50% of your total memory, since the JVM can use around double the configured maximum heap size. Even 50% of your total memory leaves no space for the OS and other processes on the same system.
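To make the arithmetic behind that rule of thumb concrete, here is a small Python sketch. The factor of 2 is just the "around double" estimate from this reply, not a measured value, and the function names and the example host sizes are hypothetical.

```python
def estimated_jvm_footprint_gb(heap_gb: float, overhead_factor: float = 2.0) -> float:
    """Rough upper estimate of JVM memory use, assuming roughly 2x the max heap."""
    return heap_gb * overhead_factor


def headroom_gb(total_ram_gb: float, heap_gb: float, overhead_factor: float = 2.0) -> float:
    """Memory left for the OS and other processes under that assumption."""
    return total_ram_gb - estimated_jvm_footprint_gb(heap_gb, overhead_factor)


if __name__ == "__main__":
    # Illustrative 16 GB host: a heap at exactly 50% leaves nothing spare.
    print(headroom_gb(16, 8))  # 0.0
    # A smaller heap leaves some room for the OS under the same assumption.
    print(headroom_gb(16, 6))  # 4.0
```

A headroom of zero or less under this estimate is the situation where the kernel OOM killer can get to the process before the JVM ever reports an OutOfMemoryError.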

Also, there have been changes in this area in Elasticsearch 7 to make it much better at pushing back on unreasonable searches.

IanGabes · May 30, 2019, 8:55pm

Thanks for the information here. Upgrading to ES7 is definitely on our horizon, and I will dig into the changelogs. We had 8GB of 16GB configured for our heap, so maybe our problem revolves around that configuration.

Thanks for taking the time to reply.

system · June 27, 2019, 8:56pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
