Hi all!
I think we may have discovered a bit of an edge case or bug with elasticsearch and I'm just looking to confirm this known issue, or potentially document a new bug. One of my analysts has constructed a query consisting of ~3 million terms in a terms query within the must clause of a bool query. After submitting this gargantuan query, my client nodes almost immediately get OOM killed by their host system.
I am fully aware that this is a... interesting method of attempting to retrieve data. I have worked with this person to get a working query, but the interesting part to me is the fact that the only sign of a problem (aside from all my ES client node service being dead) is the OOM killer message in the syslog. I would hope that a message like "jesus dude what is this query, it killed me" would appear in my node logs, or some other sort of representative message. In fact, I have no logs at all related to this. I was only able to piece the "root cause" together with the OOM killer timestamps, and requests to an alternate data store.
I am running Elasticsearch 6.5.1, and the user was directly connecting to ES. Maybe this has been fixed in a newer version?
Thanks for your time!