I just had a massive issue with my ES prod cluster. I use an ES plugin for the IntelliJ IDE that displays search results as a table, which can be sorted by clicking on a column header: the plugin then adds a sort parameter corresponding to that column.
The issue is that I misclicked on the _id column (right next to the column I wanted to sort on), and sorting on it was not disabled in the plugin.
I do know that sorting on _id is not recommended, but I expected ES to simply return an error.
Instead, the whole cluster went down progressively, node by node, until only one node survived (the search used a wildcard that matches indexes spread across my 6 nodes).
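To make the scenario concrete, here is a minimal sketch of the kind of search request a click on the _id column would produce. I don't have the plugin's exact output, so the index pattern `my-index-*`, the `match_all` query, the page size and the helper name `build_sorted_search` are all placeholders; only the sort on `_id` and the wildcard index pattern reflect what actually happened.

```python
import json


def build_sorted_search(index_pattern, sort_field, order="asc", size=10):
    """Build the path and body of a search sorted on a single field.

    Hypothetical helper for illustration only; it does not send anything.
    """
    path = "/" + index_pattern + "/_search"
    body = {
        "size": size,
        "query": {"match_all": {}},  # assumed query, not the real one
        "sort": [{sort_field: {"order": order}}],
    }
    return path, body


# "my-index-*" is a placeholder for the real wildcard pattern.
path, body = build_sorted_search("my-index-*", "_id")
print("GET", path)
print(json.dumps(body, indent=2))
```

Sorting this way on `_id` across every index matched by the wildcard is what triggered the fielddata loading reported in the exception on my dev node.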
I tried the same on my dev ES (just one node, in Docker): it crashed too, but after restarting and running the same test, I got a CircuitBreakingException, which is fine:
java.util.concurrent.ExecutionException: CircuitBreakingException[[fielddata] Data too large, data for [_id] would be [5313833239/4.9gb], which is larger than the limit of [5085934387/4.7gb]]
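To read the numbers in that message more easily, here is a small sketch that extracts the two byte counts and compares them. It assumes the "gb" in the message means 1024^3 bytes; the function name `parse_breaker_message` is mine, not part of any ES tooling.

```python
import re

MESSAGE = (
    "CircuitBreakingException[[fielddata] Data too large, data for [_id] "
    "would be [5313833239/4.9gb], which is larger than the limit of "
    "[5085934387/4.7gb]]"
)


def parse_breaker_message(message):
    """Return (would_be_bytes, limit_bytes) from a breaker message."""
    # Matches bracketed pairs such as "[5313833239/4.9gb]".
    numbers = re.findall(r"\[(\d+)/[^\]]+\]", message)
    would_be, limit = (int(n) for n in numbers[:2])
    return would_be, limit


would_be, limit = parse_breaker_message(MESSAGE)
overshoot = would_be - limit

GIB = 2 ** 30  # assumption: "gb" in the message is a binary gigabyte
print(f"fielddata for _id would be: {would_be / GIB:.2f} GiB")
print(f"fielddata breaker limit:    {limit / GIB:.2f} GiB")
print(f"over the limit by:          {overshoot} bytes")
```

So on the dev node the request was rejected because loading fielddata for _id would have exceeded the breaker limit by 227898852 bytes.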
My question is: why did my ES prod cluster go down because of that? Shouldn't it have been caught and the same kind of exception returned? Could this be a configuration issue?
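In case it helps compare the two environments, the per-node breaker state can be read from `GET _nodes/stats/breaker`. Below is a minimal sketch that summarizes the fielddata breaker from such a response once it has been parsed as JSON. The sample input is made up for illustration (node id and name are placeholders; only the limit value is taken from my dev exception), and `summarize_fielddata_breakers` is a hypothetical helper.

```python
def summarize_fielddata_breakers(stats):
    """Summarize the fielddata breaker of each node.

    `stats` is the parsed JSON body of GET _nodes/stats/breaker.
    """
    rows = []
    for node in stats.get("nodes", {}).values():
        breaker = node.get("breakers", {}).get("fielddata", {})
        rows.append({
            "node": node.get("name"),
            "limit_bytes": breaker.get("limit_size_in_bytes"),
            "estimated_bytes": breaker.get("estimated_size_in_bytes"),
            "tripped": breaker.get("tripped"),
        })
    return rows


# Made-up sample shaped like the real response; not actual cluster output.
sample = {
    "nodes": {
        "node-id-1": {
            "name": "dev-node",
            "breakers": {
                "fielddata": {
                    "limit_size_in_bytes": 5085934387,
                    "estimated_size_in_bytes": 0,
                    "tripped": 1,
                }
            },
        }
    }
}

for row in summarize_fielddata_breakers(sample):
    print(row)
```

Running the same summary against prod and dev would show whether the fielddata limit differs between them.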
I'm using ES 6.7. Here is the last log I had before one node crashed: [2021-02-16T14:18:07,063][INFO ][o.e.m.j.JvmGcMonitorService] [lwg-es-1] [gc][89 - Pastebin.com
Thanks for any help.