[SOLVED] Whole cluster down after sorting on _id

voharunado · February 16, 2021, 2:07pm

Hi,

I just had a massive issue with my ES prod cluster. I use an ES plugin for the IntelliJ IDE that displays search results as a table, which can be ordered by clicking on a column: it then adds a sort parameter corresponding to that column.
The issue is that I misclicked on the _id column (right next to the column I wanted to sort on), and sorting on it was not disabled in the plugin.
I do know that sorting on _id is not recommended, but I expected ES to just return an error.
Instead, the whole cluster went down progressively, node by node (only one survived). The search used a wildcard that matches indices spread across my 6 nodes.

I tried the same on my dev ES (just one node, in Docker): it crashed too, but after restarting and repeating the test, I got a CircuitBreakingException, which is fine:
java.util.concurrent.ExecutionException: CircuitBreakingException[[fielddata] Data too large, data for [_id] would be [5313833239/4.9gb], which is larger than the limit of [5085934387/4.7gb]]
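For readers who want to see how far over the limit that request was, here is a minimal sketch that pulls the byte counts out of the message quoted above. The regular expression and the function name parse_breaker_message are illustrative assumptions based only on this one message, not an official Elasticsearch format.

```python
import re

# Message text copied from the CircuitBreakingException quoted in the post.
MESSAGE = (
    "[fielddata] Data too large, data for [_id] would be "
    "[5313833239/4.9gb], which is larger than the limit of "
    "[5085934387/4.7gb]"
)


def parse_breaker_message(message):
    """Return (field, requested_bytes, limit_bytes) from a breaker message.

    Hypothetical helper: the pattern is inferred from the single message above.
    """
    pattern = (
        r"data for \[(?P<field>[^\]]+)\] would be \[(?P<wanted>\d+)/[^\]]+\]"
        r".*limit of \[(?P<limit>\d+)/[^\]]+\]"
    )
    match = re.search(pattern, message)
    if match is None:
        raise ValueError("not a recognised circuit breaker message")
    return match["field"], int(match["wanted"]), int(match["limit"])


field, wanted, limit = parse_breaker_message(MESSAGE)
overshoot = wanted - limit  # bytes requested beyond the breaker limit
print(f"{field}: needs {wanted} bytes, limit is {limit} bytes, over by {overshoot} bytes")
```

On the dev node, loading fielddata for _id would have needed about 217 MiB more than the fielddata breaker allowed, so the request was rejected instead of being served.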

The question is: why did my ES prod cluster go down because of that? Shouldn't it have been caught and the same kind of exception returned? Could a config issue be behind this?

I'm using ES 6.7. Here is the last log I had before one node crashed: [2021-02-16T14:18:07,063][INFO ][o.e.m.j.JvmGcMonitorService] [lwg-es-1] [gc][89 - Pastebin.com
Thanks for any help.

DavidTurner · February 16, 2021, 6:34pm

If you upgrade to 7.6+ you can disable loading fielddata on the _id field by setting indices.id_field_data.enabled to false.

With that change, it would indeed return you an error instead of taking the whole cluster down.
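As a rough illustration of that change, the sketch below builds (but does not send) a cluster settings update that turns off fielddata loading on _id. It assumes the setting meant here is the dynamic cluster setting indices.id_field_data.enabled, available from 7.6 onwards; the http://localhost:9200 address and the function name are placeholders, and a secured cluster would also need authentication.

```python
import json
import urllib.request


def build_disable_id_fielddata_request(base_url="http://localhost:9200"):
    """Build a PUT _cluster/settings request (hypothetical helper).

    Assumption: the cluster is 7.6+ and reachable at base_url.
    """
    body = {"persistent": {"indices.id_field_data.enabled": False}}
    return urllib.request.Request(
        url=f"{base_url}/_cluster/settings",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


request = build_disable_id_fielddata_request()
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))

# To apply it against a real cluster, send the request, for example:
# with urllib.request.urlopen(request) as response:
#     print(response.read().decode("utf-8"))
```

Once the setting is false, a sort or aggregation on _id should be rejected with an error rather than loading fielddata for every document ID onto the heap.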

voharunado · February 17, 2021, 10:07am

Hi,

Thanks for your answer, I'll do that.
But I thought that such heavy requests would get killed if the cluster or a node was about to go down...
Preventing the issue on the _id field is a fine workaround for my specific case, but knowing that the whole cluster can go down with a request as "basic" as this one is kind of scary.
Isn't there anything we can do in the config to prevent such a massive failure from happening?

Thanks.

DavidTurner · February 17, 2021, 10:22am

Watertight protection against this sort of thing is basically impossible, but note that 6.7 was released almost 2 years ago and is already well past EOL. In the meantime we've added many more layers of protection against such harmful requests, such as the option I linked, so you will have a much better experience if you upgrade.

voharunado · February 17, 2021, 10:28am

Ok, I will do that soon then, thanks for your help!

system · March 17, 2021, 10:29am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
