How to protect an ES cluster from searches that would kill it?

Unfortunately, this really happened to me in prod yesterday. It was ugly.

I have a three-tiered cluster with master, client, and data nodes all separated.

My 4 clients have these settings (among others) for a 30GB heap:

 "indices.fielddata.cache.size": "60%",
 "indices.breaker.total.limit": "75%",
 "indices.breaker.request.limit": "50%",
 "indices.breaker.fielddata.limit": "65%",
 "threadpool.bulk.queue_size": "500",
 "threadpool.bulk.size": "32",
 "threadpool.index.queue_size": "500",
 "threadpool.index.size": "32",
 "threadpool.search.queue_size": "2000",

My 6 data nodes have these settings:

 "indices.fielddata.cache.size": "30%",
 "indices.breaker.total.limit": "70%",
 "indices.breaker.request.limit": "30%",
 "indices.breaker.fielddata.limit": "35%",
 "indices.memory.index_buffer_size": "60%",

I saw TONS (1440 in about 1.5 hrs) of these:

org.elasticsearch.ElasticsearchException: org.elasticsearch.common.breaker.CircuitBreakingException: [FIELDDATA] Data too large, data for [event_detail] would be larger than limit of [11274289152/10.5gb]

And these (31K in about 1.5 hours):

Caused by: org.elasticsearch.common.util.concurrent.EsRejectedExecutionException: rejected execution (queue capacity 2000) on org.elasticsearch.transport.netty.MessageChannelHandler$RequestHandler@3c34ae72

which had ugly downstream effects that I won't go into. Obviously some instruction on proper search techniques is in order, but what else can be done from a cluster perspective to help keep searches from killing the cluster?

My thoughts:

  • I think my search queue of 2000 is WAY too high. Maybe 20 instead.
  • I think my indices.breaker.fielddata.limit is WAY too high. What field needs to return 10GB of data? That should be much lower. Maybe 5%? Which is still 1.5GB for a single field.
  • Same thing for indices.breaker.request.limit. 50% is 15GB, 30% is 9GB. That sounds outrageously high. Again, maybe 5%?
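To sanity-check the percentages above, here is a minimal sketch that converts a breaker percentage into bytes. It assumes the heap is exactly 30 GiB (the 10.5gb limit in the error message matches 35% of that), and `breaker_limit_bytes` is a hypothetical helper name, not anything from Elasticsearch.

```python
GIB = 1024 ** 3  # bytes per GiB


def breaker_limit_bytes(heap_bytes, percent):
    """Return the byte limit for a breaker set to `percent` of the heap.

    Integer arithmetic is used so the result is exact.
    """
    return heap_bytes * percent // 100


heap = 30 * GIB  # assumption: the 30GB heap is exactly 30 GiB

for label, pct in [
    ("request limit on clients", 50),
    ("fielddata limit on data nodes", 35),
    ("request limit on data nodes", 30),
    ("proposed limit", 5),
]:
    limit = breaker_limit_bytes(heap, pct)
    print(f"{label}: {pct}% -> {limit} bytes ({limit / GIB:.1f} GiB)")
```

So 5% of the heap is still roughly 1.5 GiB per breaker, which is the trade-off being weighed in the list above.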

Thanks for the insight.
Chris

Yes to all your points. Increasing these is usually only a band-aid fix, and you end up pushing the problem somewhere else.

Also look into doc values.

Thanks Mark. I do have doc_values set up for all my mappings also :smile:

So, here's another question. All those settings are updateable via the cluster API, which means they are cluster-wide settings. Right now I have two sets of configs, one for clients and one for data nodes. Does a node keep the "local" settings from its elasticsearch.yml file if they differ from what the other node types have? I'm getting hung up on the scope of settings that are called "cluster-wide" but can also be specified in a node-local yml file.

I'd like to have different settings on my clients than on my data nodes, if that is possible.

Chris

I think it goes node > cluster, but you should be able to tell using the _nodes API.
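One way to check per node is to pull the settings from the _nodes API and compare a single key across nodes. This is only a rough sketch: the base URL, the helper names, and the sample values are all made up for illustration, and it assumes the response is shaped as `nodes -> <node id> -> name / settings` with `flat_settings=true` giving dotted keys.

```python
import json
from urllib.request import urlopen


def fetch_node_settings(base_url):
    """GET <base_url>/_nodes/settings?flat_settings=true and parse the JSON body."""
    with urlopen(base_url + "/_nodes/settings?flat_settings=true") as resp:
        return json.load(resp)


def setting_by_node(nodes_response, key):
    """Map node name -> value of `key` (None when the node does not report it)."""
    result = {}
    for node_id, info in nodes_response.get("nodes", {}).items():
        name = info.get("name", node_id)
        result[name] = info.get("settings", {}).get(key)
    return result


# Hypothetical response fragment, only to show the assumed shape.
sample = {
    "nodes": {
        "id-1": {
            "name": "client-01",
            "settings": {"indices.breaker.fielddata.limit": "65%"},
        },
        "id-2": {
            "name": "data-01",
            "settings": {"indices.breaker.fielddata.limit": "35%"},
        },
        "id-3": {"name": "master-01", "settings": {}},
    }
}

print(setting_by_node(sample, "indices.breaker.fielddata.limit"))
```

Against a live cluster it would be something like `setting_by_node(fetch_node_settings("http://localhost:9200"), "indices.breaker.fielddata.limit")`, with the host being a placeholder. It is worth comparing the result with the breaker limits each node logs at startup, since those show what is actually in effect.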

Welp, here's what I did:

Updated the settings for the whole cluster through the cluster settings API:

PUT /_cluster/settings?master_timeout=3000000
{
    "persistent" : {
        "threadpool.search.queue_size" : 20,
        "indices.breaker.request.limit": "30%",
        "indices.breaker.fielddata.limit": "35%"
    }
}

Saw it take effect in the logs:

[2015-07-14 01:45:10,844][INFO ][indices.breaker          ] [elasticsearch-bdprodes10] Updating settings parent: [PARENT,type=PARENT,limit=22548578304/21gb,overhead=1.0], fielddata: [FIELDDATA,type=MEMORY,limit=11274289152/10.5gb,overhead=1.03], request: [REQUEST,type=MEMORY,limit=9663676416/9gb,overhead=1.0]

Then updated the elasticsearch.yml file on just the client nodes as such:

indices:
  breaker:
    fielddata:
      limit: 5%
    request:
      limit: 5%
    total:
      limit: 75%
  fielddata:
    cache:
      size: 60%

Then I cycled only the client nodes, expecting them to pick up the new settings, but on startup they took the same values as the data nodes/cluster settings:

[2015-07-13 21:47:29,875][INFO ][indices.breaker          ] [elasticsearch-bdprodes01] Updating settings parent: [PARENT,type=PARENT,limit=24159191040/22.5gb,overhead=1.0], fielddata: [FIELDDATA,type=MEMORY,limit=11274289152/10.5gb,overhead=1.03], request: [REQUEST,type=MEMORY,limit=9663676416/9gb,overhead=1.0]

The _nodes API confirmed the same. I thought the yml change would do it, but it doesn't look like it did. Perhaps there is a different way to accomplish this?

Chris

Still trying to get this to work right. No luck yet getting nodes to keep configs that are independent of the cluster-defined ones.

Is it possibly documented somewhere? I'm not finding it, but I could be missing it.

Many thanks!
Chris