OOM for ES: fielddata.cache.size and breaker.fielddata.limit don't work

Hello!
We have a cluster with 3 data, 3 master and 2 client nodes. We're using it as a reporting cluster, running big queries with aggregations.
ES uses 50% of VM RAM.
We're getting OOM errors from time to time (they seem to coincide with the reporting queries). To prevent this we set:
indices.fielddata.cache.size: 40%
indices.breaker.fielddata.limit: 45%
for all nodes in the cluster. But it doesn't seem to work: we still get OOM errors and don't see anything related to CircuitBreakingException in the logs. Please advise how to prevent OOM in the cluster (can the cluster refuse to execute killer queries?). (Adding more memory: we know about this option :) )
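
Since no CircuitBreakingException shows up in the logs, one way to see how close each breaker gets to its limit is to look at the node breaker statistics (`GET /_nodes/stats/breaker`). The following is only a minimal sketch in Python that summarizes such a response. The field names (`limit_size_in_bytes`, `estimated_size_in_bytes`, `tripped`) are assumed to match the response layout of the version in use, and the `SAMPLE` data, node name, and 80% threshold are made up for illustration.

```python
def summarize_breakers(stats):
    """Flatten a parsed _nodes/stats/breaker response into one row per (node, breaker)."""
    rows = []
    for node_id, node in stats.get("nodes", {}).items():
        for breaker_name, b in node.get("breakers", {}).items():
            limit = b.get("limit_size_in_bytes", 0)
            used = b.get("estimated_size_in_bytes", 0)
            # Percentage of the breaker limit currently accounted for (None if no limit).
            pct = round(100.0 * used / limit, 1) if limit > 0 else None
            rows.append({
                "node": node.get("name", node_id),
                "breaker": breaker_name,
                "used_pct": pct,
                "tripped": b.get("tripped", 0),
            })
    return rows


def hot_breakers(rows, threshold_pct=80.0):
    """Return rows that have tripped at least once or sit at/above the threshold."""
    return [
        r for r in rows
        if r["tripped"] > 0
        or (r["used_pct"] is not None and r["used_pct"] >= threshold_pct)
    ]


# Hypothetical response fragment, for illustration only (not real cluster output).
SAMPLE = {
    "nodes": {
        "abc123": {
            "name": "data-1",
            "breakers": {
                "fielddata": {
                    "limit_size_in_bytes": 1000,
                    "estimated_size_in_bytes": 300,
                    "tripped": 0,
                },
                "request": {
                    "limit_size_in_bytes": 2000,
                    "estimated_size_in_bytes": 1900,
                    "tripped": 3,
                },
            },
        }
    }
}

if __name__ == "__main__":
    for row in hot_breakers(summarize_breakers(SAMPLE)):
        print(row)
```

In practice the dictionary would come from parsing the JSON returned by the stats endpoint on one of the nodes. If the fielddata breaker stays far below its limit while the heap still fills up, that would suggest the memory is being used by something the fielddata settings do not cover.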

Which version of Elasticsearch are you using?

  "version" : {
    "number" : "2.4.4",
    "build_hash" : "fcbb46dfd45562a9cf00c604b30849a6dec6b017",
    "build_timestamp" : "2017-01-03T11:33:16Z",
    "build_snapshot" : false,
    "lucene_version" : "5.5.2"
  }

That is a very old version, and I do not remember what the limitations were back then. I recall there have been a number of improvements to circuit breakers across more recent versions (e.g. this one) so I would recommend upgrading.

Thank you for your reply, Christian.
We have plans for upgrading our ES production and reporting clusters, but this will not happen soon ...

Then I suspect adding more memory and/or nodes will be the best way to go. Note that coordinating-only nodes can have their heap set higher than 50% of total RAM, as they do not rely on the file system cache the same way data nodes do.
