Nodes go out of memory in a domino effect

Hi fellas,

I have an Elasticsearch cluster (6.4.2) with hundreds of monthly indices, each varying in size between 500 GB and 1 TB and consisting of 5 primary and 5 replica shards (the default).

The question is: when someone queries data spanning more than 2 years (24+ monthly indices) through the web application connected to this cluster, what I observe in the Elasticsearch slowlogs is that the same query is sent to every index within the specified time range. The cluster starts to respond more slowly, garbage collection logs appear, and eventually the nodes go OOM one by one.

Is there a solution to overcome this problem? Maybe a configuration for concurrency or memory management?

Thanks in advance.

Maybe your shards are too big. 200 GB per shard seems excessive to me.
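The arithmetic behind that estimate, sketched quickly using the numbers from the post above (worst-case 1 TB index, 5 primary shards, 24 monthly indices):

```python
# Back-of-the-envelope shard math from the numbers in the original post.
index_size_gb = 1000      # worst case: a ~1 TB monthly index
primaries = 5             # default primary shard count in 6.x
shard_size_gb = index_size_gb / primaries
print(shard_size_gb)      # 200.0 GB per primary shard

# A 2-year query fans out to every shard of every matched index:
indices_hit = 24
total_primary_shards = indices_hit * primaries
print(total_primary_shards)  # 120 shard-level search requests per query
```

That fan-out is why a single long-range query can pressure the heap on every node at once.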

What are the node specifications?

What is the output of:

GET /
GET /_cat/nodes?v
GET /_cat/health?v
GET /_cat/indices?v

If some outputs are too big, please share them on gist.github.com and link them here.
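On the concurrency question: 6.x exposes a `max_concurrent_shard_requests` parameter on `_search` that caps how many shards a single search hits concurrently, and `pre_filter_shard_size` can let Elasticsearch skip shards whose ranges cannot match. A minimal sketch of such a request, assuming a hypothetical cluster at localhost:9200 and hypothetical index names (this only builds the request; it does not send it):

```python
# Sketch: throttling shard-level fan-out for a long date-range search.
# Host, index pattern, and field names below are hypothetical placeholders.
import json
from urllib.parse import urlencode

HOST = "http://localhost:9200"        # hypothetical cluster address
indices = "logs-2017.*,logs-2018.*"   # e.g. 24 monthly indices

params = urlencode({
    "max_concurrent_shard_requests": 3,  # cap concurrent shard requests
    "pre_filter_shard_size": 1,          # pre-filter shards that cannot match
})
url = f"{HOST}/{indices}/_search?{params}"

body = json.dumps({
    "query": {
        "range": {"@timestamp": {"gte": "2017-01-01", "lt": "2019-01-01"}}
    }
})
print(url)
```

This only slows the fan-out rather than fixing it; re-sharding to a saner shard size is still the real remedy.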

May I suggest you look at the following resources about sizing:

https://www.elastic.co/elasticon/conf/2016/sf/quantitative-cluster-sizing

And https://www.elastic.co/webinars/using-rally-to-get-your-elasticsearch-cluster-size-right

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.