Hi. We have a 3-node Elasticsearch 1.7.5 cluster that has been sitting at 85-89% JVM heap usage. Each node has 16GB of RAM allocated to Elasticsearch. Because of the high heap usage we're planning to expand the cluster, but we're not sure whether to add more data nodes or dedicated client node(s). We do run some fairly heavy-weight aggregations. Is there a way to determine whether we'd get a bigger performance improvement from simply growing the cluster versus adding client nodes?
Client nodes only handle request coordination: the reduce phase of a search and the final aggregation calculations.
Chances are that adding client nodes to such a small cluster won't provide as much value as adding more data nodes.
Thanks for the help Mark. After looking more closely at our heap usage in Marvel, it turns out the JVM heap spikes we were seeing all correspond to our field data size growing significantly under large queries. We're going to add more data nodes.
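For anyone else hitting this: fielddata memory can be inspected directly, which is how we confirmed the correlation. The cat API is available in 1.x; the host is just an example:

```
# Fielddata memory per node, broken down by field
curl -s 'localhost:9200/_cat/fielddata?v'

# Node stats, including the indices.fielddata section
# (look at memory_size and evictions per node)
curl -s 'localhost:9200/_nodes/stats/indices?pretty'
```

If evictions are non-zero, fielddata is churning under the configured limit, which usually shows up as exactly the kind of heap spikes we saw.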
Look into doc values too.
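To expand on that for anyone reading later: in 1.x doc values are opt-in per field and only work on `not_analyzed` string fields and numeric/date fields. A sketch of what the mapping change might look like (index, type, and field names here are made up):

```
curl -XPUT 'localhost:9200/my_index' -d '{
  "mappings": {
    "my_type": {
      "properties": {
        "status": {
          "type": "string",
          "index": "not_analyzed",
          "doc_values": true
        },
        "response_time": {
          "type": "long",
          "doc_values": true
        }
      }
    }
  }
}'
```

Note that `doc_values` can't be toggled on an existing field, so this means reindexing, but it moves that memory from the JVM heap to the filesystem cache, which directly addresses fielddata-driven heap pressure.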