- Ubuntu 14.04
- ES 1.4.2 from elastic repos
- Java 1.7.0_95
- 6 c3.8xlarge data nodes ( http://www.ec2instances.info/?filter=c3.8xlarge ) with 30G of heap configured
- 3 dedicated masters (originally http://www.ec2instances.info/?filter=r3.large ) with 7G of heap configured, all client requests are routed through these 3 servers
- Single index with 60 shards, 1 primary and 2 replicas, shards ranging from 10G to 40G
- 260 million documents
- not using doc values (looking into it; a sketch of enabling them follows this list)
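Since fielddata on the heap is a prime suspect for this kind of pressure, doc values seem worth a look: they keep fielddata on disk instead of on the JVM heap. A minimal sketch of enabling them in a 1.x mapping, assuming hypothetical index, type, and field names (strings must be not_analyzed, and changing an already-mapped field requires a reindex):

```python
import requests

# Hypothetical field: in ES 1.x doc values are enabled per field in the
# mapping; strings must be not_analyzed, numerics and dates work as-is.
mapping = {
    "properties": {
        "status": {
            "type": "string",
            "index": "not_analyzed",
            "doc_values": True,  # fielddata kept on disk instead of the heap
        }
    }
}

# New fields can be added to a live type; existing fields need a reindex.
resp = requests.put("http://localhost:9200/my_index/_mapping/my_type",
                    json=mapping)
print(resp.json())
```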
The problem presented itself after we replaced the r3.large masters with m3.large instances ( http://www.ec2instances.info/?filter=m3.large ) with 4G of heap configured. The 3 masters exhibited the following behavior:
In essence: over 75% heap usage and an increased amount of GC and CPU (probably due to the GC)
At the same time, every data node exhibited the following behavior:
i.e. the same behavior. While the smaller masters were in charge of the cluster, we saw them directly affecting the rest of the cluster. We replaced the masters with m3.xlarge instances ( http://www.ec2instances.info/?filter=m3.xlarge ) because we weren't sure at the time whether the problem was CPU or heap. Now it looks to me as if heap pressure was the problem: in the last image you can see the data nodes' heap usage is back to around 75% and old GC is back to almost nothing.
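For reference, the heap and old-generation GC numbers shown in these graphs can also be pulled straight from the nodes stats API, which makes it easy to watch every node at once; a small sketch (the host is assumed):

```python
import requests

# Sketch: dump heap usage and old-gen GC activity for every node in the
# cluster, the same numbers the graphs above are tracking.
stats = requests.get("http://localhost:9200/_nodes/stats/jvm").json()

for node_id, node in stats["nodes"].items():
    jvm = node["jvm"]
    old_gc = jvm["gc"]["collectors"]["old"]
    print("%-25s heap %3d%%  old GC: %d collections, %d ms" % (
        node["name"],
        jvm["mem"]["heap_used_percent"],
        old_gc["collection_count"],
        old_gc["collection_time_in_millis"]))
```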
A few questions then:
- Why did the heap pressure on the masters affect the entire cluster (in such a way that every data node also showed heap pressure)?
- In this particular use case, what would be the ideal master setup? My hunch says memory is the deciding factor (as far as I can tell the active master's work is effectively single-threaded, so it shouldn't really matter how many cores the machine has?), and having 7G of heap configured seems to have "fixed" this issue
- Is it particularly harmful for the active master to also be serving requests? i.e. should I remove the active master from the configured URLs in the services using ES? (a sketch of one approach follows this list)
- Other general suggestions regarding index/shard sizing would be appreciated
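On the third question: one low-tech way to keep the elected master out of the client path is to ask the cluster who it is and drop it from the host list. A sketch against the 1.x cat API, assuming hypothetical master hostnames that match what _cat/master reports in its host column:

```python
import requests

# Hypothetical URLs for the three dedicated masters.
MASTERS = [
    "http://master1:9200",
    "http://master2:9200",
    "http://master3:9200",
]

# _cat/master returns a single line: "<node-id> <host> <ip> <node-name>"
line = requests.get(MASTERS[0] + "/_cat/master").text.strip()
active_host = line.split()[1]

# Route client traffic through the standby masters only, leaving the
# elected master free to manage cluster state.
client_hosts = [u for u in MASTERS if active_host not in u]
print("active master:", active_host)
print("client hosts: ", client_hosts)
```

Mastership can of course move after a re-election, so anything like this would need to be refreshed periodically; a load balancer or dedicated client nodes in front of the cluster is a more common way to achieve the same effect.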
Thank you all.