Single index with 60 shards, 1 primary and 2 replicas, shards ranging from 10GB to 40GB
260 million documents
Not using doc values (looking into it)
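(For reference, shard sizes and doc counts like these can be double-checked with the cat APIs; the host and index name below are just placeholders:)

# per-index doc count and store size
curl -XGET 'http://localhost:9200/_cat/indices/my_index?v'
# per-shard size, state and node placement
curl -XGET 'http://localhost:9200/_cat/shards/my_index?v'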
The problem presented itself after attempting to replace the r3.large masters with m3.large instances ( http://www.ec2instances.info/?filter=m3.large ) with 4GB of heap configured. The 3 masters exhibited the following behavior:
i.e. the same behavior. During the time the smaller masters were in charge of the cluster, we saw it directly affecting the rest of the cluster. We replaced the masters with m3.xlarge instances ( http://www.ec2instances.info/?filter=m3.xlarge ) because we weren't sure at the time whether the problem was CPU or heap. Now it looks to me as if heap pressure was the problem: you will notice in the last image that the data nodes' heap usage is back to around 75% and old GC is back to almost nothing.
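(As a side note, heap usage and old-generation GC activity per node can be spot-checked with something like the following; the host is a placeholder and the exact columns available depend on the ES version:)

# heap usage per node at a glance
curl -s 'http://localhost:9200/_cat/nodes?v&h=name,heap.percent,heap.max'
# full JVM stats, including old-generation GC counts and times
curl -s 'http://localhost:9200/_nodes/stats/jvm?pretty'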
A few questions then:
Why did heap pressure on the masters affect the entire cluster (in such a way that every data node also showed heap pressure)?
In this particular use case, what would be the ideal master setup? My hunch is that memory is the deciding factor here (as far as I can tell the active master's process is single-threaded, so it doesn't really matter how many cores the machine has?), and having 7GB of heap configured seems to have "fixed" this issue.
Is it particularly harmful for the active master to be serving requests? i.e. should I remove the active master from the configured URLs in the services using ES?
Any other general suggestions with regards to index/shard size are appreciated.
We have 9 pretty big mappings for that particular index. I would paste them here, but:
curl -XGET http://localhost:9200/my_index/_mapping?pretty > /tmp/mappings.json
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 85.5M  100 85.5M    0     0  15.1M      0  0:00:05  0:00:05 --:--:-- 20.3M
So it's not easy to share. I can tell you, though, that the mappings have on average 75 properties each.
That's quite large, especially if there are 9 of those! ES needs to store all of that in the cluster state, which is held in memory. It does compress it, but that's still pretty big.
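(To get a rough feel for that size, something like the following measures the serialized metadata portion of the cluster state; this only reflects the uncompressed on-the-wire form, not the exact in-memory footprint, and the metric filter may not be available on older 1.x releases:)

# approximate size in bytes of the index metadata (mappings, settings) held in the cluster state
curl -s 'http://localhost:9200/_cluster/state/metadata' | wc -c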
I'm looking into the upgrade path, possibly to 1.7.5, which I am sure will help.
With regards to my original question, are you suggesting the heap needed by the master is directly tied to the number and size of the mappings?
The graphing is kind of messed up because the sampling window is one month. You can tell that heap usage on the active master (green) is steady at 75% while on the other two it is much lower.
If you are sending all requests through these nodes, they are not truly dedicated master nodes, but rather master/client nodes. It is recommended that dedicated master nodes do not serve traffic and are left to just manage the cluster. This will prevent them from suffering from memory pressure and intense garbage collection, and will make the cluster more stable.
If you instead set up dedicated client nodes, you may be able to reduce the size of the dedicated master nodes as they will be doing a lot less work.
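(For reference, a minimal sketch of the role settings involved, assuming 1.x-style elasticsearch.yml and a stock package layout; the path is a placeholder:)

# dedicated master nodes keep node.master: true and node.data: false;
# a dedicated client (routing-only) node disables both roles:
cat >> /etc/elasticsearch/elasticsearch.yml <<'EOF'
node.master: false
node.data: false
EOF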
Interesting. My initial thought was that resource usage on the masters was so low that they could be used for routing, especially the passive masters since they are doing close to nothing. If your suggestion is that I can have dedicated routing (client) nodes, then what would be the minimum requirements to run these?
Truly dedicated master nodes can often be considerably smaller than the ones you used, and can have heap as a larger portion of the available memory (~75%) than data nodes, as they do not need as much file system cache as nodes that hold data. Suitable instances here may be t2.medium or m3.medium, unless the cluster state is very large.
The same recommendation around heap size also applies to client nodes, as they also do not store data. In addition to acting as routers, these nodes coordinate requests and perform the final aggregation steps once data from the underlying shards has been gathered, which is why they can suffer from garbage collection depending on the workload. These are more difficult to size, but a good starting point may be the instance types you previously used, together with the higher heap setting.
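(As a concrete but hypothetical example: on a t2.medium with 4GB of RAM, ~75% heap would mean starting the node with roughly a 3GB heap, e.g. via the standard ES_HEAP_SIZE environment variable on 1.x:)

# sets both -Xms and -Xmx for the Elasticsearch JVM
export ES_HEAP_SIZE=3g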
Ah, here we go, we're converging back to the size of the cluster state; I think that's my key problem. I believe I have 2 final questions:
Is there a formula to size the JVM heap according to the cluster state? i.e. can I grab the mappings and somehow derive the number of GB I absolutely need for my heap size?
Is the active master process single-threaded? So far I think so but I haven't dug through the code to make sure.
In my case I have 3 masters, but only one is active at any given time. The active master shows CPU usage on a single core all the time, which to me means that the ES master process is always bound to a single CPU, but I am not 100% positive.
Sorry, I didn't mean to say one core is always pegged; I meant that when there is activity on the active master it is always a) brief and b) only on a single core. It's kind of hard to grab a hot_threads dump when it spikes so briefly.
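(One rough workaround, sketched below: poll hot_threads in a loop directly against the active master and sift through the output afterwards; the host, interval, and log path are placeholders:)

# _local scopes the call to the node the request hits, i.e. the active master here
while true; do
  curl -s 'http://active-master:9200/_nodes/_local/hot_threads' >> /tmp/hot_threads.log
  sleep 2
done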
Most of the work a dedicated master node does is probably linked to maintaining the cluster state, which could very well be single threaded in nature. It is generally not very CPU intensive, which is why I recommended using instances with relatively little CPU.