Is it possible to have the cluster use a node's hardware specs for allocation decisions?

I know that it's possible to use custom attributes in allocation for things like rack awareness, but I'm more interested in whether Elasticsearch can take a node's hardware specs into consideration when deciding how many shards to allocate to that node.
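For context, the custom-attribute setup I'm referring to is something along these lines (the `rack_id` attribute name is just an example):

```
# elasticsearch.yml on each node
node.attr.rack_id: rack_one
```

```
PUT _cluster/settings
{
  "persistent": {
    "cluster.routing.allocation.awareness.attributes": "rack_id"
  }
}
```

That works for spreading shard copies across racks, but it doesn't say anything about how *much* a given node can hold.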

Say you have ~20 nodes, each a 4-core, 32GB RAM, 900GB SSD machine. Overall disk usage is low (15-20%) and CPU usage is very low, but heap usage is around 50-60%. The disk and CPU numbers suggest this is an over-provisioned cluster, but the heap usage makes me a little leery of just taking down nodes one at a time and letting the cluster rebalance, for fear of crossing the 75% heap usage recommendation.
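For what it's worth, those numbers come from the usual cat APIs, roughly:

```
# Heap, RAM and CPU per node
GET _cat/nodes?v&h=name,heap.percent,ram.percent,cpu

# Shard count and disk usage per node
GET _cat/allocation?v
```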

Given that, I'd think this cluster could get away with significantly less hardware as long as the total RAM was kept steady. I.e., I think it could get away with ~10 nodes or even fewer, same 4 cores, same storage, as long as the RAM was bumped up to 64GB per node (20 × 32GB and 10 × 64GB both come to 640GB total). Might even consider bumping the cores to 8.

How would you go about implementing this and keeping things running smoothly? Say a single node currently hosts 10 shards; ideally, if we brought down 2 old nodes and brought up one new node, that new node would pick up 20 shards. I'd want to move slowly, so I think we'd remove one node, let things settle, bring up the new node, let it settle, then bring down a second old node. Rinse, repeat, etc.
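As a rough sketch, "let things settle" between steps would probably mean polling something like this until recoveries finish and the cluster is green again:

```
# Any shard recoveries still in flight?
GET _cat/recovery?v&active_only=true

# Overall health, plus relocating/initializing shard counts
GET _cluster/health

# Where shards actually landed
GET _cat/shards?v&h=index,shard,prirep,state,node
```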

Setting aside any voting/master eligibility concerns and concentrating ONLY on shard allocation, how could one tell Elasticsearch that the new node is capable of handling twice the load of the old ones, in order to prevent the master from just allocating shards evenly across ALL nodes as old nodes are taken out and new nodes are brought in?

I'm not sure there is a way to do this with today's allocator, unfortunately: Elasticsearch treats nodes as pretty much equivalent (within each tier at least). The way I'd do this would be to add all the new nodes at once (temporarily scaling the cluster up well beyond its needs) and then vacate the old nodes with an allocation filter so they can be safely decommissioned.
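A minimal sketch of the vacate step, assuming the old nodes are named `old-node-1`, `old-node-2`, and so on (the names are placeholders; you can also filter by `_ip` or `_host`):

```
PUT _cluster/settings
{
  "persistent": {
    "cluster.routing.allocation.exclude._name": "old-node-1,old-node-2,old-node-3"
  }
}
```

Once `GET _cat/allocation?v` shows zero shards on the excluded nodes, they can be shut down and the setting cleared.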

I was afraid of that, but thanks for the reply.

I'm curious, could you explain the problem with adding all the new nodes and then vacating all the old ones? My thinking is that either you're in a Cloud-like environment so you would only be paying more while the cluster is oversized (which won't be long), or else you're in an on-prem-like environment where you have to commission all the new nodes ahead of time anyway. How would a more gradual approach be better?

Only for peace of mind. We're not given to making changes like this "fast", so we'd want to spin up a new node (yes, we're cloud-based), decommission two old ones, make sure everything is working smoothly, and then continue with the remaining nodes one at a time. We're just unlikely to make a large switch like this without testing things out, and this isn't high enough priority to warrant duplicating the production setup to test it, or anything like that.

So it's not a problem, per se, just something I was hoping we could do using allocation rules. All good.

Gotcha, I see. If you did that I'm not sure the observations from the intermediate states would give you very meaningful information, and it would be a bit of a pain to revert back to the original setup if you encountered problems after several steps. Personally I would be more comfortable running with the double-sized cluster for a while and using allocation filters to move things across fairly gradually, watching & testing carefully as you go. Admittedly a little more expensive, but much safer & easier to revert IMO.
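One way to do the gradual part, as a sketch: tag the new nodes with a custom attribute (the `hw_gen` name here is made up) and then move indices onto them one at a time, watching each move before starting the next:

```
# elasticsearch.yml on the new nodes only
node.attr.hw_gen: new
```

```
# Move a single index onto the new hardware (index name is a placeholder)
PUT my-index-000001/_settings
{
  "index.routing.allocation.require.hw_gen": "new"
}
```

That lets you control the pace per index and makes it easy to back out (just clear the `require` setting) if something looks off.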
