ES 6.8 Force primary shards of the same index to be allocated to different physical hosts

Hi Team,

Is there a way to force primary shards in the same index to be allocated to different physical hosts? (I mean a physical server and not an elasticsearch node.)

I know we can do this for replicas of the same shard using the cluster setting below, but I can't seem to find one that applies to the primaries of the same index:
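(For reference, the host-level setting in question appears to be `cluster.routing.allocation.same_shard.host`. A minimal sketch of enabling it on 6.8:)

```
PUT _cluster/settings
{
  "persistent": {
    "cluster.routing.allocation.same_shard.host": true
  }
}
```

Note that this only prevents copies of the *same* shard (a primary and its replicas) from landing on one physical host; it says nothing about different shards of the same index, which is exactly the gap being asked about.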

The reason I'm trying to do this: I have a (non-ideal) cluster setup (beyond my control) with multiple data nodes residing on the same physical server. I know that in an ideal situation we should have one ES node per host.

I have daily indexes whose current day's shards are written to a lot, and because I have multiple data nodes on a single server, too many of today's "hot" shards (hot meaning heavily written to and read from) sometimes get allocated to nodes that reside on the same physical server.

That creates a performance imbalance in the cluster: one server becomes a bottleneck because its disk I/O and CPU resources are maxed out while the other physical servers sit idle.

So if there were a setting to force the primaries of an index onto different physical servers, or at least to prefer different physical servers, that would make the cluster more stable.

Many thanks for your help.


Are you only indexing new documents or are you also updating existing documents a lot?


I'm only indexing new documents.

Looking at the GitHub issue, it does talk about improving the balancing of primaries, but only across nodes, i.e. at the node level, not the host level.

My problem is not that primaries of the same index end up on the same node; rather, they end up on different nodes that happen to be on the same physical server, causing that server to choke while other servers sit idle.

Are you sure it's just the primaries causing your issue? I would expect primaries and replicas to share the load equally in this case. The issue linked above isn't relevant if you are only indexing new documents.

Can you share more details of your cluster? How many data nodes are there, and how are they distributed amongst the physical hosts?

Yeah, it's the primaries, because these indexes of mine don't have replica shards.

My cluster has roughly 70 data nodes but I only have 10 physical (large) servers. So it's about 7 data nodes per server.

Ok, the way the original question was phrased makes it sound like your replicas are behaving properly and it's just the primaries that are special. If you had replicas, I think they would be suffering the same issue.

How many daily indices do you have and how many shards are in each one?

Are your nodes all configured the same or are you using a hot/warm architecture?

Yep, I also think that if there were replicas they would have the same problem.

All the nodes are configured the same.

In total there are roughly 1,500 indexes (15 days of history, so roughly 100 indexes per day), with on average 10 shards per index (100 indexes per day × 10 shards = 1,000 shards per day across the 70 data nodes).

Ok, let's say that host 0 holds nodes 0A through 0G; similarly host 1 holds nodes 1A through 1G, and so on up to host 9. If you restricted each index to a single letter with a shard allocation filter then I think its 10 shards would be evenly spread across your hosts.

You'd need to use your judgement to pick a letter for each index in a way that spreads the load out enough.
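A sketch of how that could be wired up, assuming a custom node attribute named `letter` (the attribute name and index name here are my own choices, not anything built in): tag each node in its elasticsearch.yml, then point the index at one letter with an allocation filter.

```
# elasticsearch.yml on nodes 0A, 1A, ... 9A
node.attr.letter: A
```

```
PUT my-daily-index-2019.06.01/_settings
{
  "index.routing.allocation.include.letter": "A"
}
```

With one node per letter on each host, the index's 10 shards should then spread across the 10 hosts.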

Many thanks for the tip.

I think it can be made to behave the way I would like using the allocation filter.

But the thing is, it could introduce quite a bit of configuration. Thinking about it, I'm unsure whether I could practically make it work dynamically, because I would have to somehow configure the daily indexes to go to their corresponding letter, i.e.:

Day 1 index: go to letter A nodes
Day 2 index: go to letter B nodes
So on...

Either the senders (Logstash, etc.) would somehow need to decide the mapping to the right letter based on the date, or, if not the senders, I don't know yet what other way we could configure this dynamically so that the index for day Z goes to the nodes for letter X.

I hope what I said makes some sense.

Also, I have daily indexes for different types of data, which adds a layer of configuration complexity to the allocation-filter approach: if, say, today I receive data for Zoo and School, I would want the Zoo data to go to the letter-A nodes and the School data to go to the letter-B nodes.

I think it would get a bit out of hand.

I do not think you should rotate your indices through the node letters as you suggest. Since you have quite a lot of indices per day, I expect you should be able to divide them into 7 piles that are roughly equal in terms of load, and always put the 'A'-type of indices on the 'A' nodes etc.
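For instance, assuming daily indices named like `zoo-2019.06.01` and nodes tagged with a custom `letter` attribute (both names are illustrative), a 6.8 index template could pin each index type to its pile automatically, so new daily indices need no per-day configuration:

```
PUT _template/zoo-on-letter-a
{
  "index_patterns": ["zoo-*"],
  "settings": {
    "index.routing.allocation.include.letter": "A"
  }
}
```

One template per index type keeps the mapping static: every new `zoo-*` daily index picks up the filter at creation time.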

(I should also point out that running 70 nodes on 10 hosts is not really the recommended architecture, so you might have to put in quite some effort to get it to work)

I couldn't agree more really. It's far from ideal.

Do you think, though, that it would be worth creating a GitHub issue proposing a `cluster.routing.allocation` setting to force primaries of the same index onto different hosts? Achieving the same result with index-level allocation filters looks like a lot of work with no guarantee of being efficient.

A GitHub issue would be a good way to gauge the level of interest in improving this setup for everyone; if enough other users are in a similar situation, it's more likely that Elasticsearch will get the features needed to address it.

I would recommend not using the word "primary" so prominently in your issue. The problem you describe affects primaries and replicas equally, and you are more likely to find other users with the same issue if you use more general terminology.

