Same shards on different physical servers

Hi, I have deployed an EFK stack on a Kubernetes cluster. I have 3 Elasticsearch nodes, each with both the data and master roles, running on 3 different Kubernetes nodes. However, the Kubernetes nodes are spread across only 2 physical servers, so 2 of the 3 Elasticsearch nodes will always be on the same physical server. I want to make sure that each replica shard is on a node that is on a different physical server than its primary shard. Is there a way I can do this?

I saw there is a way to set cluster.routing.allocation.same_shard.host: true, but in the response from GET _nodes/stats I see that the 3 Elasticsearch nodes have different IP addresses for host, because the 3 Kubernetes nodes they run on have different IP addresses. So I don't think this will work, because the value for host will always be different.

This should work if you add the following settings to elasticsearch.yml on all nodes:

node.attr.machine: {{ physical_host }}
cluster.routing.allocation.awareness.attributes: machine

For {{ physical_host }}, put in an identifier for each of your two servers (like k8s-host-01, k8s-host-02).

This way Elasticsearch won't put replicas on the same physical host as their primary shards, as long as nodes on both hosts are available.

Check out the docs on Shard Allocation Awareness.

If I add labels to my Kubernetes nodes (server=host1 on 2 of them and server=host2 on 1 of them) and add this to elasticsearch.yml:

node.attr.server: host1, host2
cluster.routing.allocation.awareness.attributes: server 

Will this work? Would the Elasticsearch nodes be able to see the labels on the Kubernetes nodes they run on?
Or is there any other way for Elasticsearch nodes to see those labels?

That's not quite what I had in mind. You only need to edit the Elasticsearch config of the three instances - nothing else.

Assume you have 2 Kubernetes nodes on host1 and 1 Kubernetes node on host2. Then configure the two Elasticsearch instances on host1 as follows:

node.attr.machine: host1
cluster.routing.allocation.awareness.attributes: machine

And configure the third Elasticsearch instance on host2 as follows:

node.attr.machine: host2
cluster.routing.allocation.awareness.attributes: machine

That should be all there is to it.

So host1 and host2 can be any names. If I have

node.attr.machine: host1

on one Elasticsearch instance and

node.attr.machine: host2

on the other two Elasticsearch instances, will Elasticsearch know which instances are on the same physical server, and will it place each primary shard and its replica shard on instances with different node.attr.machine values?

Yes, exactly. You can read more about how this works in the docs I mentioned above.
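
If you want to double-check the result after the change, the rule above is easy to test against the actual shard placement. Here is a minimal sketch in Python, assuming you have already fetched the shard rows (for example from GET _cat/shards?format=json, which includes index, shard, prirep and node fields) and built a node-to-machine map (for example from GET _cat/nodeattrs). The node names es-0, es-1, es-2, the index name and the function name are made up for illustration, and rows for relocating shards are not handled.

```python
from collections import defaultdict


def colocated_shards(shard_rows, node_machine):
    """Return the (index, shard) pairs that have two or more copies
    on nodes sharing the same node.attr.machine value."""
    machines = defaultdict(list)
    for row in shard_rows:
        node = row.get("node")
        if node is None:
            # Unassigned copies are not placed on any node yet.
            continue
        machines[(row["index"], row["shard"])].append(node_machine[node])
    return sorted(
        key for key, hosts in machines.items() if len(hosts) != len(set(hosts))
    )


# Hypothetical sample data shaped like the _cat/shards JSON output.
node_machine = {"es-0": "host1", "es-1": "host1", "es-2": "host2"}
rows = [
    {"index": "logs", "shard": "0", "prirep": "p", "node": "es-0"},
    {"index": "logs", "shard": "0", "prirep": "r", "node": "es-2"},
    {"index": "logs", "shard": "1", "prirep": "p", "node": "es-0"},
    {"index": "logs", "shard": "1", "prirep": "r", "node": "es-1"},
    {"index": "logs", "shard": "2", "prirep": "r", "node": None},
]
violations = colocated_shards(rows, node_machine)
```

An empty result means every shard has its copies on different physical servers. Note that with only 2 machines, an index with 2 replicas per shard will always show up here, since 3 copies cannot fit on 2 distinct machines.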

One more question: when I do a rollout restart of the Elasticsearch StatefulSet, or if 2 of the 3 nodes crash, the pods can be scheduled on different Kubernetes nodes than they are on now. In that case the node.attr.machine value may no longer be correct. It could be the same for 2 Elasticsearch nodes that are on different physical servers, and then a primary shard and its replica could end up on Elasticsearch nodes that are on the same physical server.

Is there a way to set the node.attr.machine value dynamically, depending on which Kubernetes node the instance is running on?

If you can ensure that an environment variable identifying the physical server (e.g. HOST) exists on the Kubernetes nodes and is visible to the Elasticsearch process, you can reference it in your Elasticsearch config:

node.attr.machine: ${HOST}
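
How HOST gets populated is up to you. One possible approach, as a sketch only: Kubernetes can inject the name of the node a pod runs on into the container as an environment variable (Downward API, spec.nodeName), and a small startup wrapper can translate that node name into a physical server identifier before Elasticsearch starts. The variable name NODE_NAME, the node names and the mapping below are all assumptions for illustration, not something from your cluster.

```python
# Hypothetical mapping from Kubernetes node name to physical server.
# You would maintain this yourself (e.g. in a ConfigMap mounted into the pod).
NODE_TO_MACHINE = {
    "k8s-node-a": "host1",
    "k8s-node-b": "host1",
    "k8s-node-c": "host2",
}


def resolve_machine(environ, mapping=NODE_TO_MACHINE):
    """Return the physical-server identifier for the node named in NODE_NAME.

    `environ` is any dict-like environment, e.g. os.environ.
    """
    node_name = environ.get("NODE_NAME")
    if not node_name:
        raise RuntimeError("NODE_NAME is not set")
    try:
        return mapping[node_name]
    except KeyError:
        # Fail loudly rather than start with a wrong node.attr.machine value.
        raise RuntimeError(
            f"no physical server known for node {node_name!r}"
        ) from None
```

A wrapper script would call resolve_machine(os.environ), export the result as HOST, and then start Elasticsearch, so the value is recomputed every time the pod is rescheduled. Failing on an unknown node is deliberate: a missing mapping should stop the instance from joining with a wrong attribute.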
