What is the recommended shard and replica count for an index, with respect to cluster resiliency, in a cluster with 6 data nodes?

I have an ELK 5 cluster with 6 data nodes.
Please let me know the recommended shard and replica count for an index with respect to the resiliency of the cluster.
How can I choose the number of primary shards and replicas so that my cluster does not go red? Please help.

The answer to this question largely depends on factors outside of Elasticsearch, including your business needs.

I'll answer the technical part first: Elasticsearch "goes red" when at least one shard has no copy (primary or replica) online, assigned, and available in the cluster. To work out how many copies of each shard you'd want, though, you need to account for your own business logic.

Take, for example, a 2 data node setup where you have 1 primary shard and 1 replica. Elasticsearch never places a replica on the same node as its primary, so the data survives a single node failure, restart, etc. While both nodes are in a normal state, the cluster is green. If you restart 1 data node now, the shard copy on that node goes offline with it and the cluster goes "yellow" because only 1 copy of the data is live in the cluster. If you restart both data nodes at the same time, the cluster has no live copies of the data left and is red. So if your business forces multiple node restarts at the same time, you may want more than 1 replica to avoid a red cluster.
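To make the green/yellow/red reasoning above concrete, here is a minimal sketch in Python. It is a simplified model, not how Elasticsearch computes health internally: it ignores shard reallocation and recovery, and the function name `cluster_health` and the node and shard names are made up for illustration.

```python
def cluster_health(placement, down_nodes):
    """Return 'green', 'yellow' or 'red' for a toy cluster.

    placement: dict mapping a shard name to the list of nodes that hold
               a copy of it (primary and replicas alike).
    down_nodes: iterable of node names that are currently offline.
    """
    down = set(down_nodes)
    status = "green"
    for nodes in placement.values():
        live = [n for n in nodes if n not in down]
        if not live:
            # No copy of this shard is available anywhere: red.
            return "red"
        if len(live) < len(nodes):
            # The data is still served, but at least one copy is missing.
            status = "yellow"
    return status


# 2 data nodes, 1 primary shard + 1 replica on different nodes.
placement = {"shard-0": ["node-1", "node-2"]}

print(cluster_health(placement, []))                    # green
print(cluster_health(placement, ["node-1"]))            # yellow
print(cluster_health(placement, ["node-1", "node-2"]))  # red
```

On a real cluster you would read the status from the cluster health API (`GET _cluster/health`) rather than compute it yourself.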

Next, you might have a situation where you have multiple nodes on the same physical rack and only a single power line connected to it. Even if your nodes are on separate physical boxes that may be restarted in turn, you may want to account for things like "what if we lose power/network to an entire rack where we have multiple Elasticsearch nodes?" For this, Elasticsearch has shard allocation awareness, which you can configure so that copies of the same shard are spread across racks.
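As a rough illustration of why rack awareness matters, the sketch below checks which racks would take a shard fully offline if they lost power. The helper `racks_that_would_cause_red`, the node names, and the rack names are hypothetical. The settings body at the end uses the `cluster.routing.allocation.awareness.attributes` setting, and assumes each node was started with a matching custom attribute such as `node.attr.rack_id` in its `elasticsearch.yml`; the attribute name `rack_id` is only an example.

```python
def racks_that_would_cause_red(placement, node_racks):
    """List racks whose loss would leave some shard with no live copy.

    placement: dict mapping a shard name to the nodes holding its copies.
    node_racks: dict mapping a node name to the rack it sits in.
    """
    risky = set()
    for nodes in placement.values():
        racks = {node_racks[n] for n in nodes}
        if len(racks) == 1:
            # Every copy of this shard lives in one rack.
            risky |= racks
    return sorted(risky)


node_racks = {"node-1": "rack-a", "node-2": "rack-a", "node-3": "rack-b"}

# Without awareness, both copies may land in the same rack.
unaware = {"shard-0": ["node-1", "node-2"]}
# With awareness, copies are spread across racks.
aware = {"shard-0": ["node-1", "node-3"]}

print(racks_that_would_cause_red(unaware, node_racks))  # ['rack-a']
print(racks_that_would_cause_red(aware, node_racks))    # []

# Example body for PUT _cluster/settings (attribute name is an assumption).
awareness_settings = {
    "persistent": {
        "cluster.routing.allocation.awareness.attributes": "rack_id"
    }
}
```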

Next, your business may have continuity plans that ask things like "what if we lose power/network at an entire datacenter?" For that, Elasticsearch now has cross-cluster replication, so you can keep a copy of your data in more than 1 cluster and have it replicated automatically.

These are a few examples of business and architecture questions that need to be answered before someone could provide a recommended replica count. As a general rule of thumb, most people start off with 1 replica and use rolling restarts to mitigate common service disruptions like restarting a server. But to really answer this question yourself, you'll want to work through questions like "do we know what racks our servers are on, and is losing power to a rack a concern?" and "do we need to account for the loss of a datacenter?"
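For the 6 data node case in the question, a small sketch of the rule of thumb might look like this. It builds an index settings body using the `number_of_shards` and `number_of_replicas` index settings, and estimates the worst-case number of simultaneous node losses an index can take without going red. The helper names and the choice of 3 primary shards are assumptions for illustration, not a recommendation.

```python
import json


def index_settings(primaries, replicas):
    """Build a settings body for creating an index (e.g. PUT /my-index)."""
    return {
        "settings": {
            "index": {
                "number_of_shards": primaries,
                "number_of_replicas": replicas,
            }
        }
    }


def tolerated_node_losses(replicas, data_nodes):
    """Worst-case simultaneous node losses before some shard has no copy.

    Each shard has replicas + 1 copies, and two copies of the same shard
    are never placed on one node, so at most data_nodes copies can be
    assigned.
    """
    assigned_copies = min(replicas + 1, data_nodes)
    return assigned_copies - 1


data_nodes = 6
for replicas in (1, 2):
    body = index_settings(primaries=3, replicas=replicas)
    losses = tolerated_node_losses(replicas, data_nodes)
    print(json.dumps(body), "tolerates", losses, "node loss(es)")
```

Because `number_of_replicas` can be changed on a live index, starting with 1 replica and raising it later is a low-risk path; the number of primary shards is fixed at index creation.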
