Elasticsearch data allocation

I have an Elasticsearch cluster with three nodes and the following configuration:
Minimum master nodes: 2
Primary shards: 3
Replica shards: 2

The number of documents on espav01 and espav03 is the same, but on espav02 I see about 3000 fewer documents than on the other two nodes. The health status of the cluster is green.

I think the number of documents should be the same on all 3 nodes, right?

The shard status shows:
csv2 0 r STARTED x.x.x.1 espav01
csv2 0 p STARTED 445263 6gb x.x.x.2 espav02
csv2 0 r STARTED 458212 6.3gb x.x.x.3 espav03
csv2 1 r STARTED 454375 6.4gb x.x.x.1 espav01
csv2 1 r STARTED 441704 6.1gb x.x.x.2 espav02
csv2 1 p STARTED 454375 6.3gb x.x.x.3 espav03
csv2 2 p STARTED 458356 6.3gb x.x.x.1 csespav01
csv2 2 r STARTED 445453 6gb x.x.x.2 csespav02
csv2 2 r STARTED 458356 6.4gb x.x.x.3 csespav03
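To make the gap per shard easier to see, here is a minimal Python sketch that parses the listing above and reports how far each copy lags behind the largest copy of the same shard. The function names (parse_row, find_mismatches) are made up for illustration, the column order is assumed to be index, shard, primary/replica, state, docs, size, ip, node, and the first row is kept exactly as posted (it has no docs or size columns), so it is skipped in the comparison.

```python
from collections import defaultdict

# Rows copied verbatim from the shard listing in this post.
SHARDS = """\
csv2 0 r STARTED x.x.x.1 espav01
csv2 0 p STARTED 445263 6gb x.x.x.2 espav02
csv2 0 r STARTED 458212 6.3gb x.x.x.3 espav03
csv2 1 r STARTED 454375 6.4gb x.x.x.1 espav01
csv2 1 r STARTED 441704 6.1gb x.x.x.2 espav02
csv2 1 p STARTED 454375 6.3gb x.x.x.3 espav03
csv2 2 p STARTED 458356 6.3gb x.x.x.1 csespav01
csv2 2 r STARTED 445453 6gb x.x.x.2 csespav02
csv2 2 r STARTED 458356 6.4gb x.x.x.3 csespav03
"""


def parse_row(line):
    """Split one listing row; docs is None when the row lacks docs/size."""
    parts = line.split()
    index, shard, kind = parts[0], int(parts[1]), parts[2]
    node = parts[-1]
    # Assumption: a complete row has exactly eight columns.
    docs = int(parts[4]) if len(parts) == 8 else None
    return index, shard, kind, node, docs


def find_mismatches(text):
    """Return {(index, shard): {node: docs_behind_largest_copy}}."""
    by_shard = defaultdict(dict)
    for line in text.splitlines():
        if not line.strip():
            continue
        index, shard, _kind, node, docs = parse_row(line)
        if docs is not None:
            by_shard[(index, shard)][node] = docs

    report = {}
    for key, copies in by_shard.items():
        largest = max(copies.values())
        behind = {n: largest - d for n, d in copies.items() if d < largest}
        if behind:
            report[key] = behind
    return report


if __name__ == "__main__":
    for (index, shard), behind in sorted(find_mismatches(SHARDS).items()):
        print(index, shard, behind)
```

This only compares the document counts that the listing reports; it says nothing about which documents are missing.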


What version of Elasticsearch are you running? Can you reliably reproduce this issue (from an empty cluster)? Is your cluster generally stable? What does your indexing workflow look like? Do you change the refresh_interval and/or number_of_replicas settings as part of your indexing workflow?

I am running Elasticsearch 1.1.1 and am in the process of upgrading to 2.0.
The cluster has been stable, and the number of documents was the same on all nodes until recently.
We index about 5000 documents every week, and I don't change the refresh_interval and/or number_of_replicas settings as part of the indexing workflow.

Elasticsearch 1.1.1 is incredibly old and there have been an incredible number of consistency bugs resolved since then. You're doing the right thing to upgrade. If you do continue to see this issue after you upgrade, please report back!

Is there anything that can be done to fix this problem in the meantime?

How about restarting the node?

The easiest option would be if you have all of your data available to just reindex from scratch (on a brand-new Elasticsearch 2.1.1 cluster!).

If this is not an option: since your cluster is in an inconsistent state, it's difficult to say exactly what state your data is in and whether or not there is a single node that holds all of the data. It looks like there is a chance that only node espav02 is out of sync; the other two nodes might have all the data, because the copies of each shard that they hold match each other's document count, and that count is the maximum among the copies of that shard.

It would be best to conduct the following operations during a long maintenance window. It would not be a bad idea to first test this process on another cluster. You could even test it with three nodes running on your laptop (you don't need a full set of the documents, and you don't need to start from an inconsistent state to verify that the following process ends in a consistent state). I can't overstate the importance of doing this during a long maintenance window, and of testing the operation first.

Turn off any sources of indexing activity. Then, you should verify that the copies on espav01 and espav03 are in fact "good" copies.

For the next step, note that you will be offline for reads.

Then, shut down Elasticsearch on all three nodes and make a backup of the cluster.

After you have verified that they are "good" copies and have taken a backup, start Elasticsearch on nodes espav01 and espav03, and wait for the cluster to get to a yellow state (it should promote a replica copy on either espav01 or espav03 for shard 0 to be a primary).
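As a rough sketch of the "wait for yellow" step, the cluster health API accepts wait_for_status and timeout query parameters, so a client can block until the cluster reaches the requested status or the timeout expires. The Python below uses only the standard library; the base URL http://localhost:9200, the 60s timeout, and the function names are assumptions for illustration, not values from this thread.

```python
import json
import urllib.parse
import urllib.request


def health_url(base_url, status="yellow", timeout="60s"):
    """Build a cluster health URL that blocks until `status` or `timeout`."""
    query = urllib.parse.urlencode({"wait_for_status": status, "timeout": timeout})
    return f"{base_url}/_cluster/health?{query}"


def wait_for_status(base_url, status="yellow", timeout="60s"):
    """Call the health API and return (status, timed_out) from the response."""
    with urllib.request.urlopen(health_url(base_url, status, timeout)) as resp:
        body = json.load(resp)
    # timed_out is True when the requested status was not reached in time.
    return body["status"], body["timed_out"]


if __name__ == "__main__":
    # Assumed address of one of the running nodes; adjust for your cluster.
    print(wait_for_status("http://localhost:9200"))
```

If timed_out comes back true, the cluster did not reach yellow within the timeout and it is worth checking the shard listing before going further.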

At this step, you should be back online for reads but keep any sources of indexing activity turned off.

Now, set the number of replicas to one. This will get the cluster to a green state. Then, move (but do not delete) the data directory on espav02. Now start up espav02. After it has started, set the number of replicas back to two. This will start a recovery of the copies that you verified were good from espav01 and espav03 to espav02.
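The replica changes in this step go through the update index settings API (a PUT to /<index>/_settings with index.number_of_replicas). Here is a minimal Python sketch that builds that request; the index name csv2 comes from the shard listing above, while the base URL and the function name replica_settings_request are assumptions made for this example.

```python
import json
import urllib.request


def replica_settings_request(base_url, index, replicas):
    """Build (but do not send) a PUT that sets number_of_replicas on `index`."""
    body = json.dumps({"index": {"number_of_replicas": replicas}}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/{index}/_settings",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


if __name__ == "__main__":
    base = "http://localhost:9200"  # assumed node address
    # Drop to one replica before moving the data directory on espav02 ...
    req = replica_settings_request(base, "csv2", 1)
    print(req.get_method(), req.full_url, req.data.decode())
    # ... and later go back to two replicas to trigger the recovery.
    # Sending is left explicit so nothing changes by accident:
    # urllib.request.urlopen(req)
```

Building the request separately from sending it makes it easy to review exactly what will be changed before running it during the maintenance window.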

Do note that during the recovery process you will see a lot of network activity and your cluster might not be responsive.

After this, the document counts should match.

After you have verified the cluster is in a consistent state and that all of the data that you expect to be there is in fact there, you can restart your indexing sources.

Thanks for your reply.

Version 1.1 does support snapshots, and I do have a snapshot of the cluster from a few weeks back.
I was wondering if I can create a new index, load the data from the snapshot, modify my alias to point to the new index, and then reindex the remaining data.

Yes, if you already have an existing snapshot that you know is "good", then by all means this is a viable option!
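For what it's worth, the snapshot route could be sketched roughly as follows: restore the index under a new name with the restore API (POST /_snapshot/<repository>/<snapshot>/_restore, using rename_pattern and rename_replacement), then move the alias in one atomic call to POST /_aliases. The Python below only builds the two request bodies; the new index name csv2_restored and the alias name csv are hypothetical placeholders, since neither appears in this thread.

```python
import json


def restore_body(index, restored_name):
    """Body for the restore API: restore `index` under the name `restored_name`."""
    return {
        "indices": index,
        # rename_pattern is a regular expression; a plain index name such as
        # "csv2" contains no special characters, so it matches itself.
        "rename_pattern": index,
        "rename_replacement": restored_name,
    }


def alias_swap_body(alias, old_index, new_index):
    """Body for POST /_aliases: move `alias` from old_index to new_index atomically."""
    return {
        "actions": [
            {"remove": {"index": old_index, "alias": alias}},
            {"add": {"index": new_index, "alias": alias}},
        ]
    }


if __name__ == "__main__":
    # "csv2_restored" and "csv" are placeholder names, not taken from the thread.
    print(json.dumps(restore_body("csv2", "csv2_restored"), indent=2))
    print(json.dumps(alias_swap_body("csv", "csv2", "csv2_restored"), indent=2))
```

Documents indexed after the snapshot was taken would still need to be reindexed into the restored index before the alias is switched.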

Please do let us know if you're able to reproduce the issue after upgrading to the latest version of Elasticsearch.