We are using Elasticsearch 2.2.0-1 in a POC with a fairly large dataset: 3 indexes across 3 nodes, one replica, the default 5 shards per index, and pretty much default settings. One of our indexes is about 175GB, but the data is distributed among its 5 shards like so:
Shard   Docs         Size
0       127,870      1.5gb
1       46,150       239.4mb
2       409,846      1.7gb
3       13,055,899   169gb
4       130,106      667.6mb
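To put a number on the skew, here is a rough Python sketch that works out each shard's share of the documents. The doc counts are copied from the table above; the names `DOCS_PER_SHARD` and `shard_shares` are made up for illustration and are not part of any Elasticsearch API.

```python
# Doc counts per shard, copied from the table above.
DOCS_PER_SHARD = {0: 127_870, 1: 46_150, 2: 409_846, 3: 13_055_899, 4: 130_106}


def shard_shares(docs_per_shard):
    """Return each shard's fraction of the index's total doc count."""
    total = sum(docs_per_shard.values())
    return {shard: count / total for shard, count in docs_per_shard.items()}


if __name__ == "__main__":
    shares = shard_shares(DOCS_PER_SHARD)
    # Share each shard would hold if the docs were split perfectly evenly.
    even = 1 / len(DOCS_PER_SHARD)
    for shard, share in sorted(shares.items()):
        print(f"shard {shard}: {share:6.1%} of docs (even split would be {even:.0%})")
```

By this count shard 3 holds roughly 95% of the documents, where an even split would put about 20% on each shard.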
What the heck is going on here? Shouldn't Elasticsearch try to at least roughly balance the data across the shards? Isn't that why you create multiple shards? This 169GB shard is a real problem, and my other (much smaller) indexes have their data distributed perfectly. I don't see many people complaining about this, and I have seen the tempest cluster balancing option, but I am hesitant to tweak anything since this seems like a basic requirement. Could you please help me understand, or tell me where I went wrong? All the documentation I have read about shard rebalancing and allocation seems to be about moving shards between nodes, not about how the data is spread across the shards themselves.