Shards are not equal size in one index


#1

We are using Elasticsearch 2.2.0-1 in a POC with a fairly large dataset: 3 indexes across 3 nodes, one replica, the default 5 shards per index, and mostly default settings. One of our indexes is about 175GB, but the data is distributed across its 5 shards like so:
Shard   # Docs       Size
0       127,870      1.5gb
1       46,150       239.4mb
2       409,846      1.7gb
3       13,055,899   169gb
4       130,106      667.6mb

What the heck is going on here? Shouldn't Elasticsearch try to somewhat balance the data across the shards? Isn't that why you create multiple shards? This 169GB shard is a real problem, and my other (much smaller) indexes have their data distributed perfectly. I don't see many people complaining about this. I have seen the tempest cluster balancing option, but I am hesitant to tweak anything since this seems like a basic requirement. Could you please help me understand, or tell me where I went wrong? All the documentation I have read about shard rebalancing and allocation seems to be specific to moving shards between nodes, not to the data contained in a shard.


(Christian Dahlqvist) #2

That is indeed a very uneven distribution. Are you by any chance using parent-child relationships or routing for this index?


(Mark Walkom) #3

Or are you providing your own ID for the documents?


#4

I just spoke with the developer on the project and he said there was some custom routing added. He is going to try to remove it and see if it resolves the issue.


(Mark Walkom) #5

It will :slight_smile:
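
For anyone landing here later, a minimal sketch of why custom routing can produce this kind of skew. Elasticsearch picks a shard as roughly hash(routing value) modulo the number of primary shards, where the routing value defaults to the document ID. The sketch below is an illustration only: `zlib.crc32` stands in for Elasticsearch's own hash function, and the document IDs and customer keys are made-up data, not anything from the poster's index.

```python
import zlib
from collections import Counter

NUM_PRIMARY_SHARDS = 5  # the default mentioned in this thread


def pick_shard(routing_value, num_shards=NUM_PRIMARY_SHARDS):
    """Map a routing value to a shard: hash(routing) mod primary shard count.

    zlib.crc32 is only a stand-in; Elasticsearch uses its own hash function.
    """
    return zlib.crc32(routing_value.encode("utf-8")) % num_shards


def shard_counts(routing_values, num_shards=NUM_PRIMARY_SHARDS):
    """Count how many documents land on each shard."""
    counts = Counter(pick_shard(v, num_shards) for v in routing_values)
    return [counts.get(shard, 0) for shard in range(num_shards)]


# Hypothetical data: 100,000 documents, 95% of them from a single customer.
TOTAL = 100_000
doc_ids = [f"doc-{i}" for i in range(TOTAL)]
customers = ["big-customer" if i % 20 else f"small-{i % 7}" for i in range(TOTAL)]

default_routing = shard_counts(doc_ids)    # routing value = document ID
custom_routing = shard_counts(customers)   # routing value = customer key

print("routed by document ID: ", default_routing)
print("routed by customer key:", custom_routing)
```

Routing by document ID spreads the documents across all five shards, while routing by a key that most documents share puts every one of those documents on the same shard, however many shards the index has.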

