Fewer shards = less disk space used?



Hopefully a quick and easy question:
Does having fewer shards per index equate to less disk space being used?

When I set off on my elastic adventure a year ago I used the default 5 shards per index. I now have 4 different index 'types'(?), each holding data for 30 days in a daily index, so that was quite a lot of shards (30 * 4 * 5 = 600) for my poor little elastic box, and it was taking about 30 minutes to restart while it allocated all the shards!
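For reference, the shard arithmetic above can be sketched as follows. This is only a rough count: it assumes primaries only (replicas are not mentioned in the post), and the variable and function names are made up for illustration.

```python
# Rough shard count for daily indices, using the numbers from the post.
# Assumption: primary shards only; replicas are not counted.
INDEX_TYPES = 4        # different kinds of daily index
RETENTION_DAYS = 30    # one index per type per day, kept for 30 days


def primary_shards(shards_per_index: int) -> int:
    """Total primary shards held at any one time."""
    return INDEX_TYPES * RETENTION_DAYS * shards_per_index


before = primary_shards(5)  # old default of 5 shards per index
after = primary_shards(1)   # new template with 1 shard per index

print(f"5 shards per index: {before} shards")
print(f"1 shard per index:  {after} shards")
```

Going from 5 shards to 1 per index cuts the number of shards to allocate on restart by a factor of five.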

I've now changed the index template to use just 1 shard per index, and elastic seems to be much faster :smiley:
As a side effect, I have noticed that the Data value in the xpack monitoring seems to have dropped. When I had 5 shards it was around 30GB; now it is around 25GB (approx. 60,000,000 documents, which is about what I usually have).

Is that what would be expected, or do I need to hunt around for where I have lost 5GB of data?


(David Pilato) #2

IMO it makes sense.

Take the inverted index as an example. You have something like:

term doc id
a 1,2,3
b 1,2,3,4,5

And you have that structure per shard. So with 5 shards, you can have:

Shard 0:
term doc id
a 1
b 1

Shard 1:
term doc id
a 2
b 2

Shard 2:
term doc id
a 3
b 3

Shard 3:
term doc id
b 4

Shard 4:
term doc id
b 5

But with one single shard you have:

term doc id
a 1,2,3
b 1,2,3,4,5

The terms a and b are no longer repeated in every shard that holds them (up to 5 times here). On top of that, the doc id structure is highly optimized, and compression of stored fields is probably more efficient with a single shard.
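To make the duplication concrete, here is a minimal Python sketch of the example above. It is a toy model, not how Lucene actually stores its index: the round-robin placement of documents onto shards is an assumption chosen to match the tables (Elasticsearch really routes by a hash of the routing value), and all function names are hypothetical.

```python
from collections import defaultdict

# The five example documents and the terms they contain.
DOCS = {1: ["a", "b"], 2: ["a", "b"], 3: ["a", "b"], 4: ["b"], 5: ["b"]}


def build_inverted_index(docs):
    """Map each term to the sorted list of doc ids that contain it."""
    index = defaultdict(list)
    for doc_id in sorted(docs):
        for term in sorted(set(docs[doc_id])):
            index[term].append(doc_id)
    return dict(index)


def split_into_shards(docs, num_shards):
    """Assumption: round-robin placement, so doc 1 lands in shard 0, etc."""
    shards = [{} for _ in range(num_shards)]
    for doc_id, terms in docs.items():
        shards[(doc_id - 1) % num_shards][doc_id] = terms
    return shards


def term_entries(docs, num_shards):
    """Count term dictionary entries summed over all shards."""
    shards = split_into_shards(docs, num_shards)
    return sum(len(build_inverted_index(shard)) for shard in shards)


single_shard_index = build_inverted_index(DOCS)
entries_with_five_shards = term_entries(DOCS, 5)
entries_with_one_shard = term_entries(DOCS, 1)

print(single_shard_index)
print("term entries with 5 shards:", entries_with_five_shards)
print("term entries with 1 shard: ", entries_with_one_shard)
```

In this toy model each shard keeps its own copy of every term it has seen, so the same two terms take more entries when the documents are spread over five shards than when they sit in one.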

So I'd expect some gain.

My 0.05 cents
