Fewer shards = less disk space used?

Hello

Hopefully a quick and easy question:
Does having fewer shards per index mean less disk space is used?

When I set off on my Elastic adventure a year ago, I used the default 5 shards per index. I now have 4 different index 'types'(?), each holding data for 30 days in a daily index. That came to quite a lot of shards for my poor little Elastic box (30 * 4 * 5 = 600), and it was taking about 30 minutes to restart while it allocated all the shards!
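For reference, the shard arithmetic above can be sketched as follows. This is only a quick calculation under the assumption that just primary shards are counted (replicas are ignored); the function name is made up for illustration.

```python
# Figures taken from the post: 30 daily indices kept per type, 4 index types.
DAYS_RETAINED = 30
INDEX_TYPES = 4


def total_primary_shards(shards_per_index):
    """Total primary shards on the node (assumption: replicas not counted)."""
    return DAYS_RETAINED * INDEX_TYPES * shards_per_index


print(total_primary_shards(5))  # 600 with the old default of 5 shards
print(total_primary_shards(1))  # 120 with 1 shard per index
```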

I've changed the index template to use just 1 shard per index now, and Elastic seems to be much faster :smiley:
As a side effect, I have noticed in the X-Pack monitoring that my Data value seems to have dropped. With 5 shards it was around 30GB; now it is around 25GB (approx. 60,000,000 documents, which is about what I usually have).

Is that what would be expected, or do I need to hunt around for where I have lost 5GB of data?

Thanks!

IMO it makes sense.

Take the inverted index as an example. You have something like:

term doc id
a 1,2,3
b 1,2,3,4,5

And you have that structure per shard. So in shard 0, you can have:

term doc id
a 1
b 1

Shard1:

term doc id
a 2
b 2

Shard2:

term doc id
a 3
b 3

Shard3:

term doc id
b 4

Shard4:

term doc id
b 5

But with one single shard you have:

term doc id
a 1,2,3
b 1,2,3,4,5

With a single shard, the terms a and b are stored once instead of being repeated in every shard that contains them. On top of that, the doc id structure is highly optimized, and compression on stored fields is probably more efficient with one larger shard.
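To make the duplication concrete, here is a minimal toy sketch of the example above. It is not how Lucene stores anything on disk: it just builds one small term-to-doc-ids map per shard and counts the term entries. The round-robin routing is an assumption chosen to match the example; Elasticsearch actually routes on a hash of the routing value. All names are made up for illustration.

```python
from collections import defaultdict

# Toy documents from the example above: doc id -> terms it contains.
DOCS = {1: ["a", "b"], 2: ["a", "b"], 3: ["a", "b"], 4: ["b"], 5: ["b"]}


def build_shards(docs, num_shards):
    """Build one toy inverted index (term -> list of doc ids) per shard."""
    shards = [defaultdict(list) for _ in range(num_shards)]
    for doc_id in sorted(docs):
        # Hypothetical routing so that doc 1 lands in shard 0, doc 2 in shard 1, ...
        shard = shards[(doc_id - 1) % num_shards]
        for term in docs[doc_id]:
            shard[term].append(doc_id)
    return shards


def term_entries(shards):
    """Number of term dictionary entries, summed over all shards."""
    return sum(len(shard) for shard in shards)


five = build_shards(DOCS, 5)
one = build_shards(DOCS, 1)

print(term_entries(five))  # 8: "a" is stored in 3 shards, "b" in 5
print(term_entries(one))   # 2: "a" and "b" are stored once each
print(dict(one[0]))        # {'a': [1, 2, 3], 'b': [1, 2, 3, 4, 5]}
```

With real data the saving depends on how many terms repeat across shards, so the 8 vs 2 ratio here says nothing about what to expect in practice.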

So I'd expect some gain.

My 0.05 cents
