When do you need more than 1 shard?

javadevmtl · May 25, 2015, 11:55pm

So I tested my data on a single shard and it seems to perform adequately. Would a single shard affect parallelism, such as doing bulk indexing and search at the same time?

I have 4 physical machines with 32 cores each, ES_HEAP = 30GB each, and 5TB SSDs each.

warkolm · May 26, 2015, 12:56am

Search and indexing are handled via different threadpools, so a single shard won't really matter in that respect.

Adding more shards will help use those cores though.

javadevmtl · May 26, 2015, 3:15pm

Yeah, just wondering because right now I'm doing an index per day with 4 shards + 1 replica, which equals 1460 shards per node if I'm right? And that's taking up lots of RAM.

Wondering if I should reduce it to 1 shard + 1 replica. The indices will be spread throughout the cluster, so that will help utilise those cores too, no?

javadevmtl · May 26, 2015, 3:20pm

And I have a data retention policy of 3 years. So I'm thinking of maybe going to a monthly index, but how many shards is the big question, because it will take me forever to fill a month's worth of data.

warkolm · May 26, 2015, 11:42pm

If you don't have much data then moving to weekly or monthly indices may make more sense than dropping to one shard.

This is really something you need to test with your own data and node sizing.

javadevmtl · May 27, 2015, 2:08am

3 billion documents per year. I'm testing monthly now with 8 shards + 1 replica.

If the math is right...

16 shards (including replicas) x 36 months / 4 nodes = 144 shards per node.

Seems to perform well.
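
The shard arithmetic above can be sketched in a few lines of Python. This is only an illustration: the function name `shards_per_node` is made up here, and it assumes shards are spread evenly across nodes and that every index in the 3-year retention window stays open. The daily figure additionally assumes 365 indices per year.

```python
def shards_per_node(primaries, replicas, indices, nodes):
    """Average number of shards per node, assuming an even spread."""
    # Each primary shard has `replicas` copies, so one index holds
    # primaries * (1 + replicas) shards in total.
    total_shards = primaries * (1 + replicas) * indices
    return total_shards / nodes


# Monthly indices: 8 primaries + 1 replica, 36 months, 4 nodes.
monthly = shards_per_node(primaries=8, replicas=1, indices=36, nodes=4)

# Daily indices: 4 primaries + 1 replica, 3 years of days (assumed 365 per year), 4 nodes.
daily = shards_per_node(primaries=4, replicas=1, indices=365 * 3, nodes=4)

print(monthly)  # 144.0
print(daily)    # 2190.0
```

Changing `indices` or `primaries` makes it quick to compare daily, weekly, and monthly schemes before committing to a long reindex.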

warkolm · May 27, 2015, 2:50am

Why not 4 shards?
Though 8 is good if you expect to extend your cluster.

javadevmtl · May 27, 2015, 1:42pm

Yes I planned a bit of expansion next year.

I also have quite large documents.

Monthly with 8 shards, right now it's 51GB per shard and it's not a full month yet. That's half a month.

warkolm · May 28, 2015, 7:12am

We do recommend keeping shard sizes below 50GB. That's a soft limit, but larger shards start to make re-allocation and recovery difficult.
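
As a rough sketch of how that guideline translates into a primary shard count, the snippet below projects a full month from the figure reported earlier in the thread (51GB per shard, 8 shards, about half a month of data). It assumes linear growth and that the 51GB figure is the size of one primary shard; the helper name `primaries_needed` is hypothetical.

```python
import math


def primaries_needed(index_size_gb, max_shard_gb=50):
    """Smallest primary shard count that keeps each shard at or under max_shard_gb."""
    return math.ceil(index_size_gb / max_shard_gb)


observed_shard_gb = 51   # size of one shard so far
primaries = 8            # primary shards in the monthly index
fraction_of_month = 0.5  # roughly half a month indexed

# Project the primary data size of a full monthly index (linear growth assumed).
projected_index_gb = observed_shard_gb * primaries / fraction_of_month

print(projected_index_gb)                    # 816.0
print(primaries_needed(projected_index_gb))  # 17
```

Since 50GB is a soft limit, the result is a starting point for testing rather than a hard requirement.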

javadevmtl · May 28, 2015, 3:18pm

Isssh lol right now monthly index with 4 shards + 1 replica is almost 250GB per shard hehe

Also trying 8 shards per month but it's not full yet, so I'll check. But I assume it will be half of the above.

3 billion documents / 8 million docs per day = 375 shards
Average document size is about 4k-8k depending on document type.
I need to find the right balance between shards, indices, and RAM, but it takes forever to index that many documents.

Camilo_Sierra · May 28, 2015, 3:54pm

You have a retention of 3 years, but do you keep running search queries against every index? If the answer is no, you can close the old indices: a closed index only consumes disk space, and you can open and close an index really easily! Personally, I keep 8 weeks open, and if the customer needs old information (scroll) I open the index that is needed... https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-open-close.html
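
A minimal sketch of what that looks like against the REST API, using only the Python standard library. The `_open` and `_close` endpoints are the ones described at the link above; the host `http://localhost:9200`, the index name `logs-2015.05`, and both helper names are assumptions made up for this example.

```python
import urllib.request


def index_action_url(base_url, index, action):
    """Build the URL for opening or closing an index."""
    if action not in ("_open", "_close"):
        raise ValueError("action must be '_open' or '_close'")
    return f"{base_url.rstrip('/')}/{index}/{action}"


def send_index_action(base_url, index, action):
    """POST the open/close request and return the HTTP status code."""
    request = urllib.request.Request(
        index_action_url(base_url, index, action),
        data=b"",       # empty body; the action is encoded in the URL
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return response.status
```

For example, `send_index_action("http://localhost:9200", "logs-2015.05", "_close")` would close a hypothetical monthly index, and the same call with `"_open"` would bring it back when an old query comes in.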

javadevmtl · May 28, 2015, 5:06pm

Yep, I know about closing indices. But good point also! Didn't think of it in that way.
