When do you need more than 1 shard?

(None) #1

So I tested my data on a single shard and it seems to perform adequately. Would a single shard affect parallelism, i.e. running operations such as bulk indexing and search at the same time?

I have 4 physical machines, each with 32 cores, ES_HEAP = 30GB and 5TB SSDs.

(Mark Walkom) #2

Search and indexing are handled via different threadpools, so a single shard won't really matter in that respect.

Adding more shards will help use those cores though.

(None) #3

Yeah, just wondering, because right now I'm doing an index per day with 4 shards + 1 replica, which equals 1460 shards per node if I'm right? And that's taking up lots of RAM.

I'm wondering if I should reduce it to 1 shard + 1 replica. The indices would be spread throughout the cluster, so that would help utilise those cores too, no?
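A quick way to check the shard arithmetic, sketched in Python. It assumes shards are spread evenly across the 4 nodes and ignores leap days; `total_shards` and `shards_per_node` are hypothetical helper names. Under those assumptions, one year of daily 4 + 1 indices comes to 2920 shard copies (730 per node) and three years to 8760 (2190 per node), so the 1460 figure above matches the number of primary shards for one year (365 × 4) rather than shards per node.

```python
DAYS_PER_YEAR = 365  # leap days ignored
NODES = 4


def total_shards(num_indices: int, primaries: int, replicas: int) -> int:
    # Each index has `primaries` primary shards, and each primary has `replicas` copies.
    return num_indices * primaries * (1 + replicas)


def shards_per_node(num_indices: int, primaries: int, replicas: int, nodes: int) -> float:
    # Average shard copies per node, assuming an even spread over the data nodes.
    return total_shards(num_indices, primaries, replicas) / nodes


for years in (1, 3):
    days = DAYS_PER_YEAR * years
    for primaries in (4, 1):
        print(
            f"{years} year(s) of daily {primaries}+1 indices: "
            f"{total_shards(days, primaries, 1)} shards in total, "
            f"{shards_per_node(days, primaries, 1, NODES):g} per node"
        )
```

Dropping from 4 + 1 to 1 + 1 cuts every one of those counts by a factor of four.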

(None) #4

And I have a data retention policy of 3 years, so I'm thinking of moving to monthly indices. But how many shards is the big question, because it will take me forever to fill a month's worth of data.

(Mark Walkom) #5

If you don't have much data then moving to weekly or monthly indices may make more sense than dropping to one shard.

This is really something you need to test with your own data and node sizing.
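With time-based indices, switching between daily, weekly and monthly mostly comes down to which index name a document's date maps to. A minimal Python sketch, assuming a hypothetical `logs-` prefix and naming pattern (Elasticsearch does not prescribe one):

```python
from datetime import date


def index_name(prefix: str, day: date, period: str) -> str:
    """Name of the time-based index that a document dated `day` goes into.

    The `<prefix>-<date>` pattern is a hypothetical convention for illustration.
    """
    if period == "daily":
        return f"{prefix}-{day:%Y.%m.%d}"
    if period == "weekly":
        # ISO year + ISO week, so a week that spans New Year stays in one index.
        iso_year, iso_week, _ = day.isocalendar()
        return f"{prefix}-{iso_year}.w{iso_week:02d}"
    if period == "monthly":
        return f"{prefix}-{day:%Y.%m}"
    raise ValueError(f"unknown period: {period!r}")


for period in ("daily", "weekly", "monthly"):
    print(period, index_name("logs", date(2017, 1, 1), period))
```

For 1 January 2017 this yields `logs-2017.01.01`, `logs-2016.w52` and `logs-2017.01`, because that day belongs to ISO week 52 of 2016.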

(None) #6

3 billion documents per year. I'm testing monthly indices now with 8 shards + 1 replica.

If the math is right...

16 shards (including replicas) x 36 months / 4 nodes = 144 shards per node.

Seems to perform well.

(Mark Walkom) #7

Why not 4 shards?
Though 8 is good if you expect to extend your cluster.
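One way to see the trade-off between 4 and 8 primaries for monthly indices: with 1 replica, each index has primaries × 2 shard copies, and since every copy sits on exactly one node, that is also the most nodes a single index can be spread across. A sketch assuming the 36-month retention and 4 nodes from this thread, with shards spread evenly; `monthly_layout` is a hypothetical helper.

```python
MONTHS = 36   # 3-year retention, one index per month
NODES = 4
REPLICAS = 1


def monthly_layout(primaries: int) -> dict:
    # Shard copies in one monthly index (primaries plus their replicas).
    copies_per_index = primaries * (1 + REPLICAS)
    return {
        "primaries": primaries,
        # Each copy sits on exactly one node, so this is also the
        # most nodes a single index can occupy.
        "copies_per_index": copies_per_index,
        # Average over the whole retention window, assuming an even spread.
        "shards_per_node": MONTHS * copies_per_index / NODES,
    }


for primaries in (4, 8):
    print(monthly_layout(primaries))
```

With 8 primaries this reproduces the 144 shards per node calculated earlier; 4 primaries halves that to 72, but each index can then use at most 8 nodes, which matters once the cluster grows past that.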

(None) #8

Yes, I have planned a bit of expansion for next year.

I also have quite large documents.

With monthly indices and 8 shards, it's currently 51GB per shard, and the month isn't over yet: that's only half a month of data.

(Mark Walkom) #9

We do recommend keeping shard sizes below 50GB. That's a soft limit, but larger shards start to make reallocation and recovery difficult.
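A back-of-the-envelope sketch of how many primaries a monthly index would need to stay under that 50GB soft limit, using the numbers posted above (8 primaries at about 51GB each after half a month). It assumes ingest stays roughly constant over the month and that documents are spread evenly across primaries; `primaries_needed` is a hypothetical helper.

```python
import math

SOFT_LIMIT_GB = 50  # recommended soft limit per shard


def primaries_needed(index_size_gb: float, limit_gb: float = SOFT_LIMIT_GB) -> int:
    # Smallest primary count that keeps each primary at or below the limit,
    # assuming documents are spread evenly across primaries.
    return math.ceil(index_size_gb / limit_gb)


# Numbers from this thread: 8 primaries at ~51GB each after about half a month.
half_month_gb = 8 * 51
# Assumption: ingest stays roughly constant, so a full month is about double.
full_month_gb = 2 * half_month_gb

print(f"~{full_month_gb}GB per month -> {primaries_needed(full_month_gb)} primaries")
print(f"4 x 250GB -> {primaries_needed(4 * 250)} primaries")
```

That points at roughly 17 primaries per monthly index (816GB / 50GB, rounded up), and the 4-shard monthly index at almost 250GB per shard would need 20 by the same rule. Smaller time windows, such as weekly indices, bring the per-index primary count down instead.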

(None) #10

Isssh, lol. Right now the monthly index with 4 shards + 1 replica is almost 250GB per shard, hehe.

I'm also trying 8 shards per month, but it's not full yet, so I'll check. I assume it will be half of the above.

3 billion documents / 8 million docs per day = 375 shards
Average document size is about 4k-8k depending on document type.
I need to find the right balance between shards, indices and RAM, but it takes forever to index that many documents.

(Camilo Sierra) #11

You have a retention of 3 years, but do you still run search queries against the older weekly indices? If the answer is no, you can close them: a closed index only consumes disk space, and you can open and close an index really easily! Personally, I leave 8 weeks open, and if the customer needs old information (scroll), I open the index in question... https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-open-close.html
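A sketch of the scheme described above: keep roughly the last 8 weeks open and close older weekly indices. It assumes hypothetical weekly indices named after the Monday each week starts on, treats a week as closable only once all of it is older than the window, and only builds the REST calls (`POST /<index>/_close` and `POST /<index>/_open`, from the open/close index API linked above) rather than sending them. `indices_to_close`, `close_request` and `open_request` are hypothetical helpers.

```python
from datetime import date, timedelta


def indices_to_close(week_starts: dict, today: date, keep_weeks: int = 8) -> list:
    """Names of weekly indices whose whole week is older than `keep_weeks` weeks.

    `week_starts` maps a hypothetical index name to the first day of its week.
    """
    cutoff = today - timedelta(weeks=keep_weeks)
    return sorted(
        name for name, start in week_starts.items()
        if start + timedelta(weeks=1) <= cutoff
    )


def close_request(index: str) -> tuple:
    # HTTP method and path for the close index API.
    return ("POST", f"/{index}/_close")


def open_request(index: str) -> tuple:
    # HTTP method and path for reopening a closed index.
    return ("POST", f"/{index}/_open")


# Hypothetical weekly indices named after the Monday each week starts on.
weeks = {
    f"logs-{d:%Y.%m.%d}": d
    for d in (date(2017, 1, 2) + timedelta(weeks=i) for i in range(11))
}
for name in indices_to_close(weeks, today=date(2017, 3, 15)):
    print(*close_request(name))
```

In this example the two January weeks that lie entirely outside the window are closed and nine indices stay open, because the week straddling the cutoff is kept. Comparing week start dates instead would keep exactly eight, but could close data younger than 8 weeks.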

(None) #12

Yep, I know about closing indices. But good point also! I didn't think of it that way.
