I'll soon start a new cluster on the latest 6.x series. Here are some of the key points:
- one index
- will use two "types" because I need the join capability
- will have ~22 million parent and ~240 million child documents
- parent documents have more fields and will be "bigger" in general
- the overall size of the index will be ~300GB initially
- number of nodes isn't yet decided
- I would classify the growth as "slow but linear"
- writes happen multiple times per second, and updates to existing documents will likely be about as frequent as insertions of completely new documents
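
A rough sketch of what such an index body could look like in 6.x, with the shard count left as a parameter since that is the open question. The field name `relation`, the relation names `parent` and `child`, and the replica count are placeholders I made up for illustration; only the `join` field type and the settings keys come from Elasticsearch itself.

```python
import json


def build_index_body(primary_shards, replicas=1):
    """Build a create-index body with one join field (hypothetical names)."""
    return {
        "settings": {
            # Has to be chosen when the index is created.
            "number_of_shards": primary_shards,
            "number_of_replicas": replicas,
        },
        "mappings": {
            # 6.x allows a single mapping type per index; "_doc" is used here.
            "_doc": {
                "properties": {
                    # The join field replaces the old parent/child types.
                    "relation": {
                        "type": "join",
                        "relations": {"parent": "child"},
                    }
                }
            }
        },
    }


if __name__ == "__main__":
    # 10 is only an example value, not a recommendation.
    print(json.dumps(build_index_body(10), indent=2))
```

Child documents have to be indexed with the parent's ID as the routing value, so parents and their children always land on the same shard.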
I understand I will need to run benchmarks anyway, but upfront I would like to get a feeling for what an appropriate shard size could be. The default of 5 shards doesn't sound good to me, as it would mean a single shard size of ~60GB.
Any suggestions on where to start? Are 50 shards à 6GB too many? I understand this can speed up indexing but will slow down searching, as each search has to wait on results from all 50 shards.
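
To make the arithmetic behind those numbers explicit, here is a quick sketch. It assumes the ~300GB spreads evenly across primary shards, which may not hold exactly because children are routed to their parent's shard; the intermediate shard counts are just candidates I picked for comparison.

```python
# Initial index size taken from the description above.
INDEX_SIZE_GB = 300


def shard_size_gb(primary_shards, index_size_gb=INDEX_SIZE_GB):
    """Average primary shard size, assuming an even spread of data."""
    return index_size_gb / primary_shards


if __name__ == "__main__":
    # 5 is the 6.x default; 50 is the upper bound I was considering.
    for count in (5, 10, 20, 50):
        print(f"{count:>2} shards -> ~{shard_size_gb(count):.0f} GB per shard")
```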
If someone has experience with a similarly sized setup and would like to share their insights, that would be great.
Thanks,
- Markus