I'm currently testing storing all kinds of data on an Elasticsearch cluster. I started sizing it with what I understood to be the minimal safe configuration: 3 nodes for the cluster, and 3 shards with 2 replicas for each index.
It's working fine, but I'm wondering whether this is too much for the amount of data I have right now. I have two kinds of data:
- Twitter input (from Logstash), with a daily index
- time series, with a daily index
For Twitter (with my keywords), one index is around:
- 13,000 documents
- 26 MB of storage
For time series, one index is around:
- 300 documents
- 500 KB of storage
I've read in different sources that one shard should not exceed 1M documents or 50 GB, but is there also a minimum size to aim for in order to get good query performance? Should I use 3 shards and 2 replicas per index even though I don't have much data per day?
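To make the comparison with those ceilings concrete, here is a minimal Python sketch of the arithmetic. It assumes the 1M-document and 50 GB figures quoted above are the only limits, and that documents and bytes spread evenly across primary shards; `min_primary_shards` is a hypothetical helper, not part of any Elasticsearch API.

```python
import math

# Rule-of-thumb ceilings quoted from "different sources" (assumptions,
# not hard limits enforced by Elasticsearch).
MAX_DOCS_PER_SHARD = 1_000_000
MAX_BYTES_PER_SHARD = 50 * 1024**3  # 50 GB


def min_primary_shards(doc_count: int, size_bytes: int,
                       max_docs: int = MAX_DOCS_PER_SHARD,
                       max_bytes: int = MAX_BYTES_PER_SHARD) -> int:
    """Smallest primary shard count keeping each shard under both ceilings,
    assuming data is spread evenly across shards (hypothetical helper)."""
    by_docs = math.ceil(doc_count / max_docs)
    by_size = math.ceil(size_bytes / max_bytes)
    return max(1, by_docs, by_size)  # an index always has at least 1 primary


# Approximate daily index sizes from the question.
twitter = min_primary_shards(13_000, 26 * 1024**2)   # ~26 MB
timeseries = min_primary_shards(300, 500 * 1024)     # ~500 KB
print(twitter, timeseries)  # 1 1
```

With these numbers both daily indices sit far below either ceiling, so the ceilings alone would call for a single primary shard each.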
I would like to keep the "backup" that replicas provide, but also get the best search performance for my data size. I suspect that over-sizing will slow my searches down.
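One way to see what "over-sizing" means here is to count the shard copies (primaries plus replicas) the cluster ends up holding, since each shard carries some fixed cost regardless of how little data it contains. The sketch below assumes the two daily indices described above and a hypothetical one-year retention (the retention period is not stated in the question). With 3 nodes, 1 primary plus 2 replicas still places one copy of each index on every node.

```python
def total_shard_copies(primaries: int, replicas: int,
                       daily_indices: int, days: int) -> int:
    """Number of primary + replica shard copies the cluster must hold."""
    return primaries * (1 + replicas) * daily_indices * days


# Two daily indices (Twitter + time series); 365 days of retention is a
# hypothetical figure used only for illustration.
current = total_shard_copies(primaries=3, replicas=2, daily_indices=2, days=365)
single = total_shard_copies(primaries=1, replicas=2, daily_indices=2, days=365)
print(current, single)  # 6570 2190
```

Both layouts keep the same number of copies of each document (one primary and two replicas); only the number of shards those copies are split into differs.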