Hello,
We're going to set up a write-heavy cluster for log storage using Logstash.
So, I saw that it's recommended to increase the indexing buffer. But I'm a bit puzzled about the fact that this space is divided between all shards on a node.
I don't remember where, but I saw somewhere that this space is divided between active shards, where an active shard is one with at least one indexing operation in the last 5 minutes or so.
Whether or not this is true is quite a game changer for my calculations.
We have about 10 different log sources (soon to increase), each with one index per day, 5 shards and 2 replicas per index, and 60 days of retention. This gives quite a large number of shards to maintain, and to split this memory between, on our 5 nodes.
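To put a number on "quite a large number of shards", here is a rough back-of-the-envelope sketch. It assumes that "2 replicas" means two replica copies of each primary (three copies in total) and that shards are spread evenly over the nodes; the variable names are only for illustration.

```python
# Rough shard count for the setup described above.
# Assumption: "2 replicas" = 2 replica copies per primary (3 copies total).
# Assumption: shards are balanced evenly across the nodes.
SOURCES = 10              # log sources, one index per source per day
RETENTION_DAYS = 60
PRIMARIES_PER_INDEX = 5
REPLICAS = 2
NODES = 5

copies_per_index = PRIMARIES_PER_INDEX * (1 + REPLICAS)

indices = SOURCES * RETENTION_DAYS
total_shards = indices * copies_per_index
shards_per_node = total_shards // NODES

# Only today's indices receive writes, so only their shards would be "active".
todays_shards = SOURCES * copies_per_index
todays_shards_per_node = todays_shards // NODES

print("indices:", indices)
print("total shards:", total_shards)
print("shards per node:", shards_per_node)
print("today's shards per node:", todays_shards_per_node)
```

Under these assumptions that is 9000 shards in total, about 1800 per node, of which only about 30 per node belong to today's indices.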
In short, the real questions are:
- What is an active shard?
- Are old indices, one day old or older, consuming part of the indexing buffer?
I hope that:
- active shards are those with an indexing activity in the last X minutes.
- indexing buffer space is really divided between those
It would make sense...
On the same topic, if we specify indices.memory.min_shard_index_buffer_size:
- is it a hard minimum of memory given to every shard, including inactive ones?
- if we have a lot of shards, can the sum of these minimums exceed something like the heap?
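To show why the answer matters so much for my calculations, here is a small sketch of the arithmetic. The 10% buffer share and the 4 MB per-shard minimum are assumed values for indices.memory.index_buffer_size and indices.memory.min_shard_index_buffer_size, not confirmed ones. The shard counts assume 10 sources, 60 days, 5 shards and 3 copies per index, spread evenly over 5 nodes.

```python
MB = 1024 ** 2
GB = 1024 ** 3

heap_bytes = 16 * GB
index_buffer_fraction = 0.10   # ASSUMED value of indices.memory.index_buffer_size
min_shard_buffer = 4 * MB      # ASSUMED value of indices.memory.min_shard_index_buffer_size

shards_per_node = 1800         # 10 sources * 60 days * 5 shards * 3 copies / 5 nodes
active_shards_per_node = 30    # today's indices only: 10 * 5 * 3 / 5 nodes

index_buffer = heap_bytes * index_buffer_fraction


def per_shard_mb(buffer_bytes, shard_count):
    """Buffer share per shard, in MB, if the buffer is split evenly."""
    return buffer_bytes / shard_count / MB


# Case 1: the buffer is divided between ALL shards on the node.
split_all_mb = per_shard_mb(index_buffer, shards_per_node)

# Case 2: the buffer is divided between ACTIVE shards only.
split_active_mb = per_shard_mb(index_buffer, active_shards_per_node)

# Worst case: the minimum is a hard floor applied to every shard.
floor_total_gb = shards_per_node * min_shard_buffer / GB

print(f"per shard, all shards:    {split_all_mb:.2f} MB")
print(f"per shard, active shards: {split_active_mb:.2f} MB")
print(f"sum of per-shard floors:  {floor_total_gb:.2f} GB")
```

With these assumed values, splitting across all shards leaves under 1 MB per shard, splitting across active shards leaves about 55 MB per shard, and a hard 4 MB floor on every shard would add up to roughly 7 GB on a 16 GB heap.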
Specs:
Our estimate is 10k-20k events per second on a 5-node cluster.
We are starting with 16 GB of heap on 64 GB servers, possibly increasing to 28 GB if our benchmarks suggest it.
Thanks
Bruno Lavoie