Tuning: index_buffer_size and active shards

Hello,

We're setting up a write-heavy cluster for log storage using Logstash.

I've seen it recommended to increase the indexing buffer, but I'm a bit puzzled by the fact that this space is divided between all shards on a node.

I don't remember where, but I read that this space is divided only between active shards, where an active shard is one with at least one indexing operation in the last 5 minutes or so.

Depending on whether this is true, it's quite a game changer for my calculations.

For context: we have about 10 different log sources (soon to increase), each with one index per day, with 5 shards and 2 replicas per index, and 60 days of retention. That's a large number of shards to maintain, and to split this memory across, on our 5 nodes; see the rough count below.
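
A rough count of what that means, assuming all the daily indices stay open for the whole retention period:

  10 sources × 60 days × 5 primary shards = 3,000 primary shards
  3,000 × (1 primary + 2 replicas) = 9,000 shard copies in total
  9,000 / 5 nodes ≈ 1,800 shards per node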

In short, the real questions are:

  • What is an active shard?
  • Do old indices (one day old or more) consume part of the indexing buffer space?

I hope that:

  • active shards are those with indexing activity in the last X minutes, and
  • the indexing buffer space is really divided only between those.

It would make sense....

On the same topic: if we specify indices.memory.min_shard_index_buffer_size:

  • Is it a hard minimum of memory given to all shards, including inactive ones?
  • If we have a lot of shards, can the sum blow past something like the heap?

Specs:
We estimate 10k–20k events per second on a 5-node cluster.
We're starting with a 16 GB heap on 64 GB servers, ready to increase to 28 GB if our benchmarks suggest it.

Thanks
Bruno Lavoie

OK, here's an answer from me to myself, and to help others.

Thanks to open source! Without being a programmer, I had a look at the source code.

The facts are:

  • The indexing buffer is divided between actively indexing shards on the node.

  • An actively indexing shard is one with indexing activity in the last 5 minutes.

  • The check that classifies shards as actively/inactively indexing runs every 30 seconds.

  • Even if you allocate a big number to this buffer, there is a hard limit of 512 MB per shard, because beyond that value tests showed no further gain in indexing performance.

  • When a shard is not actively indexing, its indexing buffer is set to 500 KB.

Maybe the docs should be updated to be more explicit about these facts.
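
For reference, here's a minimal sketch of the settings involved as they could appear in elasticsearch.yml, with what I understand to be their defaults (the name indices.memory.max_shard_index_buffer_size for the 512 MB cap is my reading of the source, so double-check it against your version):

  indices.memory.index_buffer_size: 10%               # global budget shared by actively indexing shards
  indices.memory.min_shard_index_buffer_size: 4mb     # per-shard floor, active shards only
  indices.memory.max_shard_index_buffer_size: 512mb   # per-shard hard ceiling
  indices.memory.interval: 30s                        # how often the active/inactive check runs
  index.shard.inactive_time: 5m                       # inactivity window before a shard is idled down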

The main files I read:
https://github.com/elastic/elasticsearch/blob/master/core/src/main/java/org/elasticsearch/indices/memory/IndexingMemoryController.java

https://github.com/elastic/elasticsearch/blob/master/core/src/main/java/org/elasticsearch/index/shard/IndexShard.java

Bruno Lavoie


Sorry we didn't get back to you earlier! There is some work actively going on in this area and it'll just get better.

If you are willing to sign the CLA then you too can send a documentation update! If you click on the edit link on the page it'll take you to an editor where you can make the change. Or, rather, it should. I promise someone will review it! I can't promise it'll be me because my review queue (open tabs with PRs in them) is pretty deep right now, but someone will!

BTW, you're about 65% of the way to being a programmer now that you can read that source code and make sense out of it.

This is correct: the setting that controls the inactivity window is index.shard.inactive_time, and the default is indeed five minutes.

An active shard is one that has received an indexing operation (index or delete) within the last index.shard.inactive_time time units.

As for old indices: their indexing buffers are idled down to 500 KB, and this idle buffer does not count against the global indexing buffer limit. There is a check every indices.memory.interval time units (defaults to 30 seconds) on the state of all shards, and a check whenever a shard changes from inactive to active. When these checks are executed, shards that have changed from active to inactive are idled down to the 500 KB limit, and the global indexing buffer is then allocated amongst the remaining active shards.

Yes, active shards are those with indexing activity in a recent window; that window is five minutes by default.

And yes, the indexing buffer space is divided between only the active shards.

On your first question: inactive shards are always idled down to 500 KB; the indices.memory.min_shard_index_buffer_size setting applies only to active shards.

On your second question: no. There is a global indexing buffer size of indices.memory.index_buffer_size (defaults to 10% of the heap) that is apportioned equally among the active shards. This means that even if you set the per-shard maximum to 512 MB, if your heap is 1 GB then no single shard will have an indexing buffer larger than 102.4 MB.
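
As a quick illustration with those defaults: a 1 GB heap gives a global budget of 0.10 × 1,024 MB = 102.4 MB, so 10 actively indexing shards get about 10.2 MB each and 20 get about 5.1 MB each.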


Thanks for your reply, better late than never.

I'm willing to make a documentation update, but English is not my native language. I'm an ex-programmer, now in the world of databases and analytics. Elasticsearch is simply awesome, but I prefer to understand every detail of a datastore before using it in production, and to be as prepared as possible when troubleshooting is necessary.

I hope my conclusions about these settings aren't off.

Maybe I'll make a pull request in a few days.
Hope to help the community.

Thanks, this makes everything clearer, except for one point regarding min_shard_index_buffer_size.

For the first point, I understand that the setting applies only to actively indexing shards.

For the second, you responded that the sum cannot exceed the global budget.

Maybe my wording was bad, but my question was about the minimum indexing buffer for each shard. Even with the default 4 MB setting, a huge number of actively indexing shards can exceed and bypass the established index_buffer_size (default 10% of the heap).

Maybe I'm missing something, but this excerpt from IndexingMemoryController doesn't seem to enforce any hard upper limit on the total.

My comments are marked with //@@@@.

int activeShardCount = activeShards.size();

[...]

if (activeShardCount == 0) {
    return;
}

//@@@@ split indexing buffer space evenly between active shards
ByteSizeValue shardIndexingBufferSize = new ByteSizeValue(indexingBuffer.bytes() / activeShardCount);

//@@@@ if the per-shard split is below the requested minimum (default 4 MB), the minimum takes precedence
if (shardIndexingBufferSize.bytes() < minShardIndexBufferSize.bytes()) {
    shardIndexingBufferSize = minShardIndexBufferSize;
}

//@@@@ but always enforce the hard maximum of 512 MB
if (shardIndexingBufferSize.bytes() > maxShardIndexBufferSize.bytes()) {
    shardIndexingBufferSize = maxShardIndexBufferSize;
}

[...]

//@@@@ apply the computed buffer size to each active shard
for (IndexShard shard : activeShards) {
    updateShardBuffers(shard, shardIndexingBufferSize, shardTranslogBufferSize);
}

To recall your example:

  • with the case of a 1 GB heap
  • with a 102.4 MB (10% default) indexing buffer
  • with the default minimum of 4 MB per shard

With my understanding of the source code, a node with 50 actively indexing shards will allocate 200 MB, roughly twice the 102.4 MB (10%) budget. And this can keep growing without any hard limit, up to something like 80% of the heap.
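
Spelling out my arithmetic (rough numbers, ignoring everything else that lives on the heap):

  50 active shards × 4 MB minimum = 200 MB, versus the 102.4 MB budget
  80% of a 1 GB heap ≈ 819 MB, which would take about 819 / 4 ≈ 205 actively indexing shards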

Maybe it's a bad example or an extreme case, because most of the time heaps are way larger. But I really want to understand all the interactions between these parameters.

So we just need to be cautious about this, especially on a multi-tenant cluster with a lot of active clients.

Is there:

  • Any way to monitor the currently allocated indexing buffer space?
  • Any way to list the currently actively indexing shards on a node, with their respective buffer sizes?

If so, I'll use it to send alerts to our monitoring systems (for extreme cases), even if we do good planning.

And are the 500 KB buffers of inactive shards counted against the global indexing buffer?

Thanks for your precious time.
Bruno Lavoie

The fault lies with me and I apologize for misunderstanding this question.

You are correct.

Yes, that's part of why this isn't a risky problem. For this to be a problem, the minimum indexing buffer either needs to be set large (relative to the global indexing buffer budget) or there needs to be an incredibly large number of active shards. I would classify both of these as operations mistakes that should simply be avoided, as they will cause problems in general. But in addition to those two factors, a sufficient number of shards need to be continuously indexing documents; otherwise the allocated buffers will not fill up, and what is used will eventually be flushed and idled down.

That is, with a sufficiently large minimum buffer size or a sufficiently large number of active shards, the allocated budget can exceed the global indexing buffer budget, but to actually blow through the heap these budgets also need to be in use. This is not a set of circumstances that is at all likely with proper operations management. Let me know if that explanation does not make sense.

The currently allocated buffer sizes are not exposed through the API, but indexing buffer reallocations are logged (DEBUG logging level, logger indices.memory).

The list of actively indexing shards is also not exposed through the API, but it is again available in the logs (DEBUG level, indices.memory), and a shard being idled down is logged as well (DEBUG level, index.shard).
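
Something along these lines in logging.yml should surface those messages (or the equivalent logger.* settings applied dynamically through the cluster settings API):

  logger:
    indices.memory: DEBUG
    index.shard: DEBUG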

And no, the 500 KB buffers of inactive shards do not count against the global indexing buffer.

Sincerely, thank you for your thoughtful questions and your interest in Elasticsearch. :)


Thanks again

IIRC, the memory allocated to an active shard is a hard upper limit, and that memory is not entirely reserved for the shard's buffer; only the very active shards that need to will actually use their whole allocated budget.

Well done, because not all indices have the same incoming throughput.

Definitely, "budget" is the best word for it.

Thank you
Bruno Lavoie

Is there any way to know the current number of actively indexing shards through the REST API?
I've read that /_cluster/health?pretty has an "active_shards" value, but I guess that's not the same thing.
Any ideas?
