Index translog grows past the configured limit

Hi everyone,

I'm new to Elasticsearch and have not been able to find a definitive answer to this question: can I set the translog flush settings per index, or are they only available at the node level?

I have several index templates with the following settings:

"settings": {
"index": {
"refresh_interval": "15s",
"number_of_shards": "2",
"translog": {
"flush_threshold_size": "2048mb",
"sync_interval": "15s",
"flush_threshold_period": "30m",
"durability": "async"
},
"number_of_replicas": "2"
}
},

All other settings (like the number of shards/replicas or the refresh interval) apply correctly, but I've seen the transaction log for several indices grow well past the defined limit (and I mean 3-4 times bigger).
I know the translog fsync settings ("sync_interval" and "durability") are definitely applied per index, but this particular detail leaves me confused: the "flush_threshold_size" setting seems to work fine when defined in the elasticsearch.yml file.
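For what it's worth, I'm watching the translog size with something roughly like this (my_index and localhost:9200 stand in for our actual index and node):

# per-shard translog size is reported under the translog metric of the indices stats API
curl -XGET 'localhost:9200/my_index/_stats/translog?pretty'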

Are these settings meant to be set only in the node configuration file, or am I misunderstanding how they work?

You can set this per index, but I think you might be running into a known issue: #15814. Can you take a look at the issue and corresponding PR #15830 and see if that fits your situation?
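In case it helps to confirm the per-index route outside the template, the setting can also be updated dynamically on an existing index. A rough sketch using the index settings API (my_index and localhost:9200 are placeholders):

# dynamically set the flush threshold on a single existing index
curl -XPUT 'localhost:9200/my_index/_settings' -d '
{
  "index.translog.flush_threshold_size": "2048mb"
}'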

However, if the file copying phase of the recovery takes >5s (likely!) or local recovery is slow, the check can run into an exception and never recover. The end result is that the translog-based flush is completely disabled.

That is probably it, since our installation of ES runs on some pretty slow disks (unfortunately) and the recovery phase is indeed long.
So I guess I'll use the flush API for now as a workaround if this happens again.
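For reference, the workaround I have in mind is just a forced flush on the affected index, roughly like this (my_index and localhost:9200 are placeholders for our setup):

# force a flush so the oversized translog can be trimmed
curl -XPOST 'localhost:9200/my_index/_flush'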

Yup. And the aforementioned fix will be released soon.