I am running ELK 7.5 and have set up an ILM policy for a fluent-bit index.
Fluent Bit is managed by a Helm chart with the following config:
backend:
  type: es
  es:
    type: _doc
    logstash_format: "Off"
    logstash_prefix: ~
    index: fluent-bit-write
The ILM policy is as follows:
PUT _ilm/policy/fluent-bit
{
  "policy": {
    "phases": {
      "hot": {
        "actions": {
          "rollover": {
            "max_age": "24h",
            "max_size": "250gb"
          }
        }
      },
      "warm": {
        "actions": {
          "allocate": {
            "require": {
              "box_type": "warm"
            }
          },
          "forcemerge": {
            "max_num_segments": 1
          },
          "shrink": {
            "number_of_shards": 1
          }
        }
      },
      "delete": {
        "min_age": "14d",
        "actions": {
          "delete": {}
        }
      }
    }
  }
}
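For reference, the policy as stored in the cluster can be read back and compared against the request above (just a sanity check, not part of the setup):

GET _ilm/policy/fluent-bit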
The fluent-bit template is as follows:
PUT _template/fluent-bit
{
  "index_patterns": ["fluent-bit-*"],
  "settings": {
    "index": {
      "lifecycle.name": "fluent-bit",
      "lifecycle.rollover_alias": "fluent-bit-write",
      "number_of_shards": 4
    }
  }
}
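To check that newly created indices actually pick these settings up, I look at the template and at the settings of whatever index currently sits behind the write alias (diagnostic requests only, not part of the setup):

GET _template/fluent-bit

GET fluent-bit-write/_settings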
The first index was created as follows (the URL-encoded name decodes to <fluent-bit-{now/d}-1>):
PUT %3Cfluent-bit-%7Bnow%2Fd%7D-1%3E
{
  "aliases": {
    "fluent-bit-write": {}
  }
}
The first couple of indices went through the ILM cycle just fine, but recently I have noticed that when the rollover condition is met, a new index is created, the new index's rollover_alias is set to fluent-bit-write, and the previous index's rollover_alias shows none. The previous index then fails the rollover attempt with index.lifecycle.rollover_alias does not point to index. The error makes sense, given that the write alias no longer points at the previous index, but I don't understand why whatever rollover step is happening under the hood runs after the rollover_alias has already been switched. I'm not sure what else needs to be done to address this. I have the exact same setup on a smaller cluster with far less shard activity, and I am wondering whether the general load on this cluster is causing some ILM steps to stall.
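For what it's worth, these are the kinds of requests I have been using to inspect the stuck indices; the index name in the retry call is a placeholder for whichever index reports the error:

# Show the current ILM step and any step error for each index
GET fluent-bit-*/_ilm/explain

# Show which indices currently carry the write alias
GET _alias/fluent-bit-write

# Retry the failed ILM step on a stuck index (placeholder name)
POST fluent-bit-2020.01.01-1/_ilm/retry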