Logstash and ILM - rollover not working

I please need help to understand ILM in combination with Logstash. My current setup somehow seems not to work as expected. Using ELK 7.9.1 - My configuration looks like:

ILM policy - logstash-syslog-compliance

{ - 
  "logstash-syslog-compliance": { - 
    "version": 2,
    "modified_date": "2020-10-28T06:47:07.463Z",
    "policy": { - 
      "phases": { - 
        "hot": { - 
          "min_age": "0ms",
          "actions": { - 
            "rollover": { - 
              "max_size": "50gb",
              "max_age": "1d"
            },
            "set_priority": { - 
              "priority": 100
            }
          }
        },
        "delete": { - 
          "min_age": "30d",
          "actions": { - 
            "delete": { - 
              "delete_searchable_snapshot": true
            }
          }
        }
      }
    }
  }
}

Before I also had a warm and cold phase, but as I later read this seems not to work if you don't have explicit warm and cold nodes, so I removed the phases and now only have hot and delete. Still yesterday's index didn't get rolled over automatically - described below in detail.

Index Template

{ - 
  "index_templates": [ - 
    { - 
      "name": "logstash-syslog-compliance",
      "index_template": { - 
        "index_patterns": [ - 
          "compliance_logstash_syslog*"
        ],
        "composed_of": [ - 
          "syslog-settings-compliance",
          "syslog-mappings-dynamic",
          "syslog-mappings"
        ],
        "version": 1
      }
    }
  ]
}

Component Template - syslog-settings-compliance

{ - 
  "component_templates": [ - 
    { - 
      "name": "syslog-settings-compliance",
      "component_template": { - 
        "template": { - 
          "settings": { - 
            "index": { - 
              "lifecycle": { - 
                "name": "logstash-syslog-compliance",
                "parse_origination_date": "true",
                "rollover_alias": "compliance_logstash_syslog"
              },
              "routing": { - 
                "allocation": { - 
                  "total_shards_per_node": "1"
                }
              },
              "mapping": { - 
                "total_fields": { - 
                  "limit": "10000"
                }
              },
              "refresh_interval": "15s",
              "number_of_shards": "1",
              "translog": { - 
                "durability": "async"
              },
              "auto_expand_replicas": "false",
              "max_docvalue_fields_search": "200",
              "number_of_replicas": "1"
            }
          }
        },
        "version": 1
      }
    }
  ]
}

Logstash's elasticsearch output - configured on 2 filter-nodes

        elasticsearch {
            id => "elasticsearch_compliance_logstash_syslog"
            # as we have our own template configured we don't want to let logstash manage the template
            manage_template => false
            codec => json_lines
            cacert => "/etc/logstash/certs/elasticsearch-ca.pem"
            # user has cluster permissions: monitor, manage_ilm
            # user has indices permissions: create, write, manage_ilm, delete, create_index, manage
            user => "logstash_internal"
            password => "supersecureandsecret"
            ssl => true
            ilm_enabled => "true"
            ilm_rollover_alias => "compliance_logstash_syslog"
            ilm_pattern => "{now/d}-000001"
            ilm_policy => "logstash-syslog-compliance"
            # obfuscated IPs
            hosts => [ "https://123.123.123.1:9200", "https://123.123.123.2:9200" ]
        }

Problem

So today I saw that the rollover again did not work. I still had an active index compliance_logstash_syslog-2020.10.27-000001. After reading that hot, warm, cold and delete seems not to work without explicitly set nodes (it's only a small test-cluster), I changed the policy to hot and delete only. Still the index didn't get rolled over after waiting half an hour plus (the poll interval is default at 10min). The compliance_logstash_syslog-2020.10.27-000001/_ilm/explain?human showed it's age is 1.27d. The size was definitely not hit - but the age. The "explain" didn't show any errors.
So I tried a manual rollover compliance_logstash_syslog/_rollover/compliance_logstash_syslog-2020.10.28-000001 with the same conditions and the rollover worked without any problem. But now I wonder if it will do the rollover tomorrow to compliance_logstash_syslog-2020.10.29-000001. What's wrong with my setup, what am I missing?

Also I wonder - why is the alias not removed from old indices, imho this is error-prone even with the write-index parameter. It's strange that you have to create the first index with the write-index alias manually. Can I configure that somewhere within a ILM policy to remove aliases from old indices?

Currently I'm not seeing real advantages of ILM compared to curator configs and scripts.

again today no rollover for whatever reason :expressionless:

current explain - age is 1.25d and max-age 1d - why isn't it rolling over???

{ - 
  "indices": { - 
    "compliance_logstash_syslog-2020.10.28-000001": { - 
      "index": "compliance_logstash_syslog-2020.10.28-000001",
      "managed": true,
      "policy": "logstash-syslog-compliance",
      "lifecycle_date": "2020-10-28T00:00:00.000Z",
      "lifecycle_date_millis": 1603843200000,
      "age": "1.25d",
      "phase": "hot",
      "phase_time": "2020-10-28T07:26:37.850Z",
      "phase_time_millis": 1603869997850,
      "action": "rollover",
      "action_time": "2020-10-28T07:28:08.351Z",
      "action_time_millis": 1603870088351,
      "step": "check-rollover-ready",
      "step_time": "2020-10-28T07:28:08.351Z",
      "step_time_millis": 1603870088351,
      "phase_execution": { - 
        "policy": "logstash-syslog-compliance",
        "phase_definition": { - 
          "min_age": "0ms",
          "actions": { - 
            "rollover": { - 
              "max_size": "50gb",
              "max_age": "1d"
            },
            "set_priority": { - 
              "priority": 100
            }
          }
        },
        "version": 2,
        "modified_date": "2020-10-28T06:47:07.463Z",
        "modified_date_in_millis": 1603867627463
      }
    }
  }
}

what the hell?

it rolled over automatically a few minutes ago - but not as expected with a new date :thinking:

the completed one


{ - 
  "indices": { - 
    "compliance_logstash_syslog-2020.10.28-000001": { - 
      "index": "compliance_logstash_syslog-2020.10.28-000001",
      "managed": true,
      "policy": "logstash-syslog-compliance",
      "lifecycle_date": "2020-10-28T00:00:00.000Z",
      "lifecycle_date_millis": 1603843200000,
      "age": "1.35d",
      "phase": "hot",
      "phase_time": "2020-10-28T07:26:37.850Z",
      "phase_time_millis": 1603869997850,
      "action": "complete",
      "action_time": "2020-10-29T07:28:08.707Z",
      "action_time_millis": 1603956488707,
      "step": "complete",
      "step_time": "2020-10-29T07:28:08.707Z",
      "step_time_millis": 1603956488707,
      "phase_execution": { - 
        "policy": "logstash-syslog-compliance",
        "phase_definition": { - 
          "min_age": "0ms",
          "actions": { - 
            "rollover": { - 
              "max_size": "50gb",
              "max_age": "1d"
            },
            "set_priority": { - 
              "priority": 100
            }
          }
        },
        "version": 2,
        "modified_date": "2020-10-28T06:47:07.463Z",
        "modified_date_in_millis": 1603867627463
      }
    }
  }
}

the new one

{ - 
  "indices": { - 
    "compliance_logstash_syslog-2020.10.28-000002": { - 
      "index": "compliance_logstash_syslog-2020.10.28-000002",
      "managed": true,
      "policy": "logstash-syslog-compliance",
      "lifecycle_date": "2020-10-28T00:00:00.000Z",
      "lifecycle_date_millis": 1603843200000,
      "age": "1.35d",
      "phase": "hot",
      "phase_time": "2020-10-29T07:28:08.281Z",
      "phase_time_millis": 1603956488281,
      "action": "rollover",
      "action_time": "2020-10-29T07:38:07.956Z",
      "action_time_millis": 1603957087956,
      "step": "check-rollover-ready",
      "step_time": "2020-10-29T07:38:07.956Z",
      "step_time_millis": 1603957087956,
      "phase_execution": { - 
        "policy": "logstash-syslog-compliance",
        "phase_definition": { - 
          "min_age": "0ms",
          "actions": { - 
            "rollover": { - 
              "max_size": "50gb",
              "max_age": "1d"
            },
            "set_priority": { - 
              "priority": 100
            }
          }
        },
        "version": 2,
        "modified_date": "2020-10-28T06:47:07.463Z",
        "modified_date_in_millis": 1603867627463
      }
    }
  }
}

okay, :roll_eyes:

it seems max_age is just not used by ILM to see when to rollover - see https://github.com/elastic/elasticsearch/issues/57544 - instead the creation_date of the index is used.

now i only want to know why the hell, the new index doesn't have the new date in its name.

Thanks for using Elasticsearch and in particular for your interest in ILM.

why is the alias not removed from old indices, imho this is error-prone even with the write-index parameter. It's strange that you have to create the first index with the write-index alias manually

The rollover on alias behaviour is described in the rollover API documentation:. The alias will be searching over all your managed indices and writes against the alias will be directed towards the latest index (the write index).

Can I configure that somewhere within a ILM policy to remove aliases from old indices?

I don't believe there is an automatic way to do this right now. We have an issue that's proposing this, however we don't know when this is going to be worked on. I recommend +1-ing it if it's something that would help you.

We recognised that the alias setup is error prone so we implemented data streams to reduce the amount of setup needed. I'd recommend considering using data streams.

With regards to the problems you're encountering with the rollover. I appreciate the age in the explain api is a bit confusing (especially in the context of the origination_date being used). We documented it here as the age used to enter the next phase.
You are using parse_origination_date in which case the origination date is used to calculate the age (which is 00:00 hours of the day/date that is parsed from the index name). This is an useful indicator for the phase transition timings, but rollover uses an age that's based on the index creation date (which is available using the _cat API - for eg GET _cat/indices?h=h,s,i,id,p,r,dc,dd,ss,creation.date.string)
I've opened this issue to improve the way we report the index age in the explain API. (given you're ingesting new data do you need the origination_date configuration?)
I believe the delays you're seeing in the index rolling over is because of this age reporting in the exlain api.

With regards to the rolled index name not having the date in the index name incremented, I can't spot from the explain APIs why that went wrong.
Can you see anything in the logs at rollover time ?
You should be able to see something along the lines of

[2020-10-31T16:30:31,683][INFO ][o.e.c.m.MetadataCreateIndexService] [compliance_logstash_syslog-2020.10.31-000005] creating index, cause [rollover_index], templates [logstash-syslog-compliance], shards [1]/[1]

that will indicate what templates were used and quite important if the timestamp of the log statement is on the "next day" (in your case on the 29th of October). Local time tampering would prevent the "real date" from being picked up.

I was not able to reproduce your problem. Here's what I used

Policy

PUT _ilm/policy/logstash-syslog-compliance
{
  "policy": {
    "phases": {
      "hot": {
        "min_age": "0ms",
        "actions": {
          "rollover": {
            "max_docs": 1,
            "max_size": "50gb",
            "max_age": "1d"
          },
          "set_priority": {
            "priority": 100
          }
        }
      },
      "delete": {
        "min_age": "30d",
        "actions": {
          "delete": {
            "delete_searchable_snapshot": true
          }
        }
      }
    }
  }
}

Composable Template

PUT _index_template/logstash-syslog-compliance
{
  "index_patterns": [
    "compliance_logstash_syslog*"
  ],
  "template": {
    "settings": {
      "index": {
        "lifecycle": {
          "name": "logstash-syslog-compliance",
          "parse_origination_date": "true",
          "rollover_alias": "compliance_logstash_syslog"
        },
        "routing": {
          "allocation": {
            "total_shards_per_node": "1"
          }
        },
        "mapping": {
          "total_fields": {
            "limit": "10000"
          }
        },
        "refresh_interval": "15s",
        "number_of_shards": "1",
        "translog": {
          "durability": "async"
        },
        "auto_expand_replicas": "false",
        "max_docvalue_fields_search": "200",
        "number_of_replicas": "1"
      }
    }
  }
}

And the logstash.conf (I removed the security configurations for brevity)

elasticsearch {
		id => "elasticsearch_compliance_logstash_syslog"
			manage_template => false
			ilm_enabled => "true"
			ilm_rollover_alias => "compliance_logstash_syslog"
			ilm_pattern => "{now/d}-000001"
			ilm_policy => "logstash-syslog-compliance"
	} 

I've manipulated the local machine date (configured it to 31st of October) to trigger the rollover and the date in the index name was succesfully updated.

[2020-10-30T16:43:27,037][INFO ][o.e.c.m.MetadataMappingService] [compliance_logstash_syslog-2020.10.30-000001/JR43G_XhSJiki54YKb6nWg] create_mapping [_doc]
[2020-10-31T16:43:37,972][INFO ][o.e.x.i.IndexLifecycleTransition] moving index [compliance_logstash_syslog-2020.10.30-000001] from [{"phase":"hot","action":"rollover","name":"check-rollover-ready"}] to [{"phase":"hot","action":"rollover","name":"attempt-rollover"}] in policy [logstash-syslog-compliance]
[2020-10-31T16:43:38,035][INFO ][o.e.c.m.MetadataCreateIndexService] [compliance_logstash_syslog-2020.10.31-000002] creating index, cause [rollover_index], templates [logstash-syslog-compliance], shards [1]/[1]

Notice on the 31st the rollover was triggered and compliance_logstash_syslog-2020.10.31-000002 was created.

Hope this helps.

thank you very much for your detailed post!

I tried it now again with an one-node elasticsearch vm and one logstash-vm. after setting up the ilm-policy and template I started up logstash, index was created as expected, i did not have to create it manually with the alias and write-index-option.
sending in events created rollovered indices as expected with max_docs: 1. so i changed the date (I just exectued date -s "2020-11-03") on both vms. the rollover happened but still the date was not changed for whatever reason.

after deleting the created indices and restarting logstash I tried again with setting the date with the correct current time and suddenly it worked * yay *

besides, this line in the logstash logs got my attention, while my trials

Creating rollover alias <compliance_logstash_syslog-{now/d}-000001>

I looked through my older logstash logs on my test-elk and ONLY found

Creating rollover alias <compliance_logstash_syslog-{now/d{yyyy-MM-dd}}-000001>

this was from my first tries - but after reading that this only works with the default date-math yyyy.MM.dd I removed it in the logstash-config. I did not find any other line mentioning some rollover alias.

so i shortly stopped logstash on the test-elk, removed now all existing indices from the current ilm-policy and all existing aliases and started logstash again. the new index with today's date was instantly created and the logline now also looks ok. I hope that tomorrow it works correctly - as i cannot just set the date on this system like i want to.

Finally it seems to work.

I assume that somehow the "rollover alias" created by logstash was the culprit, that it didn't get updated somewhere or so after changing the "ilm_pattern".

So stopping logstash, cleaning up the policy (removing the ILM policy from linked indices) and removing the aliases seems to have helped.

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.