Hot nodes full

Hey Peeps,

I have a theoretical question. What happens when you have nodes split by tiers (hot, warm, cold) and the indices are managed by ILM? If the hot nodes fill up to the flood-stage watermark, would you need to manually move some of the oldest indices from hot to warm to free up space, or would ILM do that automatically? If it doesn't happen automatically, is there some way to automate it by other means?

As an example, let's say you configured ILM to move indices to warm 10 days after creation, yet they run out of space in 5 days.

Thanks!

Hi @lduvnjak

No, ILM does not automatically adjust.

In this example, at the highest level:

the hot nodes would fill up and at some point stop ingesting.

You would need to go in and adjust the ILM policy to say 5 days. Then the indices would start to move to the warm nodes, and eventually you would start ingesting data again.
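As a sketch, shortening the warm transition in an existing policy would look something like this (the policy name and phase layout here are assumptions; in recent versions the warm phase moves data to warm nodes automatically via the implicit migrate action):

PUT _ilm/policy/my-logs-policy
{
  "policy": {
    "phases": {
      "hot": {
        "actions": {
          "rollover": { "max_age": "5d" }
        }
      },
      "warm": {
        "min_age": "5d",
        "actions": {}
      }
    }
  }
}

Note that PUT _ilm/policy replaces the whole policy, so include all existing phases, not just the one you are changing.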


Gotcha.

Thanks a bunch Stephen.

One more quick question. What value does Elasticsearch look at when deciding the age of an index, and can this value be changed? To give more context on the possible issue we could face, I'll dive just a bit into the architecture.

We have Kafka serving as a message queue, and it writes data into Elasticsearch via the Confluent connector for Elasticsearch. If the cluster is down for, let's say, 2 days, Kafka will burst the logs from those 2 days that Elastic was offline, and continue to ingest data normally afterwards.

Since the amount of data is no longer consistent, we estimate that the hot nodes would no longer be able to hold data for X days, as they would have 3 days' worth of data bursted into 1.

Is there some way we can manually change the value Elasticsearch uses to tell an index's age, so that it is once again consistent with the actual age of the logs? If it's the value of "creation_date", then maybe changing that value would make the indices 2 days old in Elastic's eyes.

Thanks in advance.

In short, there are two values, depending on how you set things up: the index creation date if you are not using rollover, or the rollover date if you are using automated rollover.
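Either way, you can inspect the age as ILM sees it per index (the index name here is hypothetical):

GET /my-index-000001/_ilm/explain

The response includes the index age, the current phase, and the action/step ILM is on.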

I have not tried that and I would not recommend it... but you are welcome to try; there may be unintended consequences. The dates that govern this are actually in the ILM process and the indices themselves, and I would definitely not recommend trying anything that is not in the documented APIs.

What are you trying to solve? In a large logging system there is almost always catch-up / replay, etc. All logs will not always land in the exact index whose name and date match the exact event date of the log. Yes, mostly it will be that way... but I have rarely seen that perfect at scale. There is a lot of logic on the query side that accounts for this and helps optimize.

Basically, if you get a "bump / bulge", folks often just adjust the ILM policy for a day or two to speed the transition to the next phase, then put it back to normal after that.

There are two dynamic index settings that control this.

index.lifecycle.origination_date
(Dynamic, long) If specified, this is the timestamp used to calculate the index age for its phase transitions. Use this setting if you create a new index that contains old data and want to use the original creation date to calculate the index age. Specified as a Unix epoch value in milliseconds.

And

index.lifecycle.parse_origination_date
(Dynamic, Boolean) Set to true to parse the origination date from the index name. This origination date is used to calculate the index age for its phase transitions. The index name must match the pattern ^.*-{date_format}-\\d+, where the date_format is yyyy.MM.dd and the trailing digits are optional. An index that was rolled over would normally match the full format, for example logs-2016.10.31-000002. If the index name doesn't match the pattern, index creation fails.
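For illustration, the second setting would be applied at index creation time, on a name matching that pattern (the index name here is hypothetical):

PUT /logs-2022.11.09-000001
{
  "settings": {
    "index.lifecycle.parse_origination_date": true
  }
}

ILM would then treat 2022.11.09 as the origination date instead of the actual creation date.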

The first one seems to enable you to set a custom origination date to be used in phase changes.

For example, if you have an index that was created today but want the phases to happen as if it were created 2 days ago, you would set this on your index:

PUT /index-name/_settings
{
  "index" : {
    "lifecycle.origination_date": 1667962800000
  }
}

Where 1667962800000 is the epoch time in milliseconds for 2022-11-09 03:00:00.000Z.


Interesting... I was unaware

index.lifecycle.origination_date

Unclear to me if changing this after the index is created and under ILM will actually affect the ILM process.

Also I don't think this will work if using automated rollover since that does not work off creation dates

Probably worth a test.

Then check

GET index/_ilm/explain

Thanks so much for all the info everyone. I'll test this on Monday and let you know about my findings.

Posting my findings here:

I have a test ILM which moves indices into cold after 2 days, and deletes them after 4.

After bootstrapping a test index and manually rolling it over, I tested updating the non-write, rolled-over index. Here is an example of updating the origination_date of the existing index:

PUT /index-2022.11.11.12-000004/_settings
{
  "index" : {
    "lifecycle.origination_date": 1667991600000
  }
}

After updating the origination date so the index appeared over 2 days old, it was immediately moved to the cold tier. Same for over 4 days old: the index was immediately deleted. It looks like when you update the origination_date, ILM immediately re-evaluates the index rather than waiting for the usual poll interval, which is great.

I've also stumbled across this blog from Elastic which dives into it a bit more if you want to take a look: Control the phase transition timings in ILM using the origination date | Elastic Blog

Thanks so much for all the help everyone!


Just going to add a bit here. I've also found a way to copy an index with an identical name from a remote cluster and integrate it with ILM.

POST _reindex?wait_for_completion=false
{
  "conflicts": "proceed",
  "source": {
    "remote": {
      "host": "https://<elastic_host>:9200",
      "username": "elastic",
      "password": "<Password>",
      "socket_timeout": "10h",
      "connect_timeout": "10h"
    },
    "index": "<index>-2022.11.23.08-000002"
  },
  "dest": {
    "index": "<index>-2022.11.23.08-000002",
    "op_type": "create"
  }
}

Since ILM will complain about the rollover alias not pointing to that new index, you can tell ILM that it has already been rolled over:

PUT /<index>-2022.11.23.08-000002/_settings
{
    "index.lifecycle.indexing_complete": true,
    "index.lifecycle.origination_date": 1669191506530
} 

Here are the official docs: Skip rollover | Elasticsearch Guide [8.5] | Elastic