When using Index Lifecycle Management, how to limit queries to hot indexes

Hello,

I'm using ILM to divide my time series data into hot, warm, cold indexes. The hot indexes use more expensive hardware. Warm is read-only and force-merged. And cold is frozen and contains data older than 60 days.

Now, following the documentation, I have an alias that points to all of these indexes, from cold to hot. The index names are all numbered, so just by looking at the names I won't know which ones are cold, etc. But obviously I want to mainly query the hot indexes, sometimes warm, and very rarely cold.

  1. Is there a way for the ILM to create other aliases specifically for hot, warm and cold indexes?
  2. Let's say I have a timestamp field and I query the current alias to limit the docs to the ones from the past few days. Again the current alias is for all these indexes from cold to hot. Considering that the timestamp for frozen cold indexes is older than 60 days, it still hits those indexes to see if it can find any docs in them that have timestamps of the past few days, right?

Thanks,
Sep

I'm exactly at this point. I tried using the _only_nodes preference, but it doesn't work because the shards must physically exist on the nodes you want to query.

If you do:
myindex_alias/_search?preference=_only_nodes:hot*

You will get an error, because some of the indices behind the alias exist only on the nodes holding "cold" indices, so your query will fail.

You could try setting up a cron job that creates these aliases, or use multiple matching templates to create the aliases dynamically, but I haven't found a way yet.
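A rough sketch of what such a cron job could compute, in Python. It assumes you have already fetched the JSON from `GET myindex-*/_ilm/explain` (which reports a `phase` per managed index) and from `GET myindex-*/_alias`, and it only builds the request body you would then send to `POST /_aliases`. The alias names `myindex_hot`, `myindex_warm` and `myindex_cold`, the index names, and the sample responses are made up for illustration.

```python
PHASES = ("hot", "warm", "cold")


def phase_alias_actions(explain_response, alias_response, prefix="myindex"):
    """Return an _aliases request body that syncs per-phase aliases.

    explain_response: parsed JSON from GET <pattern>/_ilm/explain
    alias_response:   parsed JSON from GET <pattern>/_alias
    prefix:           hypothetical alias prefix, e.g. myindex_hot
    """
    actions = []
    for index, info in sorted(explain_response.get("indices", {}).items()):
        phase = info.get("phase")
        wanted = f"{prefix}_{phase}" if phase in PHASES else None
        existing = set(alias_response.get(index, {}).get("aliases", {}))
        for candidate in PHASES:
            alias = f"{prefix}_{candidate}"
            if alias == wanted and alias not in existing:
                # Index entered this phase: attach the phase alias.
                actions.append({"add": {"index": index, "alias": alias}})
            elif alias != wanted and alias in existing:
                # Index left this phase: detach the stale alias.
                actions.append({"remove": {"index": index, "alias": alias}})
    return {"actions": actions}


# Made-up sample responses, trimmed to the fields used above.
explain = {"indices": {
    "myindex-000001": {"index": "myindex-000001", "managed": True, "phase": "cold"},
    "myindex-000002": {"index": "myindex-000002", "managed": True, "phase": "hot"},
}}
aliases = {
    "myindex-000001": {"aliases": {"myindex_alias": {}, "myindex_hot": {}}},
    "myindex-000002": {"aliases": {"myindex_alias": {}}},
}
body = phase_alias_actions(explain, aliases)
```

Only removing aliases that are actually present avoids asking Elasticsearch to remove an alias that does not exist on an index. Queries could then target `myindex_hot` instead of `myindex_alias`, at the cost of the aliases lagging behind ILM by up to one cron interval.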

If anyone else has info on this, it would be appreciated.

Daniel

Hi @seperman and @acv2

Is this a theoretical discussion, or are you having an actual performance issue? I ask because Elasticsearch is very efficient with time-range queries when the time range is part of the filter context, even across index patterns with many indexes managed by ILM.

In short, the filter context is very fast and filters documents in or out before the query context.

Perhaps take a look at this: Query and Filter Context. I would suggest starting with this and then, if you have performance issues, working on additional techniques.
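For reference, a minimal sketch of a time range placed in filter context, written as a Python dict so it can be handed to whatever client you use. The field name `timestamp` and the three-day window are assumptions for illustration, not taken from anyone's actual mapping.

```python
import json


def recent_docs_query(timestamp_field="timestamp", since="now-3d/d"):
    """Build a search body whose time range lives in filter context.

    timestamp_field and since are illustrative defaults; adjust them
    to your own mapping and time window.
    """
    return {
        "query": {
            "bool": {
                # Clauses under "filter" do not contribute to scoring.
                "filter": [
                    {"range": {timestamp_field: {"gte": since}}}
                ]
            }
        }
    }


body = recent_docs_query()
print(json.dumps(body, indent=2))
```

Rounding the lower bound with `/d` keeps the range value stable within a day, which tends to make the filter friendlier to caching than a bare `now-3d`.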

It is an actual issue, because the ES architecture tries to reach all the shards/indices.
Even though the filter context is super efficient, it still brings performance issues. Cold indices would be allocated on different hardware, and the data would be much more compressed. If one of those nodes goes down or hangs, I don't want my queries to even be sent to those nodes, but Elasticsearch will always send queries to those nodes and will always WAIT for them. You could always freeze the index and only send queries to those indices when manually requested. This is the solution I'm going for, but I would love to have more control over the ILM rotation: something to add a special alias per type of node, and I would love to be able to do the shrink operation in the cold phase instead of the warm phase, etc.

That being said, yeah, the main problem is letting ES decide through the filter context which nodes execute the query. That is bullsh$t: it will always send the request and wait for the response from those nodes, which completely kills the idea of having a hot/warm/cold (HWC) architecture.
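To make the "freeze and only query when manually requested" idea concrete, here is a small Python sketch that builds the search path. As I understand it, in the 7.x releases that support frozen indices a search skips them by default and only includes them when `ignore_throttled=false` is passed; check this against your version, since the parameter was deprecated later. The helper name `search_path` is made up for illustration.

```python
from urllib.parse import urlencode


def search_path(alias, include_frozen=False):
    """Build the _search path, opting in to frozen indices on request.

    Assumes 7.x frozen-index behaviour: frozen indices are skipped
    unless ignore_throttled=false is sent with the request.
    """
    params = {}
    if include_frozen:
        params["ignore_throttled"] = "false"
    query = urlencode(params)
    return f"/{alias}/_search" + (f"?{query}" if query else "")


# Everyday queries leave frozen (cold) indices alone.
default_path = search_path("myindex_alias")
# A manual, explicit request reaches into the frozen indices as well.
manual_path = search_path("myindex_alias", include_frozen=True)
```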

Hi @acv2, thanks for the clarification. I only asked because we sometimes have folks who are just getting started over-engineering the solution. In your case the extra context helps.

Yes, I think it is more of an issue with cold indices that are very dense but not frozen (frozen indices, as you noted, are not searched by default). For hot / warm it is an immediate turnaround if the index does not cover the requested range.

I don't think there is a simple solution today. An index pattern that only includes hot, or hot / warm, is a possible suggestion, but as you noted that takes some work. I will poke around a bit more and see what I can find out.

There is a lot of ongoing work with respect to ILM, and you are definitely welcome to file an issue / enhancement request in the repo. I took a quick look and did not see an enhancement or issue that fits your request.

This doesn't fit my expectations. If your queries include a time range that excludes all cold indices, then they should be rewritten to match-none queries on those indices very efficiently (i.e. without IO) and should therefore return very quickly from those nodes. Are you saying that this isn't the case? If so, can you provide more details of how you determined this?
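One way to gather those details is to look at the `_shards` section of the search response, which reports `total`, `successful`, `skipped` and `failed` counts, together with `took`. Below is a small Python sketch that summarizes them. The sample response is invented for illustration, and whether shards show up as `skipped` depends on the pre-filter phase having run (see the `pre_filter_shard_size` search parameter).

```python
def summarize_shards(search_response):
    """Summarize how many shards a search touched versus skipped.

    search_response: parsed JSON body returned by a _search request.
    """
    shards = search_response.get("_shards", {})
    total = shards.get("total", 0)
    skipped = shards.get("skipped", 0)
    return {
        "total": total,
        "skipped": skipped,
        # Shards that were not skipped by the pre-filter phase.
        "not_skipped": total - skipped,
        "failed": shards.get("failed", 0),
        "took_ms": search_response.get("took"),
    }


# Invented sample response, trimmed to the fields used above.
sample = {
    "took": 12,
    "timed_out": False,
    "_shards": {"total": 30, "successful": 30, "skipped": 24, "failed": 0},
}
summary = summarize_shards(sample)
```

Comparing `took_ms` and `skipped` for the same query against the full alias and against a hot-only index pattern would show whether the cold nodes are actually adding latency.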
