Hot Cold architecture question?

Dear all,
I have a question about the hot-warm-cold architecture.
Basically, I want to skip the warm node and just have hot and cold nodes only. Is that possible?
I also heard from a friend that the cold node has a better compression ratio than hot and warm. Is that true? I have a lot of old data and it takes up quite a lot of space.

Thank you for your help and time.

@lusynda if you're using ILM then yes, it's super easy to define a policy with only hot and cold phases.
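For example, a bare-bones policy with just those two phases might look something like this (rough sketch; the policy name, thresholds, and the `data` attribute are placeholders, not something from your cluster):

```
PUT _ilm/policy/hot-cold-only
{
  "policy": {
    "phases": {
      "hot": {
        "actions": {
          "rollover": {
            "max_size": "50gb",
            "max_age": "7d"
          }
        }
      },
      "cold": {
        "min_age": "30d",
        "actions": {
          "allocate": {
            "require": { "data": "cold" }
          }
        }
      }
    }
  }
}
```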

Before giving a more in-depth answer, some quick background:

  • hot/warm/cold/etc. uses a capability called index-level shard allocation filtering, which is a way to tell Elasticsearch to put the shards of a given index onto specific machines based on an arbitrary label you assign to those machines (see the sketch after this list).
  • Per the above, "hot" and "cold" are really just arbitrary labels assigned to the ES nodes, so it's totally possible to have a "hot" node that runs way slower than a "cold" one.
  • ILM basically maps the labels you assign to your nodes (or we do, if you're running on Elastic Cloud) to steps in the lifecycle, like "my-awesome-node" == "hot" or "my-busted-node" == "cold".
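To make those labels concrete, here's roughly what the labeling looks like on a self-managed cluster. The attribute name `data` and its values are arbitrary; they just have to match whatever your ILM policy or allocation filters reference:

```
# elasticsearch.yml on a machine you want to treat as "hot"
node.attr.data: hot

# elasticsearch.yml on a machine you want to treat as "cold"
node.attr.data: cold
```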

If you're running your own cluster (not hosted), then "cold" is whatever you want it to be. In other words, it doesn't automatically mean better compression or better anything, really. On Elastic Cloud, the compression ratio doesn't change, but the memory-to-disk ratio does, which reduces the overall cost of a given volume of data.

Somewhat more apropos, ILM will gain the ability to change the compression codec during a phase change in v7.7.0. See: https://github.com/elastic/elasticsearch/pull/49974
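If I'm reading that PR right, the forcemerge action gains an `index_codec` option, so the relevant bit of a policy would look roughly like this once you're on 7.7+ (untested sketch; check the 7.7 docs for which phases allow a force merge):

```
"actions": {
  "forcemerge": {
    "max_num_segments": 1,
    "index_codec": "best_compression"
  }
}
```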

Thank you for your response.
So what you are telling me is that the cold node has no better compression over the data, so if I have an index of, let's say, 500GB on the hot node, it will remain the same 500GB on the cold node?
And if I have old indices that have not been put into an ILM policy, how will I be able to move them to a cold node? I heard that I can use Curator to move them, but that was in the old version; does the current version still support Curator?

So what you are telling me is that the cold node has no better compression over the data, so if I have an index of, let's say, 500GB on the hot node, it will remain the same 500GB on the cold node?

Correct. The compression ratio doesn't change between lifecycle steps, but the memory-to-disk ratio does, meaning you need less memory (fewer nodes) to serve the same amount of data. Also, cold nodes generally use slower, less expensive disks, meaning the cost (not the size) for data on a cold node is less than it would be on hot or warm.

I was probably giving you a deeper answer than you wanted with my first response. I was saying that Elasticsearch does not ship with different types of data nodes. What you do (and what we do in Elastic Cloud) is use different types of hardware in a single cluster, labeling the faster hardware "hot" and the slower hardware "cold", and then use ILM to move the indices between them and do other optimizations at the same time (like a forced merge).

And if I have old indices that have not been put into an ILM policy, how will I be able to move them to a cold node?

If all you want to do is have older data with a higher compression ratio than newer data, you'd need to reindex data from your "hot" indices into your "cold" ones, where the cold indices have been configured with "index.codec": "best_compression".
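As a rough sketch (index names are made up, and this assumes your cold nodes carry a node.attr.data: cold label like the earlier example), you'd create the destination index with the codec and allocation settings, then reindex into it:

```
PUT /my-logs-2019-cold
{
  "settings": {
    "index.codec": "best_compression",
    "index.routing.allocation.require.data": "cold"
  }
}

POST _reindex
{
  "source": { "index": "my-logs-2019" },
  "dest": { "index": "my-logs-2019-cold" }
}
```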

If you've set up shard allocation filtering, then you can move indices from "hot" to "cold" by changing the index routing settings on the hot index when you're ready for them to move. Curator does these API calls for you, but it isn't required.
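Assuming the same `data` node attribute as above, the move itself is just a settings update on the existing index (index name made up):

```
PUT /my-logs-2019/_settings
{
  "index.routing.allocation.require.data": "cold"
}
```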


Sorry for bringing this post up this late, but my boss has a question that I really don't know how to answer. Besides the cold node, is there any other way for us to archive our old indices and somehow compress them even further, or am I stuck with the old indices that take up a lot of space?
Thank you.

Not that I am aware of, as indices are already compressed.

@lusynda
Like Christian said, best compression is best compression. Your only other options are (rough sketches of both after the list):

  1. Drop those cold indices down to 0 replicas, which will cut your data size in half but leave you vulnerable to data loss.
  2. Pull the data off the cluster via a snapshot and store it in something cheap like an AWS S3 bucket, a local tape array, etc. When you need it, you restore it back.
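Here is roughly what both look like, with made-up index, repository, and bucket names (the S3 repository type needs the repository-s3 plugin installed):

```
# Option 1: drop replicas on an old index (halves its footprint if you had 1 replica)
PUT /my-logs-2019/_settings
{
  "index.number_of_replicas": 0
}

# Option 2: snapshot to cheap storage, then delete the index from the cluster
PUT _snapshot/my_archive_repo
{
  "type": "s3",
  "settings": { "bucket": "my-archive-bucket" }
}

PUT _snapshot/my_archive_repo/logs-2019?wait_for_completion=true
{
  "indices": "my-logs-2019"
}

# Later, when you need that data again:
POST _snapshot/my_archive_repo/logs-2019/_restore
{
  "indices": "my-logs-2019"
}
```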
