Is there a Disk Limit on Warm/Hot Nodes?

Hi All,

We are running Hot/Warm nodes for time series data. Hot nodes are on SSDs and warm nodes are on spinning disks, both with 64GB RAM, of which 30GB is allocated as the ES heap. The ES version is 5.6.x. Is there any limit on data storage on these nodes, particularly on the warm nodes? Based on our current infra, we are storing around 6-8TB of data on each hot node and around 18-20TB of data on each warm node.

Does ES pose any limitations, or provide recommendations, on the optimal disk storage for each type of node before we consider increasing the disk or adding nodes?

There are watermarks as per https://www.elastic.co/guide/en/elasticsearch/reference/6.2/disk-allocator.html.
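For completeness, here is a minimal sketch of how those watermarks could be inspected or adjusted via the cluster settings and cat APIs, assuming a node reachable at http://localhost:9200 (the address and the threshold values are placeholders, not a recommendation):

```python
import json
import requests

ES = "http://localhost:9200"  # placeholder address for one of the cluster nodes

# Show any explicitly configured cluster settings (watermarks appear here only if overridden).
print(json.dumps(requests.get(f"{ES}/_cluster/settings").json(), indent=2))

# Per-node disk usage as the disk-based allocator sees it.
print(requests.get(f"{ES}/_cat/allocation", params={"v": "true"}).text)

# Example of adjusting the low/high watermarks transiently; the values are illustrative only.
payload = {
    "transient": {
        "cluster.routing.allocation.disk.watermark.low": "85%",
        "cluster.routing.allocation.disk.watermark.high": "90%",
    }
}
requests.put(f"{ES}/_cluster/settings", json=payload).raise_for_status()
```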

@warkolm thanks for your reply. I am not referring to the watermarks. Let me rephrase my question:
Can I have warm nodes with an ES data size of 20TB each, assuming my HDD space is around 40TB per node?

There is no defined limit. The amount of data you can store in open indices on a node depends heavily on the use case as well as the version of Elasticsearch in use, so it is hard to give a generic answer. When you are dealing with high-density nodes, it often comes down to how much you can optimise heap usage, as this is generally a finite resource. There are a number of things that consume heap in a cluster and are use-case dependent:

  • As described in this blog post, each shard consumes a certain amount of heap space. Larger shards and segments typically consume less heap per unit of data volume than smaller shards and segments, so how well you can optimise this will have an impact.
  • The mappings used for the data also drive heap usage. For example, text mappings used for free-text search, parent-child relations, and nested documents all result in heap usage.
  • You also need to leave some heap for querying. How much depends on the types of queries you run and whether the data nodes also act as coordinating nodes. Query frequency and volume naturally also play into this; a quick way to check current heap pressure and shard counts per node is sketched below.
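As a rough sketch (again assuming a node reachable at http://localhost:9200), the _cat APIs give a quick view of how close each node is on the factors above:

```python
from collections import Counter
import requests

ES = "http://localhost:9200"  # placeholder address for one of the cluster nodes

# Per-node heap pressure via _cat/nodes.
cols = "name,node.role,heap.percent,heap.max,ram.percent"
print(requests.get(f"{ES}/_cat/nodes", params={"v": "true", "h": cols}).text)

# Count how many shards each node currently holds (unassigned shards have no node).
nodes = requests.get(f"{ES}/_cat/shards", params={"h": "node"}).text.splitlines()
for node, count in Counter(n.strip() for n in nodes if n.strip()).most_common():
    print(f"{node}: {count} shards")
```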

Thanks for your reply.

Below is the extract from the blog you shared in your post. As per this, if I am on ES 5.6.x with a 30G heap and 750 shards of 25G each, then I can roughly store 18TB of index data (time series) per node, per the recommendations (a quick arithmetic check is sketched after the quote).

TIP: The number of shards you can hold on a node will be proportional to the amount of heap you have available, but there is no fixed limit enforced by Elasticsearch. A good rule-of-thumb is to ensure you keep the number of shards per node below 20 to 25 per GB heap it has configured. A node with a 30GB heap should therefore have a maximum of 600-750 shards, but the further below this limit you can keep it the better. This will generally help the cluster stay in good health.
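To sanity-check that arithmetic against the quoted rule of thumb, here is a small back-of-the-envelope sketch (the 30G heap and 25G average shard size are the figures from this thread; the 20-25 shards per GB ratio is from the tip above):

```python
# Back-of-the-envelope check of the shards-per-heap rule of thumb quoted above.
heap_gb = 30                 # configured ES heap per node (from this thread)
shards_per_gb = (20, 25)     # rule-of-thumb range from the blog post
avg_shard_gb = 25            # assumed average shard size for this workload

for ratio in shards_per_gb:
    max_shards = heap_gb * ratio
    approx_data_tb = max_shards * avg_shard_gb / 1024
    print(f"{ratio} shards/GB heap -> up to {max_shards} shards "
          f"(~{approx_data_tb:.1f} TB at {avg_shard_gb} GB per shard)")
```

This gives roughly 600-750 shards, or about 14.6-18.3 TB per node at 25 GB per shard, which matches the ~18TB figure above.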

As it depends on the use case, in the end this is something you will need to test/benchmark. The blog post provides some general guidelines on a recommended maximum number of shards per node. The real maximum will depend on your shard size and how much overhead you have, and it could very well end up being lower. Note that the post is not saying you should be able to hold 750 shards irrespective of shard size.
