Hot-Warm Architecture - Storage Handling

jorj · August 28, 2018, 9:17am

Hello there
I am designing a hot-warm Elasticsearch architecture for a logging solution.

My total storage calculation comes to 180 TB for the following requirements:

7,200 EPS (events per second) ingestion
350 bytes per event (average size)
1 replica for HA
1 year retention
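The 180 TB figure can be sanity-checked with a quick back-of-the-envelope calculation. This is a sketch under two stated assumptions: decimal units (1 TB = 10^12 bytes), and, hypothetically, that the indexed on-disk size equals the raw event size. In practice on-disk size depends on mappings and compression and is best measured by indexing a sample of real events.

```python
# Back-of-the-envelope storage estimate for the requirements listed above.
# Assumption (hypothetical): on-disk index size == raw event size.
SECONDS_PER_DAY = 86_400


def storage_estimate(eps: int, bytes_per_event: int, replicas: int,
                     retention_days: int) -> dict:
    """Return daily ingest (GB), primary data (TB) and total incl. replicas (TB)."""
    daily_bytes = eps * bytes_per_event * SECONDS_PER_DAY
    primary_bytes = daily_bytes * retention_days
    total_bytes = primary_bytes * (1 + replicas)  # each replica is a full extra copy
    return {
        "daily_gb": daily_bytes / 1e9,
        "primary_tb": primary_bytes / 1e12,
        "total_tb": total_bytes / 1e12,
    }


if __name__ == "__main__":
    est = storage_estimate(eps=7200, bytes_per_event=350, replicas=1,
                           retention_days=365)
    for key, value in est.items():
        print(f"{key}: {value:.2f}")
```

This works out to roughly 218 GB per day, about 79.5 TB of primary data and about 159 TB including the replica, so the 180 TB estimate appears to include some headroom on top of the raw volume.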

I plan to start with a cluster of 3 warm nodes (60 TB each, including replica shards), with the option to add more nodes in the future based on resource utilization.

I have two questions:
1- Can I start my warm nodes with less disk space (e.g. 20 TB each) and later increase the disk space on the same warm nodes to reach 60 TB?
(Note that I also have replicas configured on the warm nodes.)

2- Once the data on the warm nodes no longer needs to be kept there (to free storage for newer logs), is it possible to move it to cold/external storage? If so, what is the process for accessing a log/event in this archived data? (Are there specific steps to import/activate, search, and deactivate old events?)

Thanks,
Jorj

Christian_Dahlqvist · August 28, 2018, 9:26am

As explained in this blog post, each shard comes with some overhead in terms of heap usage. Since heap is finite, there is a limit to how much data a node can hold, and this limit depends on the type of data as well as the mappings. How much data you can store on a node therefore often becomes an exercise in optimising heap usage. You will need to benchmark to see how much data each of your nodes can hold, but in my experience 60 TB per node sounds like far too much. I would expect you to need a significantly larger number of warm nodes to handle that data volume.
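To make the heap argument concrete, the sketch below compares how many shards a 60 TB warm node would hold against a shard budget derived from its heap. All figures are assumptions rather than measurements: a 30 GB heap (kept below the ~32 GB compressed-oops threshold), average shard sizes of 30 to 50 GB, and a commonly quoted rule of thumb of at most about 20 shards per GB of heap. Real per-shard heap usage depends on mappings, segment count and Elasticsearch version, so benchmarking is still the actual answer.

```python
import math


def shards_on_node(node_data_gb: float, shard_size_gb: float) -> int:
    """Shards needed to hold node_data_gb at the given average shard size."""
    return math.ceil(node_data_gb / shard_size_gb)


def shard_budget(heap_gb: float, shards_per_gb_heap: int = 20) -> int:
    """Rule-of-thumb upper bound on shards per node (assumed guideline, not a hard limit)."""
    return int(heap_gb * shards_per_gb_heap)


if __name__ == "__main__":
    node_data_gb = 60_000  # 60 TB per warm node, as planned in the question
    heap_gb = 30           # assumed heap size
    budget = shard_budget(heap_gb)
    for shard_size_gb in (30, 40, 50):
        needed = shards_on_node(node_data_gb, shard_size_gb)
        print(f"{shard_size_gb} GB shards: {needed} shards needed, budget ~{budget}")
```

Under these assumptions a 60 TB node would need 1,200 to 2,000 shards against a budget of about 600, which is in line with needing considerably more warm nodes than three.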

First check how much a node can handle.

You can use the snapshot API to archive old indices offline.
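The snapshot workflow asked about in question 2 (archive, bring back, search, then drop again) might look roughly like the sequence of REST calls below. This is a sketch that only builds the requests rather than sending them. The repository name, mount path, index and snapshot names, and the `@timestamp` / `source.ip` field names are all hypothetical, and a shared-filesystem (`fs`) repository is assumed, which requires the location to be mounted on every node and listed under `path.repo`.

```python
import json
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Request:
    """An HTTP request to send to the cluster with any client (curl, requests, ...)."""
    method: str
    path: str
    body: Optional[dict] = None


def register_fs_repo(repo: str, location: str) -> Request:
    # 'location' must be a shared path listed under path.repo on every node.
    return Request("PUT", f"/_snapshot/{repo}",
                   {"type": "fs", "settings": {"location": location}})


def snapshot_indices(repo: str, snapshot: str, indices: List[str]) -> Request:
    return Request("PUT", f"/_snapshot/{repo}/{snapshot}?wait_for_completion=true",
                   {"indices": ",".join(indices), "include_global_state": False})


def delete_indices(indices: List[str]) -> Request:
    # Only run this after the snapshot has completed successfully.
    return Request("DELETE", "/" + ",".join(indices))


def restore_indices(repo: str, snapshot: str, indices: List[str],
                    prefix: str = "restored_") -> Request:
    # Rename on restore so the temporary copy cannot clash with a live index.
    return Request("POST", f"/_snapshot/{repo}/{snapshot}/_restore", {
        "indices": ",".join(indices),
        "include_global_state": False,
        "rename_pattern": "(.+)",
        "rename_replacement": prefix + "$1",
        # A temporary, read-only copy usually does not need a replica.
        "index_settings": {"index.number_of_replicas": 0},
    })


def search_time_and_ip(index: str, start: str, end: str, ip: str,
                       ts_field: str = "@timestamp",
                       ip_field: str = "source.ip") -> Request:
    # Field names are hypothetical; use whatever the CGNAT mapping defines.
    query = {"bool": {"filter": [
        {"range": {ts_field: {"gte": start, "lt": end}}},
        {"term": {ip_field: ip}},
    ]}}
    return Request("POST", f"/{index}/_search", {"query": query})


if __name__ == "__main__":
    old = ["cgnat-2018.01.15"]  # hypothetical daily index
    plan = [
        register_fs_repo("cgnat_archive", "/mnt/es_archive"),
        snapshot_indices("cgnat_archive", "snap-2018.01.15", old),
        delete_indices(old),  # frees warm-node disk
        # Later, when the old data is needed again (restore runs in the
        # background; wait for the index to be at least yellow before searching):
        restore_indices("cgnat_archive", "snap-2018.01.15", old),
        search_time_and_ip("restored_cgnat-2018.01.15",
                           "2018-01-15T08:00:00Z", "2018-01-15T09:00:00Z",
                           "100.64.1.23"),
        delete_indices(["restored_cgnat-2018.01.15"]),  # "deactivate" again
    ]
    for req in plan:
        print(req.method, req.path)
        if req.body is not None:
            print(json.dumps(req.body, indent=2))
```

Restoring under a new name keeps the temporary copy clearly separate, and deleting it afterwards returns the disk space while the snapshot itself stays in the repository.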

Also be aware that having very dense nodes for long-term storage can cause a lot of data to need to be redistributed on node failure, which can easily cause problems.
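A rough sketch of what losing one node means for the planned layout, assuming all three nodes are full at 60 TB, every shard has exactly one replica, and recovery proceeds as a single stream at 40 MB/s (the long-standing default of `indices.recovery.max_bytes_per_sec`). Several recoveries can run in parallel and the throttle can be raised, so treat the rebuild time as an order-of-magnitude figure only.

```python
SECONDS_PER_DAY = 86_400


def failure_impact(nodes: int, data_per_node_tb: float,
                   capacity_per_node_tb: float, recovery_mb_per_s: float) -> dict:
    """Rough effect of losing one node when every shard has exactly one replica."""
    total_tb = nodes * data_per_node_tb            # primaries + replicas, whole cluster
    survivors = nodes - 1
    needed_per_survivor_tb = total_tb / survivors  # same data spread over fewer nodes
    # Every shard copy that lived on the failed node is rebuilt from its surviving copy.
    rebuild_seconds = data_per_node_tb * 1e6 / recovery_mb_per_s  # 1 TB = 1e6 MB
    return {
        "needed_per_survivor_tb": needed_per_survivor_tb,
        "fits": needed_per_survivor_tb <= capacity_per_node_tb,
        "rebuild_days": rebuild_seconds / SECONDS_PER_DAY,
    }


if __name__ == "__main__":
    # Planned layout: 3 warm nodes, each full at 60 TB, assumed 40 MB/s recovery.
    print(failure_impact(nodes=3, data_per_node_tb=60, capacity_per_node_tb=60,
                         recovery_mb_per_s=40))
```

Under these assumptions each surviving node would need 90 TB on a 60 TB disk, so full redundancy could not be restored at all, and rebuilding the lost 60 TB would take around 17 days. The capacity check also ignores the disk watermarks, which stop shard allocation before a disk is completely full.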

jorj · August 28, 2018, 10:35am

Hello Christian

Thank you for your quick response,

Regarding point 1: would the same apply even if I shrink the indices and force merge them down to 1 segment (as per the hot-warm best practice)?
Note that the main requirement of this design is to be able to search old logs (and export the results) by time range and source/destination IP.

It's a CGNAT event log collection.
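For reference, the shrink and force-merge steps mentioned here could be sketched as the following sequence of REST calls (Elasticsearch 6.x-era shrink and force merge APIs). It only builds the requests; the index names, the node name `warm-1` and the optional `best_compression` codec are hypothetical choices. Shrinking and merging reduce the number of shards and segments, and with them the per-shard and per-segment overhead, but each remaining shard still costs some heap.

```python
import json
from typing import List, Optional, Tuple

# (method, path, body) triples; send them with any HTTP client.
Step = Tuple[str, str, Optional[dict]]


def shrink_and_merge(source: str, target: str, node: str,
                     target_shards: int = 1) -> List[Step]:
    return [
        # 1. Relocate a copy of every shard to one node and block writes.
        ("PUT", f"/{source}/_settings", {"settings": {
            "index.routing.allocation.require._name": node,
            "index.blocks.write": True,
        }}),
        # 2. Wait until relocation has finished before shrinking.
        ("GET", "/_cluster/health?wait_for_no_relocating_shards=true", None),
        # 3. Shrink; target_shards must be a factor of the source shard count.
        #    The null values clear the settings copied from the source index.
        ("POST", f"/{source}/_shrink/{target}", {"settings": {
            "index.number_of_shards": target_shards,
            "index.number_of_replicas": 1,
            "index.codec": "best_compression",  # optional: trades CPU for disk
            "index.routing.allocation.require._name": None,
            "index.blocks.write": None,
        }}),
        # 4. Merge each shard to one segment (only for indices no longer written to).
        ("POST", f"/{target}/_forcemerge?max_num_segments=1", None),
    ]


if __name__ == "__main__":
    for method, path, body in shrink_and_merge(
            "cgnat-2018.01.15", "cgnat-2018.01.15-shrunk", "warm-1"):
        print(method, path)
        if body is not None:
            print(json.dumps(body, indent=2))
```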

Regards,
Jorj

Christian_Dahlqvist · August 28, 2018, 10:37am

Yes, there is always overhead that you need to consider, although it can vary depending on mappings, shard sizes, etc. The only way to remove it is to close indices, but then they are no longer searchable, and Elasticsearch will also not ensure they are replicated in case of node failures.
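As a minimal sketch of the close/reopen trade-off described here, the calls below close an index to free its overhead and reopen it when it has to be searched. The index name is hypothetical, and the replication caveat applies to the Elasticsearch versions of that time; newer releases (7.2 and later) replicate closed indices.

```python
from typing import List, Optional, Tuple

# (method, path, body) triples; send them with any HTTP client.
Step = Tuple[str, str, Optional[dict]]


def close_index(index: str) -> List[Step]:
    # A closed index frees its per-shard overhead but cannot be searched.
    return [("POST", f"/{index}/_close", None)]


def reopen_index(index: str) -> List[Step]:
    return [
        ("POST", f"/{index}/_open", None),
        # Wait until at least the primaries are assigned before searching.
        ("GET", f"/_cluster/health/{index}?wait_for_status=yellow&timeout=5m", None),
    ]


if __name__ == "__main__":
    for method, path, _ in close_index("cgnat-2018.01.15") + reopen_index("cgnat-2018.01.15"):
        print(method, path)
```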

system · September 25, 2018, 10:37am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
