Elasticsearch Sizing

Hi All,

Please comment on the sizing calculation below, which assumes 10 days on hot nodes and a 3-year retention period.

Is the calculation below correct? Please share your suggestions.

EPS = 40000
DOC_RAW = 0.8 KB
DOC_JSON = 0.88 KB
Compression Ratio = 30%
DOC_Indexed size = 0.616 KB
DOC_Indexed size (with replica) = 1.232 KB

EPD = 3.965377808 TB
HotNode (10 Days) = 39.74033356 TB

Warm Days[3 years] = 1085
Warm storage = 4302.43 TB
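
For anyone who wants to check the arithmetic, here is a minimal sketch that reproduces the storage figures from the inputs above. It assumes binary units (1 TB = 1024^3 KB), that "30%" means the indexed document is 30% smaller than the JSON document, and that there is one replica; the function and variable names are only illustrative. Under those assumptions the per-document, per-day and warm figures match the post, while the 10-day hot figure comes out at roughly 39.65 TB rather than 39.74 TB.

```python
# Inputs copied from the post above.
EPS = 40_000            # average events per second
DOC_JSON_KB = 0.88      # size of one event as JSON, in KB
COMPRESSION = 0.30      # assumed meaning: fraction of size saved on disk
REPLICAS = 1            # assumed: one replica, so every document is stored twice
HOT_DAYS = 10
WARM_DAYS = 1085

SECONDS_PER_DAY = 86_400
KB_PER_TB = 1024 ** 3   # assumption: binary units


def indexed_kb_per_doc(json_kb, compression, replicas):
    """On-disk size of one document, including replica copies."""
    return json_kb * (1 - compression) * (1 + replicas)


def daily_tb(eps, doc_kb):
    """Storage written per day, in TB."""
    return eps * SECONDS_PER_DAY * doc_kb / KB_PER_TB


doc_kb = indexed_kb_per_doc(DOC_JSON_KB, COMPRESSION, REPLICAS)
per_day = daily_tb(EPS, doc_kb)
hot_tb = per_day * HOT_DAYS
warm_tb = per_day * WARM_DAYS

print(f"indexed size per doc (with replica): {doc_kb:.3f} KB")
print(f"daily volume: {per_day:.3f} TB")
print(f"hot tier ({HOT_DAYS} days): {hot_tb:.2f} TB")
print(f"warm tier ({WARM_DAYS} days): {warm_tb:.2f} TB")
```

Note that this uses the average EPS only, so it says nothing about peak ingest rates or about overhead beyond the raw index size.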

Calculated Nodes:
HOT Nodes = 8
Warm Nodes = 258.1460953 (RAM to SSD ratio 1:100)

Server Configuration
CPU = 32 cores
RAM = 128 GB
SSD = 3.75 TB (RAM to SSD ratio 1:30)
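
As a rough cross-check of the node counts, the sketch below derives the disk per node from the RAM-to-disk ratios quoted above and divides the tier storage by it. It assumes disk capacity is the only constraint, that every node's disk can be filled completely, and that warm nodes also have 128 GB of RAM; none of that is stated in the post, and the helper names are illustrative. Under these assumptions the counts come out higher than the 8 and 258 posted, so the posted figures presumably rest on inputs that are not shown here.

```python
import math

RAM_GB = 128            # from the server configuration above

# Storage totals as posted above.
HOT_TB = 39.74033356
WARM_TB = 4302.43


def disk_per_node_tb(ram_gb, disk_per_ram):
    """Disk per node implied by a RAM:disk ratio of 1:disk_per_ram, in TB."""
    return ram_gb * disk_per_ram / 1024


def nodes_needed(total_tb, per_node_tb):
    """Whole nodes needed if disk is the only limit (no headroom)."""
    return math.ceil(total_tb / per_node_tb)


hot_disk = disk_per_node_tb(RAM_GB, 30)     # ratio 1:30
warm_disk = disk_per_node_tb(RAM_GB, 100)   # ratio 1:100

hot_nodes = nodes_needed(HOT_TB, hot_disk)
warm_nodes = nodes_needed(WARM_TB, warm_disk)

print(f"hot:  {hot_disk} TB/node, {hot_nodes} nodes")
print(f"warm: {warm_disk} TB/node, {warm_nodes} nodes")
```

A real estimate would also have to leave free disk space and account for indexing and query load, which this check ignores.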

It is hard to comment on this given that we know nothing about the use case, requirements or what these numbers are based on.

Hi Christian,

The requirement is to build a SIEM solution on the ELK stack. The calculation above covers only the Elasticsearch compute.

Let me know if you need any details.

How did you arrive at those numbers? Did you do any tests or run any benchmarks? What is the hardware specification you plan to deploy these node types on? How many users will you have? What are the query latency requirements?

I would recommend having a look at the following resources:

https://www.elastic.co/webinars/elasticsearch-sizing-and-capacity-planning

https://www.elastic.co/webinars/optimizing-storage-efficiency-in-elasticsearch

https://www.elastic.co/blog/sizing-hot-warm-architectures-for-logging-and-metrics-in-the-elasticsearch-service-on-elastic-cloud

https://www.elastic.co/webinars/using-rally-to-get-your-elasticsearch-cluster-size-right

I provided the server and log size details in my initial post.

User management will be handled by Kibana, if I am not wrong.

I believe query latency depends on the processing capacity of the ES cluster.

You have not answered any of my questions from the previous post.

Have you run a test to come up with these numbers? If so, did you index at least a few GB? Did you optimize your mappings?

Based on the calculation I take it this is an average EPS. If so, what is the expected peak rate the cluster needs to be able to keep up with?

How did you determine that 10 days is the optimal period to keep on the hot nodes? Is this due to query requirements? How did you determine that 8 hot nodes are sufficient?

What is the expected specification of the hot nodes? What type of hardware and storage will you be using?

What is the expected specification of the warm nodes? What type of hardware and storage will you be using?

Here I am looking to see what your users expect and how they will use the cluster. When sizing a cluster it is important to leave enough headroom for querying and not just size the cluster based on the maximum it can index.
