Elasticsearch resource calculation

I am trying to calculate the resource requirements for an ELK system that will be deployed on k8s.

The total data volume will be 4 TB and I will use 1 replica. Are there equations for the required number of shards and vCPUs, given that all data will sit in hot shards and I will set the shard size to 50 GB?

Hi,

I'd advise you to have a look at the answer I posted in this thread.

More details will help in answering, and the resources I linked might help you on the way.

Good luck!

Hi, I checked the thread. Let me explain the equation I use to determine the resources. I have 4 TB of data and I am going to use 1 replica, so we can say 8 TB of data. I will use active shards for indexing and keep the shard size at 50 GB, so I need 8 TB / 50 GB = 160 active shards. The documentation I followed recommends a shard-to-vCPU ratio of 1:1.5 for active shards, which makes 160 × 1.5 = 240 vCPUs. Since I need many pods to reach that, the RAM and ephemeral storage increase automatically as well.
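The arithmetic above can be sketched in a few lines of Python. This is only a rough sketch of the calculation as described in this post: it assumes decimal units (1 TB = 1000 GB), the function name `estimate_shards_and_vcpu` is made up for illustration, and the 1.5 vCPU per active shard figure is the ratio quoted above, not a confirmed Elastic recommendation.

```python
import math


def estimate_shards_and_vcpu(primary_tb, replicas, shard_size_gb, vcpu_per_shard):
    """Back-of-the-envelope estimate; hypothetical helper, not an official formula."""
    # Total data on disk = primary data plus one full copy per replica.
    # Assumption: decimal units, 1 TB = 1000 GB.
    total_gb = primary_tb * 1000 * (1 + replicas)
    # Number of shards needed if every shard is filled to the target size.
    shards = math.ceil(total_gb / shard_size_gb)
    # vCPU count using the ratio quoted in the post (assumed, unverified).
    vcpu = math.ceil(shards * vcpu_per_shard)
    return total_gb, shards, vcpu


total_gb, shards, vcpu = estimate_shards_and_vcpu(
    primary_tb=4, replicas=1, shard_size_gb=50, vcpu_per_shard=1.5
)
print(f"total data: {total_gb} GB, shards: {shards}, vCPU: {vcpu}")
```

With the numbers from this post it gives 8000 GB, 160 shards and 240 vCPUs. Changing `shard_size_gb` or `vcpu_per_shard` shows how strongly the result depends on those two assumptions.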

Where does this recommendation come from?

How you size your cluster will depend on the use case, so you will need to provide more details on this:

What is the use case?

Is your data immutable or are you performing updates?

What type of data are you indexing?

Will you be using time-based indices in some form, e.g. data streams?

How are you accessing and querying the data? Kibana? Custom APIs?

What is the average and peak indexing/update rate?

How many concurrent queries/searches do you expect to need to support?

How large a portion of the data does each search/query typically target?

What are your latency requirements for queries/searches?

Hi @Christian_Dahlqvist,
I don't have answers to all of your questions, but you can find what I have below. I hope that is enough to decide on a way forward.

Where does this recommendation come from?

  • I can't recall the source, but as far as I remember a 1:1.5 ratio was recommended as a starting point. Checking the system performance and adjusting this value was also recommended.

What is the use case?

  • Store cell phone call data and visualize it.

Is your data immutable or are you performing updates?

  • It can be updated.

What type of data are you indexing?

  • Location based information
  • Numbers and texts

Will you be using time-based indices in some form, e.g. data streams?

  • Some indices will be time-based.

How are you accessing and querying the data? Kibana? Custom APIs?

  • Kibana and custom Kibana plugins.

What is the average and peak indexing/update rate?

  • We have 4 TB of data for indexing within 2 weeks.

How many concurrent queries/searches do you expect to need to support?

  • N/A

How large a portion of the data does each search/query typically target?

  • It can be 10k results, fetched with scrolling or another pagination method, to compare two sites.
  • In general, metrics will be calculated with aggregation instead of searching.

What are your latency requirements for queries/searches?

  • N/A
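For the indexing rate question, the "4 TB within 2 weeks" figure can be converted into an average rate. The sketch below assumes decimal units and a perfectly even load over the 14 days; the helper name `average_ingest_rate` is made up for illustration, and the peak rate is not known from this thread.

```python
def average_ingest_rate(total_tb, days):
    """Average ingest rate for a given volume and period (hypothetical helper)."""
    # Assumption: decimal units, 1 TB = 1000**4 bytes.
    total_bytes = total_tb * 1000**4
    seconds = days * 24 * 60 * 60
    # Average rate in MB/s, assuming the load is spread evenly over the period.
    mb_per_second = total_bytes / seconds / 1000**2
    # The same figure expressed as GB per day.
    gb_per_day = total_tb * 1000 / days
    return mb_per_second, gb_per_day


mb_per_second, gb_per_day = average_ingest_rate(total_tb=4, days=14)
print(f"average: {mb_per_second:.2f} MB/s, {gb_per_day:.1f} GB/day")
```

This is primary data only. With 1 replica every document is indexed twice, so the cluster-wide write volume is about double, and the real peak may be well above the average.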

Hi @Christian_Dahlqvist,

Do you have any comments on my reply? It would be great if you could help me with this.
