I am trying to calculate the resource requirements for an ELK system that will be deployed on k8s.
The total load will be 4 TB and I will use 1 replica. Is it possible to have equations for the required number of shards and the required vCPUs, considering that I will use hot shards for all data and will set the shard size to 50 GB?
Hi, I checked the thread. Let me explain my equation for determining the resources. I have 4 TB of data and I am going to use 1 replica, so we can say 8 TB of data. I will use active shards for indexing and keep the shard size at 50 GB, so I need 160 active shards (8 TB / 50 GB). The documentation I follow recommends a 1:1.5 shard-to-vCPU ratio for active shards, which makes 240 vCPUs. Since I will use many pods to reach that, the RAM and ephemeral storage increase along with them.
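The arithmetic in the previous post can be written as a small sketch. It assumes decimal units (1 TB = 1000 GB; pass `gb_per_tb=1024` for binary units), counts replica shards in the total just as the post does, and treats the 1.5 vCPU-per-shard figure as the poster's recalled starting point rather than an official Elastic rule. The function name and parameters are hypothetical, for illustration only.

```python
import math


def estimate_shards_and_vcpu(primary_tb, replicas, shard_size_gb,
                             vcpu_per_shard, gb_per_tb=1000):
    """Hypothetical helper: total shards (primaries + replicas) and vCPUs.

    Assumes every shard is a hot/active shard of at most shard_size_gb,
    and a fixed vCPU-per-shard ratio (an assumed starting point only).
    """
    # Total stored data including replica copies.
    total_gb = primary_tb * gb_per_tb * (1 + replicas)
    # Round up so no shard exceeds the target size.
    shards = math.ceil(total_gb / shard_size_gb)
    # Round up to whole vCPUs.
    vcpu = math.ceil(shards * vcpu_per_shard)
    return shards, vcpu


shards, vcpu = estimate_shards_and_vcpu(4, 1, 50, 1.5)
print(shards, vcpu)  # 160 240
```

With binary units (`gb_per_tb=1024`) the same inputs give 164 shards and 246 vCPUs, so it is worth fixing the unit convention before sizing the cluster.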
Hi @Christian_Dahlqvist,
I don't have answers to all of your questions, but you can find my answers below. I hope that will be enough to determine a way forward.
Where does this recommendation come from?
I couldn't recall the source, but as far as I remember, the 1:1.5 ratio was recommended as a starting point. Checking the system performance and adjusting this value are also recommended.
What is the use case?
Store cell phone call data and visualize it.
Is your data immutable or are you performing updates?
It can be updated.
What type of data are you indexing?
Location-based information
Numbers and text
Will you be using time-based indices in some form, e.g. data streams?
Some indices will be time-based.
How are you accessing and querying the data? Kibana? Custom APIs?
Kibana and custom Kibana plugins.
What is the average and peak indexing/update rate?
We need to index 4 TB of data within 2 weeks.
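As a rough sketch, that answer converts into an average indexing rate. The sketch assumes the load is spread evenly over 14 days, uses decimal units (1 TB = 1000 GB, 1 GB = 1000 MB), and says nothing about the peak rate, which was not given. The helper name is hypothetical.

```python
def average_ingest_rate(total_tb, days, replicas=0, gb_per_tb=1000):
    """Hypothetical helper: average GB/day and MB/s, assuming an even load.

    Set replicas=1 to include the extra bytes written for replica shards.
    """
    total_gb = total_tb * gb_per_tb * (1 + replicas)
    per_day_gb = total_gb / days
    # 1 GB = 1000 MB; 86400 seconds per day.
    per_sec_mb = total_gb * 1000 / (days * 86400)
    return per_day_gb, per_sec_mb


per_day, per_sec = average_ingest_rate(4, 14)
print(f"{per_day:.1f} GB/day, {per_sec:.2f} MB/s")  # 285.7 GB/day, 3.31 MB/s
```

Passing `replicas=1` roughly doubles both figures, since each document is also written to its replica shard.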
How many concurrent queries/searches do you expect to need to support?
N/A
How large portion of the data does each search/query typically target?
It can be 10k documents, retrieved with scrolling or another pagination method, to compare two sites.
In general, metrics will be calculated with aggregations rather than searches.
What are your latency requirements for queries/searches?