I have the following use case, please suggest if I am heading in the right direction.
Daily Data volume :- 140 GB
Total ES Clusters :- 7, this means I will have 7 different clusters (ES1 to ES7), and each cluster will ingest 20 GB of daily data, which comes to 140 GB in total.
Query Type :- Mostly dashboards on non-analyzed data, like aggregated data, top sources, destinations, ports etc.
I am planning to use
3 nodes, master/data eligible, Linux 64-bit architecture
CPU:- 4
Core :- 2
RAM :- 64 GB
I will be dedicating 8 GB RAM to each cluster. With this I will be consuming 56 GB RAM, as I have 7 clusters on these machines.
I hope my use case is clear. Can anyone give me some suggestions if I need something else here?
I will have 7 different scenarios which I do not want to club into one cluster, so one dedicated cluster per scenario, plus 7 different Kibana views, one per cluster.
So each cluster gets 3 nodes, each with 8 GB RAM and a 4 GB heap, across the 3 hosts?
Absolutely correct, Christian.
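The RAM layout confirmed above can be sanity-checked with a few lines. This is just arithmetic on the numbers from the thread; the "heap at ~50% of node RAM" split is the standard Elasticsearch guidance, not something stated here.

```python
# Sanity-check the layout: 7 clusters, each with 3 nodes spread across
# 3 physical hosts, so one node of every cluster runs on each host.
HOST_RAM_GB = 64
CLUSTERS = 7
RAM_PER_NODE_GB = 8

nodes_per_host = CLUSTERS                  # one node per cluster per host
ram_used_gb = nodes_per_host * RAM_PER_NODE_GB
ram_left_gb = HOST_RAM_GB - ram_used_gb    # for OS and filesystem cache
heap_per_node_gb = RAM_PER_NODE_GB // 2    # ~50% of node RAM to the JVM heap

print(ram_used_gb)       # 56 GB committed to ES nodes per host
print(ram_left_gb)       # 8 GB left over per host
print(heap_per_node_gb)  # 4 GB heap per node
```

Note that only 8 GB per host is left outside the ES nodes, which is tight once the filesystem cache is taken into account.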
What type of storage do you have? How long will you be keeping data in the cluster?
So my total disk will be approx 25 TB, as I want to hold data for 3 months (90 days). That is 20 GB x 90 days = 1800 GB per cluster, and for 7 clusters it will be 1800 GB x 7 = 12.6 TB total. I also use one replica, so I need about 25 TB.
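Written out, the retention math is (the 1:1 raw-to-indexed size ratio is an assumption, and real deployments should also leave headroom beyond this):

```python
# Disk needed for 90-day retention across 7 clusters with one replica,
# assuming indexed size on disk equals raw data size (a rough assumption).
DAILY_GB = 20
RETENTION_DAYS = 90
CLUSTERS = 7
COPIES = 2  # primary + one replica

per_cluster_gb = DAILY_GB * RETENTION_DAYS              # 1800 GB
total_primary_tb = per_cluster_gb * CLUSTERS / 1000     # 12.6 TB
total_with_replica_tb = total_primary_tb * COPIES       # 25.2 TB

print(per_cluster_gb, total_primary_tb, total_with_replica_tb)
```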
After 90 days, with Curator I will close the index, or I can shift it to a warm node.
Since I want to hold data for 90 days, there will be trending in Kibana as well.
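Curator itself does this with an age filter plus a close action in its YAML config; as a minimal sketch, the same selection logic looks like the following. The daily `logstash-YYYY.MM.DD` naming pattern and the function name are assumptions for illustration, not from the thread.

```python
from datetime import date, timedelta

def indices_to_close(index_names, today, retention_days=90):
    """Return date-based indices older than the retention window.

    Assumes daily indices named like 'logstash-YYYY.MM.DD' (hypothetical
    pattern); Curator's own 'age' filter does this selection for real.
    """
    cutoff = today - timedelta(days=retention_days)
    to_close = []
    for name in index_names:
        try:
            _, stamp = name.rsplit("-", 1)
            idx_date = date(*map(int, stamp.split(".")))
        except ValueError:
            continue  # skip indices that don't match the date pattern
        if idx_date < cutoff:
            to_close.append(name)
    return to_close

# With today = 2018-06-01 and 90-day retention, the cutoff is 2018-03-03,
# so only the January index is selected.
print(indices_to_close(
    ["logstash-2018.01.15", "logstash-2018.05.20", "kibana"],
    date(2018, 6, 1),
))  # ['logstash-2018.01.15']
```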
If I am not mistaken, that is 1800GB, not 180GB. This means that the total data volume, if we assume indexed size on disk is the same as the raw data volume, is about 25TB with replica configured.
Ahh!! My bad, you are correct, it is 1800 GB (modified my answer too). Actually size is not a problem, I can increase it to 25 TB as well.
So, I will not index the message field, and 90% of the data will be not analyzed, but the index size will still increase as I will use one replica.
If that is the volume that needs to be indexed and queried, I suspect you have too little CPU, and you may also need more RAM. As you will be indexing into a lot of clusters and indices, you may also be limited by disk performance, especially if you are planning to use spinning disks.