We have a requirement to index around 8 TB of data per day including replicas (4 TB per day of primary data).
We are planning a 12-node cluster, each node with 8 cores, 30 TB of HDD and 64 GB of RAM; 5 of these will be master nodes with SSDs.
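The storage side of these figures comes down to simple arithmetic. Below is a minimal sketch under some assumptions the question does not state exactly. First, 5 of the 12 nodes are dedicated masters, which leaves 7 data nodes. Second, the 8 TB/day figure is the size on disk after indexing. Third, only about 85% of each disk is usable, matching Elasticsearch's default low disk watermark. The function name is hypothetical.

```python
import math

def retention_days(total_nodes, master_nodes, disk_tb_per_node,
                   daily_tb_incl_replicas, usable_fraction=0.85):
    """Rough number of whole days of data that fit on the data nodes.

    Assumes dedicated masters hold no data and that usable_fraction of each
    data node's disk can be filled (0.85 mirrors the default low watermark).
    """
    data_nodes = total_nodes - master_nodes
    usable_tb = data_nodes * disk_tb_per_node * usable_fraction
    return math.floor(usable_tb / daily_tb_incl_replicas)

# 7 data nodes * 30 TB * 0.85 = 178.5 TB usable; 178.5 / 8 TB per day
print(retention_days(12, 5, 30, 8))  # about 22 days
```

If the masters also hold data, or the on-disk size differs from the raw input size, the result changes accordingly.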
Do we need JBOD or RAID? Since we have replicas, we think JBOD is sufficient; please correct us if we are wrong.
We have one Logstash instance with 16 cores, 64 GB of RAM and 5 TB of HDD. Each index has 2 primary shards and 1 replica. Is that the right configuration for moderate querying? Is one Logstash instance sufficient, or do we need a message queue such as Redis or Kafka for fault tolerance?
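It helps to check what 2 primary shards implies for shard size. The sketch below assumes one index per day holding the 4 TB of primary data, using decimal units (4 TB = 4000 GB). It also assumes a target of about 50 GB per shard, in line with commonly cited Elasticsearch guidance, and that target is an assumption rather than a hard limit. The helper names are hypothetical. The settings keys `number_of_shards` and `number_of_replicas` are standard index settings.

```python
import math

def primaries_for_target(daily_primary_gb, target_shard_gb=50):
    """Primary shard count that keeps each shard of a daily index near the target size."""
    return math.ceil(daily_primary_gb / target_shard_gb)

def daily_index_settings(daily_primary_gb, replicas=1, target_shard_gb=50):
    """Build a settings body for a hypothetical daily index."""
    return {
        "settings": {
            "number_of_shards": primaries_for_target(daily_primary_gb, target_shard_gb),
            "number_of_replicas": replicas,
        }
    }

daily_primary_gb = 4000  # 4 TB/day of primary data, from the question
print(daily_primary_gb / 2)  # GB per shard with only 2 primaries: 2000.0
print(daily_index_settings(daily_primary_gb))
```

With 2 primaries, each shard of a daily index would be around 2 TB, far above the assumed target. A larger primary count, or rolling indices over by size, keeps shards smaller.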
Please let us know whether our Elasticsearch and Logstash configuration is appropriate.
Indexing in Elasticsearch is very I/O-intensive, so SSDs are recommended for best performance. If you are indexing onto spinning disks, it is important to spread the indexing load across as many disks as possible, e.g. by striping them (RAID 0).
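To see how much sustained write load the disks must absorb, the daily volume can be converted to an average rate. This sketch assumes the 8 TB/day (including replicas) is spread evenly over 24 hours and over 7 data nodes. It uses decimal units and a hypothetical function name. Real disk writes will be higher than this because segment merges rewrite data, and ingest is rarely uniform, so peak rates will also exceed the average.

```python
SECONDS_PER_DAY = 24 * 60 * 60

def avg_write_mb_per_s(daily_tb, nodes=1):
    """Average sustained write rate in MB/s (1 TB = 1,000,000 MB) spread over `nodes`."""
    return daily_tb * 1_000_000 / SECONDS_PER_DAY / nodes

print(round(avg_write_mb_per_s(8), 1))           # whole cluster, about 92.6 MB/s
print(round(avg_write_mb_per_s(8, nodes=7), 1))  # per data node, about 13.2 MB/s
```

Comparing the per-node figure, with headroom for merges and peaks, against benchmarked disk throughput shows whether striping across several spinning disks is needed.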
It is difficult to determine exactly how much data a node can index and store, so I would recommend running some benchmarks if you have the hardware available. I would also recommend the following resources on sizing and best practices: