Need advice on building a new production ELK cluster

Hello,

We are in the process of building a new ELK cluster and are looking to incorporate HA and data resiliency in a DR setup across 2 sites.
We will be using a SAN in each of the 2 sites, expect to ingest anywhere between 500-700GB per day, and want to retain the data in the cluster for about 1 year.

We are struggling to determine whether we should be using containers or VMs.
Another thing we are trying to understand is whether we can make the storage independent from the Elasticsearch nodes, e.g. if a node fails, the data is preserved on the SAN. A rough sketch of what we have in mind is below.
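As far as we understand, Elasticsearch nodes cannot share a data directory, so the idea would be a dedicated SAN LUN per node, mounted locally and referenced via path.data (mount paths below are just placeholders):

```yaml
# elasticsearch.yml on each node -- a sketch only, paths are examples.
# Every node needs its OWN data directory (nodes cannot share one),
# so each node would mount its own dedicated SAN LUN here.
path.data: /mnt/san/es-node-01/data
path.logs: /mnt/san/es-node-01/logs
```

That way, if the node (VM or container) dies, the LUN could be re-attached to a replacement node without losing the shards on it.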

In terms of DR, we have also been exploring the different options: using CCR, doing log shipping to the secondary site, or simply leveraging the snapshot/restore feature of Elasticsearch.
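For CCR (which, as far as we know, requires at least a Platinum license), the setup we have been prototyping looks roughly like this; cluster, host, and index names are placeholders:

```
# On the secondary (follower) cluster: register the primary as a remote cluster.
PUT _cluster/settings
{
  "persistent": {
    "cluster": {
      "remote": {
        "primary-site": {
          "seeds": ["es-primary-1.example.com:9300"]
        }
      }
    }
  }
}

# Then create a follower index that replicates a leader index from the primary.
PUT /logs-000001/_ccr/follow
{
  "remote_cluster": "primary-site",
  "leader_index": "logs-000001"
}
```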

A few of our biggest goals here are: 1) make it as easy as possible to scale up; 2) make it as hard as possible to lose data; 3) make it as easy as possible to restore in a DR scenario. For goals 2 and 3, a sketch of the snapshot setup we are considering is below.
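On the snapshot/restore side, our understanding is that we would register a shared-filesystem repository backed by the SAN and schedule snapshots with SLM; paths, names, and the schedule below are just examples:

```
# Register a filesystem repository. The location must be a shared mount
# visible to all master and data nodes and whitelisted via path.repo
# in elasticsearch.yml on each of them.
PUT _snapshot/dr_repo
{
  "type": "fs",
  "settings": {
    "location": "/mnt/san/es-snapshots"
  }
}

# Nightly snapshots via SLM (schedule and retention are illustrative).
PUT _slm/policy/nightly-snapshots
{
  "schedule": "0 30 1 * * ?",
  "name": "<nightly-{now/d}>",
  "repository": "dr_repo",
  "config": { "indices": ["*"] },
  "retention": { "expire_after": "30d", "min_count": 5, "max_count": 50 }
}
```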

We wanted to see if the community can give us some advice and help us determine which of the above paths we should or shouldn't take.

Thanks!

Do you need the data to be searchable for 1 year, or could you use data tiers with different kinds of retention, like 15 days for hot data, 60 days for warm data, and everything else in snapshots?
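For example, an ILM policy along these lines would implement that (the ages and rollover conditions are illustrative, and the SLM policy name is a placeholder):

```
PUT _ilm/policy/logs-policy
{
  "policy": {
    "phases": {
      "hot": {
        "actions": {
          "rollover": { "max_age": "1d", "max_primary_shard_size": "50gb" }
        }
      },
      "warm": {
        "min_age": "15d",
        "actions": {
          "set_priority": { "priority": 50 }
        }
      },
      "delete": {
        "min_age": "75d",
        "actions": {
          "wait_for_snapshot": { "policy": "nightly-snapshots" },
          "delete": {}
        }
      }
    }
  }
}
```

After 15 days on the hot tier and 60 more on the warm tier, the index is deleted from the cluster and only lives on in snapshots.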

Also, to have some resiliency you need at least 1 replica, so 700GB/day with 1 replica becomes 1.4 TB/day. Storing that for a full year means 1.4 TB/day × 365 days ≈ 511 TB, which can be really expensive, even more so if you want to run this on-premises and duplicate the same infrastructure across two sites.
