Configure a high availability setup for the Elastic Stack

Hi Elastic Experts,

I would appreciate your inputs on the queries below.

We have a requirement to configure high availability across two data centers (DC and DR) for Elasticsearch nodes, Logstash, and Kibana. The customer’s application is deployed in an Active-Active architecture with a load balancer, and data is synchronized between both sites. Our primary objective is log monitoring and analysis, with a daily ingestion volume of 15–20 GB. Additionally, the customer does not want to delete any data.

Given this requirement, is it feasible to deploy a single Elasticsearch node in the DC and another in the DR for high availability, with one Logstash instance in each location behind a load balancer, and a similar setup for Kibana (one instance in the DC and one in the DR, also load balanced)?

Thanks in advance.

Eshwar

Keeping all data forever is generally not realistic. You can aim to keep the data around for a long time, but unless you have unlimited storage and hardware there will need to be a limit at some point. A single node can handle a lot of data, but it does have practical limits.

Do both data centers need to hold a full copy of all data originating from both data centers?

How are you ingesting or planning to ingest data into Elasticsearch? What is feeding data to the Logstash instances you mentioned?

Hi @Christian_Dahlqvist ,

Thank you for your response.

This is a banking domain, and although I’ve asked several times about data retention, they have clearly stated that no data should be deleted.

Yes, both data centers are required to maintain a full copy of all data.

I am planning to ingest the data using a syslog integration, as the application supports sending logs via syslog.
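
For reference, the ingest path I have in mind per data center would be roughly along these lines (a minimal Logstash pipeline sketch; the port, index name, and credentials are only placeholders for illustration):

```
input {
  # The application would be pointed at this syslog port
  syslog {
    port => 5514
  }
}

output {
  # Write the events to the local Elasticsearch node
  elasticsearch {
    hosts => ["https://localhost:9200"]
    index => "app-syslog-%{+YYYY.MM.dd}"
    user => "logstash_writer"
    password => "${LOGSTASH_WRITER_PWD}"
  }
}
```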

If we go with a single node, how will performance hold up over a long period of time?

Regards,

Eshwar

Thanks for the additional information. You can indeed set up two separate clusters, one in each data center. If you do this you have two main points to address:

  1. How do you reliably, and in a timely fashion, ensure that both clusters hold all data from both data centers?
  2. How do you size and configure each cluster to ensure adequate performance and ability to hold enough data?

Starting with point 1, one option is to have the data in each data center written to the local cluster and then have Elasticsearch replicate that data to the other cluster. This approach has the benefit that the data collection pipeline in each data center only has to write to one location, which is relatively easy to set up and maintain. The potential problem is that cross-cluster replication (CCR), which is required for Elasticsearch to manage the replication of the data, is a feature that requires a commercial license and is not available with the free Basic tier.
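
To give a rough idea of what that could look like in practice, below is a minimal sketch of the CCR side, run against the DR cluster (the cluster alias, hostname and index patterns are placeholders, and a mirror-image setup would be needed on the DC cluster so that it follows the DR indices):

```
# 1. Register the DC cluster as a remote cluster on the DR cluster
PUT _cluster/settings
{
  "persistent": {
    "cluster": {
      "remote": {
        "dc_cluster": {
          "seeds": ["dc-node-1.example.internal:9300"]
        }
      }
    }
  }
}

# 2. Automatically follow new indices matching the pattern from the DC cluster
PUT _ccr/auto_follow/dc-logs
{
  "remote_cluster": "dc_cluster",
  "leader_index_patterns": ["logs-dc-*"],
  "follow_index_pattern": "{{leader_index}}-copy"
}
```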

Another option, if a commercial license is not feasible, is to ensure all data is reliably written to both clusters in parallel. To make this reliable and able to handle failures or temporary connectivity issues without losing data, it is common to introduce a message queue with two consumers, one for each data center cluster. That way the consumers can ingest data independently, and a temporary interruption of one does not affect the other or hold up data ingestion completely.
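
As a sketch of that approach, assuming Kafka as the message queue (the broker addresses, topic, consumer group names and hostnames are placeholders), the Logstash pipelines could look roughly like this:

```
# Shipper pipeline, run in each data center: receive syslog, write to the queue
input {
  syslog {
    port => 5514
  }
}

output {
  kafka {
    bootstrap_servers => "kafka1:9092,kafka2:9092"
    topic_id => "app-logs"
    codec => json
  }
}
```

```
# Consumer pipeline for the DC cluster; the DR consumer is identical
# apart from the group_id and the elasticsearch hosts
input {
  kafka {
    bootstrap_servers => "kafka1:9092,kafka2:9092"
    topics => ["app-logs"]
    group_id => "es-dc"
    codec => json
  }
}

output {
  elasticsearch {
    hosts => ["https://es-dc.example.internal:9200"]
    index => "app-syslog-%{+YYYY.MM.dd}"
  }
}
```

Because each cluster consumes the topic with its own consumer group, each gets a complete copy of the stream and can catch up independently after an outage.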

Let’s switch to point 2. A single node cluster can hold a lot of data, but how much will depend on a number of factors:

  • Size and specification of the node in terms of CPU and RAM.
  • Type and size of storage used. Elasticsearch can be quite I/O intensive and works best with SSDs.
  • Requirements around query latency when accessing the data. The more data a node holds, and the slower the storage is, the worse query performance is likely to be. How you query the data and build dashboards will also play a part here.

If we assume 20 GB in total is ingested every day and that the data takes up the same amount of space on disk (a simplified assumption), the node will need to store around 7.3 TB to hold the data generated during one year. That is doable, but it is quite a lot of data for a single node, so if your retention really is “indefinite” I suspect you will need to scale beyond a single node reasonably quickly.

If you need to keep data forever but do not need all of it to be immediately searchable, one option might be to define a shorter retention period of e.g. 6 months in the cluster and back up older data in snapshots that can be restored, in part or in full, on demand.
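
As a rough sketch of that (the repository type, path, schedule and index pattern are placeholders, and an "fs" repository additionally requires path.repo to be set in elasticsearch.yml), you could register a snapshot repository and a snapshot lifecycle (SLM) policy, and pair it with an ILM policy whose delete phase removes indices after roughly 180 days (the delete phase's wait_for_snapshot action can reference the SLM policy so an index is only deleted once it has been snapshotted):

```
# Register a shared filesystem repository for the archived snapshots
PUT _snapshot/log_archive
{
  "type": "fs",
  "settings": {
    "location": "/mnt/es-backups"
  }
}

# Take a nightly snapshot of the log indices; no retention block, so
# snapshots are kept indefinitely in the repository
PUT _slm/policy/nightly-log-archive
{
  "schedule": "0 30 1 * * ?",
  "name": "<logs-snap-{now/d}>",
  "repository": "log_archive",
  "config": {
    "indices": ["app-syslog-*"]
  }
}
```

Older data can then be restored from the repository on demand when it is needed.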