High Availability in Elasticsearch

Hi,

I have installed and configured a multi-node Elasticsearch cluster on RHEL 6.9.

Master : 192.168.2.79

Data-Node1 : 192.168.2.80

Data-Node2 : 192.168.2.81

How do I make Elasticsearch highly available?

Question 1: If data node 2 goes down, what will happen? Will I still be able to see all of the data?

Question 2: If the storage on data node 1 and data node 2 is full, how can I increase storage without disrupting the system?

Question 3: If we add an extra data node 3 when data node 1 and data node 2 are full, what will happen?

Thanks

You need at least 3 master-eligible nodes on separate hosts. This allows 2 nodes to form a majority even if one node is unavailable, assuming you have configured minimum_master_nodes correctly. You should therefore make your data nodes master-eligible as well.
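As a sketch: on a pre-7.0 cluster (Zen discovery and this setting imply one; minimum_master_nodes was removed in 7.x), each of the three nodes could carry settings along these lines in elasticsearch.yml:

```yaml
# elasticsearch.yml (sketch for a 3-node, pre-7.0 cluster)
node.master: true                      # data nodes are also master-eligible
node.data: true
# quorum = (master-eligible nodes / 2) + 1 = (3 / 2) + 1 = 2
discovery.zen.minimum_master_nodes: 2
```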

Assuming you have a replica configured for all indices, the cluster will still be able to serve data.
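For example, a replica for every existing index can be set through the index settings API; this is a sketch assuming the cluster listens on localhost:9200:

```shell
# Give all existing indices one replica copy of each primary shard
curl -XPUT 'http://localhost:9200/_all/_settings' \
  -H 'Content-Type: application/json' \
  -d '{"index": {"number_of_replicas": 1}}'
```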

You should monitor disk space and act before the disks fill up, as indices will otherwise be made read-only and/or you may suffer index corruption and data loss.
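The relevant thresholds are the disk-based shard allocation watermarks; the values below are the defaults (the flood_stage setting, which forces indices read-only, exists from 6.x onwards):

```yaml
# elasticsearch.yml -- disk-based shard allocation thresholds (defaults)
cluster.routing.allocation.disk.watermark.low: 85%          # no new shards allocated to the node
cluster.routing.allocation.disk.watermark.high: 90%         # shards relocated away from the node
cluster.routing.allocation.disk.watermark.flood_stage: 95%  # indices on the node forced read-only
```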

Elasticsearch will automatically redistribute data across all the data nodes available in the cluster.


Hi @Christian_Dahlqvist,

Thanks for your valuable response.

In question 1, if we configure replicas, the storage requirement becomes huge. Suppose we have a 5-node cluster and the cluster size is 50 TB without replicas. If we configure replicas, will the cluster size become 50 x 5 = 250 TB (approx.)? Is that a feasible solution?
Correct me if I am wrong about the replica concept. Is there any other way to achieve failover between nodes?

In question 2, suppose the storage is 90% full and we want to add extra storage without disrupting the cluster. Do we need to back up the whole data directory before increasing the storage?

In question 3, suppose we have 2 data nodes and path.data on each node holds 15 TB.
In that scenario, if we add data node 3, will it automatically redistribute the data so that each data node holds about 10 TB?
Correct me if I am wrong.

Thanks,

If your primary shards take up 50 TB of storage, configuring 1 replica will double this to 100 TB. If you do not have at least 1 replica configured, you cannot have high availability.
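The arithmetic is per copy of the data, not per node: number_of_replicas: 1 means two copies of each shard in total, however many nodes the cluster has. A quick sketch:

```shell
PRIMARY_TB=50    # total size of all primary shards
REPLICAS=1       # index.number_of_replicas
TOTAL_TB=$((PRIMARY_TB * (1 + REPLICAS)))
echo "${TOTAL_TB} TB"    # 100 TB -- not 50 TB x number of nodes
```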

Assuming you have a replica configured, you should be able to take down and modify/upgrade one node at a time while keeping the cluster operational.

Yes, that is basically correct.
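Roughly, the rebalancing arithmetic looks like this (a sketch; the real distribution depends on individual shard sizes and allocation settings):

```shell
TOTAL_TB=30      # 2 data nodes x 15 TB each
NODES=3          # after adding datanode3
PER_NODE_TB=$((TOTAL_TB / NODES))
echo "${PER_NODE_TB} TB per node"    # 10 TB per node
```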

If you have 100 TB of data you are likely to need more than 2 data nodes. Elasticsearch nodes cannot hold an unlimited amount of data, as the amount of heap available limits this. Exactly how much a node can hold will depend on the use case. Have a look at the following resources:


Hi @Christian_Dahlqvist

Thanks for your valuable response.

Regarding replication: if we set number_of_replicas: 1, is there any chance it will slow down data insertion? We ingest data through Logstash.

Thanks,

Replication can slow down ingest, as the same data needs to be indexed twice, but that is the price to pay for increased availability and resilience.


Hi @Christian_Dahlqvist

Thanks for your valuable response.

Can we implement high availability on the storage side, using a SAN or something similar?

Thanks,

Different Elasticsearch nodes need their own copies of the data, as each node manages its data separately, so using a SAN will not reduce storage requirements.


Hi @Christian_Dahlqvist

Can you please suggest how to configure a 5-node Elasticsearch cluster on shared storage?

Thanks,

Hi @Christian_Dahlqvist,

My concern is: I want a cluster with 5 data nodes, where the data directory on every node is mounted at the same path, e.g. /es-data/, on our shared storage. Is this possible? If so, how?

Thanks,

What type of shared storage?

Hi @Christian_Dahlqvist,

SSD storage tier
A single RAID5 storage pool:
12 * 200GB EFD
250GB LUN for parent images
500GB LUN for infrastructure
75GB LUNs for replica stores (1 per node pool cluster)

Thanks,

How much data are you looking to store in the cluster? The size of that storage does not match the 50-100 TB you used as an example...

Do you know if this also holds for SANs with de-duplication? I have wondered whether I could get redundancy on the application side without affecting space usage much.

Yes, each Elasticsearch node needs its own storage.
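In other words, even if all five nodes mount volumes from the same SAN, each node's path.data must point at a directory that no other node writes to. A sketch with hypothetical per-node mount points:

```yaml
# elasticsearch.yml on data node 1 (its own LUN/directory)
path.data: /es-data/node1

# elasticsearch.yml on data node 2
path.data: /es-data/node2
```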

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.