Elasticsearch multi-rack configuration

Hi,
we have a production cluster composed of 8 nodes (4 with node.rack=location1 and 4 with node.rack=location2).
We have 3 master-eligible nodes per rack (node.master=true) and discovery.zen.minimum_master_nodes: 2. Can somebody please tell me if this configuration is correct? I am assuming that minimum_master_nodes should be calculated based on the master-eligible nodes of a single rack (3 nodes); am I wrong?
We are also facing some issues when some nodes or the network go down for a while; for example, the most recent one:
/product/elasticsearch/logs/elasticsearch.log.2018-08-20:[2018-08-20 23:55:50,107][WARN ][discovery.zen ] [25522] received a cluster state from a different master than the current one, rejecting (received {1888}{-_Pj-sURTY6Zr58UiNv0oA}{XX.XX.XX.XX}{XX.XX.XX.XX:9300}{rack=location2, master=true}, current {31160}{ChtQq9oiSKqfr_NlzbCQsw}{XX.XX.XX.XX}{XX.XX.XX.XX:9300}{rack=location1, master=true})
/product/elasticsearch/logs/elasticsearch.log.2018-08-20:java.lang.IllegalStateException: cluster state from a different master than the current one, rejecting (received {1888}{-_Pj-sURTY6Zr58UiNv0oA}{XX.XX.XX.XX}{XX.XX.XX.XX:9300}{rack=location2, master=true}, current {31160}{ChtQq9oiSKqfr_NlzbCQsw}{XX.XX.XX.XX}{XX.XX.XX.XX:9300}{rack=location1, master=true})
Is this related to a wrong configuration?
Do you have any other suggestions for changing the current architecture in order to achieve:

  1. location failover (one location completely down with the cluster still healthy)
  2. node failover (one node completely down with the cluster still healthy)

Thanks a lot for your help
Best Regards
Mauro

I forgot to say that we are using Elasticsearch 2.4.4.
Thanks
Best Regards

No, it is based on the total number of master-eligible nodes in the cluster, so in your case it should be set to 4 (although I would probably remove one master node per rack and set it to 3 instead).
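To spell out the arithmetic (a minimal sketch, assuming the suggested two master-eligible nodes per rack):

  # 2 master-eligible nodes per rack x 2 racks = 4 in total
  # minimum_master_nodes = floor(4 / 2) + 1 = 3
  discovery.zen.minimum_master_nodes: 3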

If you update the settings as described above, a single node failing should not be a problem.

Being able to handle a full location failure with the cluster staying available and not requiring manual intervention does require a third location (at least for a dedicated master node). You cannot achieve this with just 2 locations.
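As an illustration, a hypothetical tie-breaker node in a third location could be configured as a dedicated master-eligible node (a sketch assuming a rack value of location3; not taken from this thread):

  # elasticsearch.yml on the hypothetical tie-breaker node
  node.attr.rack: location3
  node.master: true   # master-eligible, acts as the quorum tie-breaker
  node.data: false    # dedicated master: holds no shard data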

Thanks a lot for your quick response.
I should also say that our index configuration is:
health status index  pri rep docs.count docs.deleted store.size pri.store.size
green  open   index1   1   7      23946           96     17.7mb          2.2mb
green  open   index2   1   7    4124802       842760     10.4gb          1.2gb
green  open   index3   1   7          0            0      1.2kb           159b
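(For reference, this listing presumably comes from the cat indices API, e.g.:

  curl -XGET 'localhost:9200/_cat/indices?v'
)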
So we are replicating all data to all nodes (1 primary + 7 replicas = 8 copies of each shard, one per node); are your statements still valid?
Also, is there any way to achieve location failover without adding a new location? In this case we just need the cluster to be able to perform read operations, and for the nodes to be able to rejoin automatically after the networking issue.
Thanks again
Bye
Mauro

You should already have that with 2 locations. If a full location goes down, you will be able to continue serving reads; it is just that inserts and updates cannot be allowed, as this could lead to data loss.

Thanks. And when the networking issue is gone, will the nodes be able to rejoin automatically?
To summarize, you suggest setting:

  1. two master-eligible nodes per rack instead of three
  2. discovery.zen.minimum_master_nodes: 3

By doing that we will achieve:

  1. proper node failover
  2. location failover (but only for reads)
  3. nodes automatically rejoining the cluster when the issue is gone
  4. no split-brain situations

Are these conclusions true?
Thanks again
Best Regards
Mauro

That sounds correct, as you have enough replicas to ensure both racks hold at least one copy of every shard.
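For reference, a minimal sketch of the allocation-awareness settings involved (the force option is an assumption on my part, not something suggested in this thread):

  cluster.routing.allocation.awareness.attributes: rack
  # optionally force shard copies to stay balanced across the known rack
  # values: if one rack is down, the extra copies remain unassigned
  # instead of piling up in the surviving rack
  cluster.routing.allocation.awareness.force.rack.values: location1,location2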

Do you know why the nodes were not able to rejoin the cluster automatically after the issue?
Thanks
Best Regards

Did you have minimum_master_nodes set to the correct value at the time?

No, it was the wrong one, so 2.

Then you probably ended up with a split-brain scenario.
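To spell out why (a sketch based on the configuration described earlier in the thread):

  # 3 master-eligible nodes per rack with minimum_master_nodes: 2:
  # during a network partition each rack still sees 3 >= 2 masters,
  # so each side elects its own master -> two masters = split brain.
  # With 2 masters per rack and minimum_master_nodes: 3, neither
  # isolated side reaches quorum (2 < 3), so no second master forms.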

Thanks, we did some tests and we can confirm that we definitely ended up with a split-brain (as you suggested) under the current configuration, and this is the reason why the nodes were not able to rejoin the cluster.
We also tested your suggested configuration:

  1. two master-eligible nodes per rack instead of three
  2. discovery.zen.minimum_master_nodes: 3

It works fine as long as the network between the 2 locations (racks) is working. But if the network is down, the cluster is completely unresponsive; for example, the status:
[root@~]# curl -i "http://localhost:9200/_cluster/health?pretty"
HTTP/1.1 503 Service Unavailable
content-type: application/json; charset=UTF-8
content-length: 228

{
  "error" : {
    "root_cause" : [
      {
        "type" : "master_not_discovered_exception",
        "reason" : null
      }
    ],
    "type" : "master_not_discovered_exception",
    "reason" : null
  },
  "status" : 503
}
In elasticsearch.log we also see the error "not enough master nodes discovered during pinging (found [[]], but needed [3]), pinging again".
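That error is consistent with the quorum arithmetic (a sketch, assuming the two-masters-per-rack layout above):

  # With the inter-rack network down, each rack can only discover its
  # own 2 master-eligible nodes, but minimum_master_nodes is 3:
  # 2 < 3, so no master is elected on either side, and master-level
  # APIs such as _cluster/health return 503.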
Below is our elasticsearch.yml file:

# ======================== Elasticsearch Configuration =========================
#
# NOTE: Elasticsearch comes with reasonable defaults for most settings.
#       Before you set out to tweak and tune the configuration, make sure you
#       understand what you are trying to accomplish and the consequences.
#
# The primary way of configuring a node is via this file. This template lists
# the most important settings you may want to configure for a production cluster.
#
# Please consult the documentation for further information on configuration options:
# https://www.elastic.co/guide/en/elasticsearch/reference/index.html
#
# ---------------------------------- Cluster -----------------------------------
#
# Use a descriptive name for your cluster:
#
# Rotterdam configuration to be used during upgrade: step 1
cluster.name: elasticsearch
#
# ------------------------------------ Node ------------------------------------
#
# Use a descriptive name for the node:
#
node.name: aapps292
#
# Add custom attributes to the node:
#
#node.attr.rack: r1
node.attr.rack: LOCATION1

cluster.routing.allocation.awareness.attributes: rack

# Only one node per datacenter must be a pure data node
node.master: true
node.data: true
#
# ----------------------------------- Paths ------------------------------------
#
# Path to directory where to store the data (separate multiple locations by comma):
#
#path.data: /path/to/data
#
# Path to log files:
#
#path.logs: /path/to/logs
#
# ----------------------------------- Memory -----------------------------------
#
# Lock the memory on startup:
#
bootstrap.memory_lock: true
bootstrap.system_call_filter: false
#
# Make sure that the heap size is set to about half the memory available
# on the system and that the owner of the process is allowed to use this
# limit.
#
# Elasticsearch performs poorly when the system is swapping the memory.
#
# ---------------------------------- Network -----------------------------------
#
# Set the bind address to a specific IP (IPv4 or IPv6):
#
network.host: 0.0.0.0
#network.host: 10.133.76.32

transport.host: 10.133.76.32

# Set a custom port for HTTP:
#
http.port: 9200
#
# For more information, consult the network module documentation.
#
# --------------------------------- Discovery ----------------------------------
#
# Pass an initial list of hosts to perform discovery when a new node is started:
# The default list of hosts is ["127.0.0.1", "[::1]"]
#
discovery.zen.ping.unicast.hosts: ["ELKNODE01", "ELKNODE02", "ELKNODE03", "ELKNODE04", "ELKNODE05", "ELKNODE06", "ELKNODE07", "ELKNODE08"]
#discovery.zen.ping.unicast.hosts: ["ELKNODE05", "ELKNODE06", "ELKNODE07", "ELKNODE08"]

# Prevent the "split brain" by configuring the majority of nodes (total number of master-eligible nodes / 2 + 1):
#
discovery.zen.minimum_master_nodes: 3
discovery.zen.fd.ping_timeout: 30s
#
# For more information, consult the zen discovery module documentation.
#
# ---------------------------------- Gateway -----------------------------------
#
# Block initial recovery after a full cluster restart until N nodes are started:
#
#gateway.recover_after_nodes: 3
gateway.expected_nodes: 3
#
# For more information, consult the gateway module documentation.
#
# ---------------------------------- Various -----------------------------------
#
# Require explicit names when deleting indices:
#
#action.destructive_requires_name: true
script.inline: true
script.stored: true
script.file: true
action.destructive_requires_name: true
thread_pool.bulk.queue_size: 300
thread_pool.search.size: 8
thread_pool.search.queue_size: 3000
indices.recovery.max_bytes_per_sec: 100mb
cluster.routing.allocation.node_concurrent_recoveries: 10

Based on your previous response, I thought read operations should keep working in case of location failover, but they are not working in our case. Maybe I am missing something?
Could you please help ?
Thanks a lot
M

Hello ,
could you please give me a response?
Thanks a lot

You should be able to search data, but I am not sure which of the system APIs actually require an elected master to be present. The cluster will be in a red state, so it will not be fully available. To get that you will need 3 racks/zones.

So do you know why the cluster is not available for reads either?

Are you not able to perform a search for data (as opposed to the system APIs)?

I only tried the curl cluster health command and a curl -XGET for searching data. Are these system APIs?

Does curl -X GET "localhost:9200/*/_search?q=*&pretty" return results? The behaviour can be controlled through this setting.
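The setting referred to is presumably discovery.zen.no_master_block (an assumption on my part; the original link is lost):

  # Controls which operations are rejected while no master is elected:
  #   all   -> both read and write operations fail
  #   write -> writes fail, but reads of existing indices still work (the default)
  discovery.zen.no_master_block: write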

I tried this one and it didn't work:
curl -XGET '10.23.80.64:9200/indexname/_search?pretty' -H 'Content-Type: application/json' -d'

{..

}

Did you try the one I gave as an example? If so, what was the output?