Blocking the communication between 2 ElasticSearch servers in a 4-node cluster leads to split brain


(moti.umansky) #1

We have a 4 node cluster.

discovery.zen.minimum_master_nodes is 3.

When intes1 is the master and we block the communication between intes1
and intes3, we get a split brain: intes1 still reports itself as master,
while intes2, intes3 and intes4 report intes3 as master.
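
For reference, the arithmetic behind this setting can be sketched in a few lines of Python. This is only an illustrative model, not Elasticsearch code: the helper names `quorum` and `reachable_count` are hypothetical, and it assumes all four nodes are master-eligible, which the post does not state. It shows that 3 is the usual majority value for 4 nodes, and that with only the link between intes1 and intes3 blocked, both of those nodes can still reach three nodes counting themselves. That may be why the setting alone does not stop a second master from being elected here.

```python
NODES = ["intes1", "intes2", "intes3", "intes4"]


def quorum(master_eligible: int) -> int:
    """Majority of master-eligible nodes: floor(n / 2) + 1."""
    return master_eligible // 2 + 1


def reachable_count(node, nodes, blocked_links):
    """Count the nodes that `node` can reach, including itself."""
    blocked = {frozenset(link) for link in blocked_links}
    return sum(
        1
        for other in nodes
        if other == node or frozenset((node, other)) not in blocked
    )


if __name__ == "__main__":
    # Only the single link from the test is blocked (a partial partition).
    blocked = [("intes1", "intes3")]
    needed = quorum(len(NODES))
    print("majority for", len(NODES), "nodes:", needed)
    for node in NODES:
        seen = reachable_count(node, NODES, blocked)
        print(node, "reaches", seen, "nodes, quorum met:", seen >= needed)
```

Under these assumptions every node, including both intes1 and intes3, still meets the threshold of 3, so each side of the blocked link can believe it has enough nodes to hold a master.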

C:\Procedures\New\ElasticSearch>curl "intes1:9200/_cluster/health?pretty=true"
{
"cluster_name" : "play_clust1",
"status" : "green",
"timed_out" : false,
"number_of_nodes" : 4,
"number_of_data_nodes" : 4,
"active_primary_shards" : 5,
"active_shards" : 10,
"relocating_shards" : 0,
"initializing_shards" : 0,
"unassigned_shards" : 0
}

C:\Procedures\New\ElasticSearch>curl "intes2:9200/_cluster/health?pretty=true"
{
"cluster_name" : "play_clust1",
"status" : "green",
"timed_out" : false,
"number_of_nodes" : 4,
"number_of_data_nodes" : 4,
"active_primary_shards" : 5,
"active_shards" : 10,
"relocating_shards" : 0,
"initializing_shards" : 0,
"unassigned_shards" : 0
}

C:\Procedures\New\ElasticSearch>curl "intes3:9200/_cluster/health?pretty=true"
{
"cluster_name" : "play_clust1",
"status" : "green",
"timed_out" : false,
"number_of_nodes" : 4,
"number_of_data_nodes" : 4,
"active_primary_shards" : 5,
"active_shards" : 10,
"relocating_shards" : 0,
"initializing_shards" : 0,
"unassigned_shards" : 0
}

C:\Procedures\New\ElasticSearch>curl "intes4:9200/_cluster/health?pretty=true"
{
"cluster_name" : "play_clust1",
"status" : "green",
"timed_out" : false,
"number_of_nodes" : 4,
"number_of_data_nodes" : 4,
"active_primary_shards" : 5,
"active_shards" : 10,
"relocating_shards" : 0,
"initializing_shards" : 0,
"unassigned_shards" : 0
}

C:\Procedures\New\ElasticSearch>C:\Procedures\New\ElasticSearch\curl.exe "intes1:9200/_cat/master"
WQugEpLIQ3OL6qJGXibQWA INTES1 100.10.122.88 intes1

C:\Procedures\New\ElasticSearch>C:\Procedures\New\ElasticSearch\curl.exe "intes2:9200/_cat/master"
WQugEpLIQ3OL6qJGXibQWA INTES1 100.10.122.88 intes1

C:\Procedures\New\ElasticSearch>C:\Procedures\New\ElasticSearch\curl.exe "intes3:9200/_cat/master"
WQugEpLIQ3OL6qJGXibQWA INTES1 100.10.122.88 intes1

C:\Procedures\New\ElasticSearch>C:\Procedures\New\ElasticSearch\curl.exe "intes4:9200/_cat/master"
WQugEpLIQ3OL6qJGXibQWA INTES1 100.10.122.88 intes1

17:31 -> blocked communication between intes1 and intes3

C:\Procedures\New\ElasticSearch>curl "intes1:9200/_cluster/health?pretty=true"
{
"cluster_name" : "play_clust1",
"status" : "green",
"timed_out" : false,
"number_of_nodes" : 4,
"number_of_data_nodes" : 4,
"active_primary_shards" : 5,
"active_shards" : 10,
"relocating_shards" : 0,
"initializing_shards" : 0,
"unassigned_shards" : 0
}

C:\Procedures\New\ElasticSearch>curl "intes2:9200/_cluster/health?pretty=true"
{
"cluster_name" : "play_clust1",
"status" : "green",
"timed_out" : false,
"number_of_nodes" : 3,
"number_of_data_nodes" : 3,
"active_primary_shards" : 5,
"active_shards" : 10,
"relocating_shards" : 0,
"initializing_shards" : 0,
"unassigned_shards" : 0
}

C:\Procedures\New\ElasticSearch>curl "intes3:9200/_cluster/health?pretty=true"
{
"cluster_name" : "play_clust1",
"status" : "green",
"timed_out" : false,
"number_of_nodes" : 3,
"number_of_data_nodes" : 3,
"active_primary_shards" : 5,
"active_shards" : 10,
"relocating_shards" : 0,
"initializing_shards" : 0,
"unassigned_shards" : 0
}

C:\Procedures\New\ElasticSearch>curl "intes4:9200/_cluster/health?pretty=true"
{
"cluster_name" : "play_clust1",
"status" : "green",
"timed_out" : false,
"number_of_nodes" : 3,
"number_of_data_nodes" : 3,
"active_primary_shards" : 5,
"active_shards" : 10,
"relocating_shards" : 0,
"initializing_shards" : 0,
"unassigned_shards" : 0
}
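
Note that all four nodes still report "green" after the block, so the status field alone does not reveal the problem; the differing number_of_nodes values do. A small Python sketch of that comparison follows. The function names `node_counts` and `views_disagree` are hypothetical, and the JSON bodies are trimmed copies of the responses pasted above rather than live requests.

```python
import json


def node_counts(health_by_node):
    """Map each queried node to the number_of_nodes it reports."""
    return {
        node: json.loads(body)["number_of_nodes"]
        for node, body in health_by_node.items()
    }


def views_disagree(health_by_node):
    """True when the queried nodes report different cluster sizes."""
    return len(set(node_counts(health_by_node).values())) > 1


# Trimmed copies of the _cluster/health responses taken after the block.
AFTER_BLOCK = {
    "intes1": '{"status": "green", "number_of_nodes": 4}',
    "intes2": '{"status": "green", "number_of_nodes": 3}',
    "intes3": '{"status": "green", "number_of_nodes": 3}',
    "intes4": '{"status": "green", "number_of_nodes": 3}',
}

if __name__ == "__main__":
    print(node_counts(AFTER_BLOCK))
    print("views disagree:", views_disagree(AFTER_BLOCK))
```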

C:\Procedures\New\ElasticSearch>C:\Procedures\New\ElasticSearch\curl.exe "intes1:9200/_cat/master"
WQugEpLIQ3OL6qJGXibQWA INTES1 100.10.122.88 intes1

C:\Procedures\New\ElasticSearch>C:\Procedures\New\ElasticSearch\curl.exe "intes2:9200/_cat/master"
TLU8vz2_SKmLCEIP0DQdeQ INTES3 100.10.122.90 intes3

C:\Procedures\New\ElasticSearch>C:\Procedures\New\ElasticSearch\curl.exe "intes3:9200/_cat/master"
TLU8vz2_SKmLCEIP0DQdeQ INTES3 100.10.122.90 intes3

C:\Procedures\New\ElasticSearch>C:\Procedures\New\ElasticSearch\curl.exe "intes4:9200/_cat/master"
TLU8vz2_SKmLCEIP0DQdeQ INTES3 100.10.122.90 intes3
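
The same check can be made on the _cat/master output: if the nodes do not all name the same master, the cluster has split. Below is a rough Python sketch of that grouping. The helper names `parse_cat_master` and `masters_by_view` are hypothetical, the column order (id, host, ip, node) is assumed from the output shown above, and the input lines are copied from this post rather than fetched live.

```python
from collections import defaultdict


def parse_cat_master(line):
    """Split one _cat/master line into its four whitespace-separated columns."""
    node_id, host, ip, name = line.split()
    return {"id": node_id, "host": host, "ip": ip, "node": name}


def masters_by_view(cat_master_by_node):
    """Group the queried nodes by the master each one reports."""
    groups = defaultdict(list)
    for queried, line in cat_master_by_node.items():
        groups[parse_cat_master(line)["node"]].append(queried)
    return dict(groups)


# _cat/master output per queried node, taken after the block at 17:31.
AFTER_BLOCK = {
    "intes1": "WQugEpLIQ3OL6qJGXibQWA INTES1 100.10.122.88 intes1",
    "intes2": "TLU8vz2_SKmLCEIP0DQdeQ INTES3 100.10.122.90 intes3",
    "intes3": "TLU8vz2_SKmLCEIP0DQdeQ INTES3 100.10.122.90 intes3",
    "intes4": "TLU8vz2_SKmLCEIP0DQdeQ INTES3 100.10.122.90 intes3",
}

if __name__ == "__main__":
    views = masters_by_view(AFTER_BLOCK)
    print(views)
    print("split brain suspected:", len(views) > 1)
```

With the output above, intes1 is alone in naming itself master, while intes2, intes3 and intes4 all name intes3.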

I'll attach:

  • elasticsearch.yml
  • The ES logs from all the cluster nodes
  • The output of curl -XGET "intes1:9200/_nodes/?pretty=true"

Thanks,
Moti

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/8673895f-888d-4c92-81fb-e7aa8a45f653%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


(Mark Walkom) #2

This looks similar to

Regards,
Mark Walkom

Infrastructure Engineer
Campaign Monitor
email: markw@campaignmonitor.com
web: www.campaignmonitor.com

On 18 August 2014 00:55, moti.umansky@gmail.com wrote:

We have a 4 node cluster.

discovery.zen.minimum_master_nodes is 3.

When the master is intes1, and we block the communication between intes1
and intes3, we are getting a split brain.



(system) #3