Every node for itself

Ozgur_Orhan · August 29, 2012, 11:07am

Hello,

We have a problem that recurs every 5-12 hours. While everything is
running smoothly (the cluster is green), one node starts behaving
irrationally and every other node forms its own cluster (not a 1/4
split, but a 1/1/1/1/1 split). This cluster is mainly used for training,
so we have heavy traffic spikes on both reads and writes when jobs are
triggered (plus some continuous small reads).

What happened to btrainer-1.138?
Even if one node (btrainer-1.138) behaves irrationally, why didn't the
cluster split 1/4? Why did the other nodes lose the master,
btrainer-1.182?

Setup :

5 similar nodes :

btrainer-1.182   (192.168.1.182)   (current master before the incident)
btrainer-1.186   (192.168.1.186)
btrainer-1.136   (192.168.1.136)
btrainer-13.137  (192.168.13.137)
btrainer-1.138   (192.168.1.138)

ES Configs :

cluster.name: btrainer
discovery.zen.ping.multicast.enabled: false
discovery.zen.ping.unicast.hosts: [ "192.168.1.182:10300",
  "192.168.1.186:10300", "192.168.1.136:10300", "192.168.13.137:10300",
  "192.168.1.138:10300" ]
http.port: 10200
index.number_of_replicas: 4
transport.tcp.port: 10300

Java Options :

-Des-foreground=yes
-Des.path.home=/elasticsearch
-Xms4096m
-Xmx20480m
-Djline.enabled=true
-XX:+UseParNewGC
-XX:+UseConcMarkSweepGC
-XX:+CMSParallelRemarkEnabled
-XX:SurvivorRatio=8
-XX:MaxTenuringThreshold=1
-XX:CMSInitiatingOccupancyFraction=75
-XX:+UseCMSInitiatingOccupancyOnly
-cp /elasticsearch/lib/*:/elasticsearch/lib/sigar/*
org.elasticsearch.bootstrap.ElasticSearch

You can check the logs from the nodes: https://gist.github.com/3510448
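
For anyone who wants to confirm the same kind of split from outside the cluster, here is a minimal sketch of how each node's view can be compared. It assumes that every node answers `GET /_cluster/state` on the HTTP port from the config above (10200) and that the response carries a `master_node` field; the helper names (`fetch_master`, `group_by_master`, `describe_split`) are made up for illustration and are not part of Elasticsearch.

```python
import json
from collections import defaultdict
from urllib.request import urlopen

# HTTP addresses built from the setup above (http.port: 10200).
NODES = [
    "192.168.1.182:10200",
    "192.168.1.186:10200",
    "192.168.1.136:10200",
    "192.168.13.137:10200",
    "192.168.1.138:10200",
]


def fetch_master(node, timeout=5.0):
    """Return the master node id that `node` reports, or None on failure.

    Assumption: GET /_cluster/state returns JSON with a "master_node" field.
    """
    try:
        with urlopen(f"http://{node}/_cluster/state", timeout=timeout) as resp:
            return json.load(resp).get("master_node")
    except (OSError, ValueError):
        # Connection errors, timeouts and malformed JSON all count as "no answer".
        return None


def group_by_master(views):
    """Group node addresses by the master id each one reports."""
    groups = defaultdict(list)
    for node, master in views.items():
        groups[master].append(node)
    return dict(groups)


def describe_split(views):
    """Summarise group sizes, largest first, e.g. "5", "4/1" or "1/1/1/1/1".

    Nodes that gave no answer (master is None) are counted together as one group.
    """
    sizes = sorted((len(m) for m in group_by_master(views).values()), reverse=True)
    return "/".join(str(size) for size in sizes)


# Usage against the live nodes (performs network calls):
#   views = {node: fetch_master(node) for node in NODES}
#   print(describe_split(views))
```

When all five nodes report the same master id, the summary is a single group of five; when every node has elected itself, each node reports a different id and the summary has five groups of one.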

Best Regards,
Ozgur Orhan

--

Ozgur_Orhan · August 29, 2012, 11:15am

Forgot to add: we are using version 0.19.8.

--
