Hello,
We have a problem that repeats itself every 5-12 hours. When
everything is running smoothly (cluster is green), one node starts to behave
irrationally and every node ends up forming its own cluster (not a 1/4
split, but a 1/1/1/1/1 split). This cluster is mainly used for training, so we
see heavy traffic spikes on both reads and writes when jobs are
triggered (plus some continuous small reads).
- What happened to btrainer-1.138?
- Even if one node (btrainer-1.138) behaves irrationally, why didn't the
cluster split 1/4; why did the other nodes lose the master,
btrainer-1.182?
Setup:
5 similar nodes:
btrainer-1.182 (192.168.1.182) (Current Master before incident)
btrainer-1.186 (192.168.1.186)
btrainer-1.136 (192.168.1.136)
btrainer-13.137 (192.168.13.137)
btrainer-1.138 (192.168.1.138)
ES Configs:
cluster.name: btrainer
discovery.zen.ping.multicast.enabled: false
discovery.zen.ping.unicast.hosts: ["192.168.1.182:10300", "192.168.1.186:10300", "192.168.1.136:10300", "192.168.13.137:10300", "192.168.1.138:10300"]
http.port: 10200
index.number_of_replicas: 4
transport.tcp.port: 10300
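(For completeness: we have not set discovery.zen.minimum_master_nodes anywhere. If I read the docs correctly, with 5 master-eligible nodes the quorum guard against exactly this kind of split would look roughly like the line below; this is just my understanding, not something we run today:
discovery.zen.minimum_master_nodes: 3
)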
Java Options:
-Des-foreground=yes
-Des.path.home=/elasticsearch
-Xms4096m
-Xmx20480m
-Djline.enabled=true
-XX:+UseParNewGC
-XX:+UseConcMarkSweepGC
-XX:+CMSParallelRemarkEnabled
-XX:SurvivorRatio=8
-XX:MaxTenuringThreshold=1
-XX:CMSInitiatingOccupancyFraction=75
-XX:+UseCMSInitiatingOccupancyOnly
-cp /elasticsearch/lib/*:/elasticsearch/lib/sigar/*
org.elasticsearch.bootstrap.ElasticSearch
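(One thing I noticed while copying these options: -Xms and -Xmx do not match (4 GB vs 20 GB). If setting them equal is the recommended way to avoid heap-resize pauses, I assume the relevant lines would become:
-Xms20480m
-Xmx20480m
Please correct me if that is unrelated to the problem.)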
You can check the logs from the nodes here: https://gist.github.com/3510448
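If it helps, I can also capture the cluster health from each node while the split is happening, e.g. with something like:
curl -s 'http://192.168.1.182:10200/_cluster/health?pretty=true'
(using http.port 10200 as configured above).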
Best Regards,
Ozgur Orhan
--
Forgot to add: we are using version 0.19.8.