I have a 10-machine cluster (nodes es101 to es110) with 32GB of RAM per machine.
I've allocated 12GB per machine to Elasticsearch. Memory usage on the
machines looks OK, and CPU and iowait are not dramatic either. Nonetheless, the
cluster frequently becomes unstable and loses nodes.
In the logs I am seeing entries like this:
[2014-05-24 10:48:03,336][INFO ][cluster.service ] [es104] master {new [es103][Cyvy8BPvRnyUR0EQvCmjMg][es103.muc.domeus.com][inet[/172.16.9.225:9300]]{master=true},
  previous [es102][cDXf0IgzRW2tsMPW4KlbTA][es102.muc.domeus.com][inet[/172.16.9.224:9300]]{master=true}},
  removed {[es102][cDXf0IgzRW2tsMPW4KlbTA][es102.muc.domeus.com][inet[/172.16.9.224:9300]]{master=true},},
  added {[es106][sQklfgSLS_upLZMz2j9O0w][es106.muc.domeus.com][inet[/172.16.9.228:9300]]{master=true},[es108][lgjzCUNUS9CNUOJIWlqlcg][es108.muc.domeus.com][inet[/172.16.9.230:9300]]{master=true},},
  reason: zen-disco-receive(from master [[es103][Cyvy8BPvRnyUR0EQvCmjMg][es103.muc.domeus.com][inet[/172.16.9.225:9300]]{master=true}])
[2014-05-24 10:48:03,423][WARN ][index.shard.service ] [es104] [logstash-2014.01.01][8] suspect illegal state: trying to move shard from primary mode to replica mode
[2014-05-24 10:48:03,423][WARN ][index.shard.service ] [es104] [logstash-2014.05.15][7] suspect illegal state: trying to move shard from primary mode to replica mode
[2014-05-24 10:48:03,423][WARN ][index.shard.service ] [es104] [logstash-2014.01.13][8] suspect illegal state: trying to move shard from primary mode to replica mode
[2014-05-24 10:48:03,423][WARN ][index.shard.service ] [es104] [logstash-2014.01.13][9] suspect illegal state: trying to move shard from primary mode to replica mode
[2014-05-24 10:48:03,424][WARN ][index.shard.service ] [es104] [logstash-2014.05.13][8] suspect illegal state: trying to move shard from primary mode to replica mode
[2014-05-24 10:48:03,424][WARN ][index.shard.service ] [es104] [logstash-2014.01.17][7] suspect illegal state: trying to move shard from primary mode to replica mode
[2014-05-24 10:48:03,424][WARN ][index.shard.service ] [es104] [logstash-2014.01.15][9] suspect illegal state: trying to move shard from primary mode to replica mode
[2014-05-24 10:48:03,424][WARN ][index.shard.service ] [es104] [logstash-2014.03.24][7] suspect illegal state: trying to move shard from primary mode to replica mode
[2014-05-24 10:48:03,440][WARN ][index.shard.service ] [es104] [logstash-2014.05.19][9] suspect illegal state: trying to move shard from primary mode to replica mode
[2014-05-24 10:48:03,458][WARN ][index.shard.service ] [es104] [logstash-2014.03.13][7] suspect illegal state: trying to move shard from primary mode to replica mode
[2014-05-24 10:48:42,801][INFO ][cluster.service ] [es104] master {new [es102][cDXf0IgzRW2tsMPW4KlbTA][es102.muc.domeus.com][inet[/172.16.9.224:9300]]{master=true},
  previous [es103][Cyvy8BPvRnyUR0EQvCmjMg][es103.muc.domeus.com][inet[/172.16.9.225:9300]]{master=true}},
  removed {[es106][sQklfgSLS_upLZMz2j9O0w][es106.muc.domeus.com][inet[/172.16.9.228:9300]]{master=true},[es108][lgjzCUNUS9CNUOJIWlqlcg][es108.muc.domeus.com][inet[/172.16.9.230:9300]]{master=true},},
  added {[es102][cDXf0IgzRW2tsMPW4KlbTA][es102.muc.domeus.com][inet[/172.16.9.224:9300]]{master=true},},
  reason: zen-disco-receive(from master [[es102][cDXf0IgzRW2tsMPW4KlbTA][es102.muc.domeus.com][inet[/172.16.9.224:9300]]{master=true}])
[2014-05-24 10:48:42,841][WARN ][index.shard.service ] [es104] [logstash-2014.05.21][1] suspect illegal state: trying to move shard from primary mode to replica mode
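The excerpt shows the elected master flipping from es102 to es103 and back within about 40 seconds. To see how often this happens over a longer period, the "master {new ..., previous ...}" entries can be pulled out of a node's log. Here is a minimal Python sketch. It assumes the Elasticsearch 1.x log format shown above, with each entry on a single line in the raw log file. `master_changes` is a hypothetical helper, not part of Elasticsearch.

```python
import re

# Matches Elasticsearch 1.x cluster.service "master {new ..., previous ...}"
# entries (format taken from the log excerpt above) and captures the timestamp,
# the node that wrote the line, the new master and the previous master.
MASTER_RE = re.compile(
    r"^\[(?P<ts>[^\]]+)\]\[INFO\s*\]\[cluster\.service\s*\] "
    r"\[(?P<node>[^\]]+)\] master \{new \[(?P<new>[^\]]+)\]"
    r".*?previous \[(?P<prev>[^\]]+)\]"
)


def master_changes(lines):
    """Yield (timestamp, reporting_node, previous_master, new_master) per master change."""
    for line in lines:
        m = MASTER_RE.search(line)
        if m:
            yield m.group("ts"), m.group("node"), m.group("prev"), m.group("new")
```

Called on an open log file, for example `for ts, node, prev, new in master_changes(open(path)): print(ts, node, prev, "->", new)`, it prints one line per master election seen by that node. All other entries, such as the WARN lines, are skipped.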
This is what the Elasticsearch process looks like in ps on one of the machines
that has left the cluster:
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
106 21437 114 51.3 473672116 16938980 ? SLl May21 4339:31
/usr/local/java/bin/java -Xms12g -Xmx12g -Xss256k -Djava.awt.headless=true
-XX:+UseParNewGC -XX:+UseConcMarkSweepGC
-XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly
-XX:+HeapDumpOnOutOfMemoryError -XX:CMSInitiatingOccupancyFraction=85
-Xmn1024m -Delasticsearch -Des.pidfile=/var/run/elasticsearch.pid
-Des.foreground=yes -Des.path.home=/usr/share/elasticsearch -cp
:/usr/share/elasticsearch/lib/elasticsearch-1.1.1.jar:/usr/share/elasticsearch/lib/:/usr/share/elasticsearch/lib/sigar/
-Des.default.config=/etc/elasticsearch/elasticsearch.yml
-Des.default.path.home=/usr/share/elasticsearch
-Des.default.path.logs=/DATA1/elasticsearch/log
-Des.default.path.work=/tmp/elasticsearch
-Des.default.path.conf=/etc/elasticsearch -Des.node.name=es104
org.elasticsearch.bootstrap.Elasticsearch
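The JVM command line above sets -XX:CMSInitiatingOccupancyFraction twice, once to 75 and once to 85. Repeated -XX: options like this are easy to miss in a long command line. Here is a minimal Python sketch that lists them. `duplicate_xx_flags` is a hypothetical helper. It splits on whitespace only, so it assumes no quoted arguments containing spaces. It reports the conflict but does not decide which value the JVM actually uses.

```python
from collections import defaultdict


def duplicate_xx_flags(cmdline):
    """Return {flag: [values, ...]} for HotSpot -XX options given more than once.

    Value options (-XX:Name=value) record the value; boolean toggles
    (-XX:+Name / -XX:-Name) record "+" or "-". Non -XX arguments are ignored.
    """
    seen = defaultdict(list)
    for arg in cmdline.split():
        if not arg.startswith("-XX:"):
            continue
        body = arg[len("-XX:"):]
        if "=" in body:
            name, value = body.split("=", 1)
        elif body[:1] in ("+", "-"):
            name, value = body[1:], body[0]
        else:
            continue
        seen[name].append(value)
    # Keep only options that were set more than once, in command-line order.
    return {name: values for name, values in seen.items() if len(values) > 1}
```

On the command line above it reports CMSInitiatingOccupancyFraction with the values 75 and 85 in that order. Options such as -Xmn1024m are not -XX: flags and are not checked.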
Any ideas what might be going on here and, better still, how to remedy it?
I'm running Elasticsearch 1.1.1 on Debian 7.
Cheers,
-Robin-
--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/a7d3e79e-f95b-4bd3-a38c-0720111d84b7%40googlegroups.com.