Elasticsearch cannot join cluster


(Sicker) #1

I found a problem with our Elasticsearch cluster: machine 1 keeps hitting timeouts. The Elasticsearch log is below:

[2012-07-01 00:10:11,876][WARN ][transport ] [Masaryk, Milos] Received response for a request that has timed out, sent [63911ms] ago, timed out [33911ms] ago, action [discovery/zen/fd/masterPing], node [[Sunfire][T7DyaW6PSuWONZK4NNw3Bw][inet[/172.20.118.208:9300]]], id [652733]

[2012-07-01 00:10:11,877][WARN ][transport ] [Masaryk, Milos] Received response for a request that has timed out, sent [33912ms] ago, timed out [0ms] ago, action [discovery/zen/fd/masterPing], node [[Sunfire][T7DyaW6PSuWONZK4NNw3Bw][inet[/172.20.118.208:9300]]], id [652734]

[2012-07-01 00:12:08,150][WARN ][transport ] [Masaryk, Milos] Received response for a request that has timed out, sent [50502ms] ago, timed out [20502ms] ago, action [discovery/zen/fd/masterPing], node [[Sunfire][T7DyaW6PSuWONZK4NNw3Bw][inet[/172.20.118.208:9300]]], id [652782]

[2012-07-01 00:19:59,034][WARN ][transport ] [Masaryk, Milos] Received response for a request that has timed out, sent [61803ms] ago, timed out [31803ms] ago, action [discovery/zen/fd/masterPing], node [[Sunfire][T7DyaW6PSuWONZK4NNw3Bw][inet[/172.20.118.208:9300]]], id [652917]

[2012-07-01 00:19:59,034][WARN ][transport ] [Masaryk, Milos] Received response for a request that has timed out, sent [31803ms] ago, timed out [1803ms] ago, action [discovery/zen/fd/masterPing], node [[Sunfire][T7DyaW6PSuWONZK4NNw3Bw][inet[/172.20.118.208:9300]]], id [652918]

[2012-07-01 00:28:03,589][WARN ][transport ] [Masaryk, Milos] Received response for a request that has timed out, sent [38953ms] ago, timed out [8953ms] ago, action [discovery/zen/fd/masterPing], node [[Sunfire][T7DyaW6PSuWONZK4NNw3Bw][inet[/172.20.118.208:9300]]], id [653028]

[2012-07-01 00:34:13,351][WARN ][transport ] [Masaryk, Milos] Received response for a request that has timed out, sent [68650ms] ago, timed out [38650ms] ago, action [discovery/zen/fd/masterPing], node [[Sunfire][T7DyaW6PSuWONZK4NNw3Bw][inet[/172.20.118.208:9300]]], id [653034]

[2012-07-01 00:34:43,747][INFO ][discovery.zen ] [Masaryk, Milos] master_left [[Sunfire][T7DyaW6PSuWONZK4NNw3Bw][inet[/172.20.118.208:9300]]], reason [failed to ping, tried [3] times, each with maximum [30s] timeout]

[2012-07-01 00:36:34,455][WARN ][transport ] [Masaryk, Milos] Received response for a request that has timed out, sent [38650ms] ago, timed out [8650ms] ago, action [discovery/zen/fd/masterPing], node [[Sunfire][T7DyaW6PSuWONZK4NNw3Bw][inet[/172.20.118.208:9300]]], id [653035]

[2012-07-01 00:37:43,467][WARN ][transport ] [Masaryk, Milos] Received response for a request that has timed out, sent [218766ms] ago, timed out [188766ms] ago, action [discovery/zen/fd/masterPing], node [[Sunfire][T7DyaW6PSuWONZK4NNw3Bw][inet[/172.20.118.208:9300]]], id [653036]

[2012-07-01 00:37:43,467][INFO ][cluster.service ] [Masaryk, Milos] master {new [Vengeance][F3kgDi9sTva4DGlw1whhrQ][inet[/172.20.118.209:9300]], previous [Sunfire][T7DyaW6PSuWONZK4NNw3Bw][inet[/172.20.118.208:9300]]}, removed {[Sunfire][T7DyaW6PSuWONZK4NNw3Bw][inet[/172.20.118.208:9300]],}, reason: zen-disco-master_failed ([Sunfire][T7DyaW6PSuWONZK4NNw3Bw][inet[/172.20.118.208:9300]])

[2012-07-01 00:39:44,042][INFO ][discovery.zen ] [Masaryk, Milos] master_left [[Vengeance][F3kgDi9sTva4DGlw1whhrQ][inet[/172.20.118.209:9300]]], reason [no longer master]

[2012-07-01 00:39:44,043][INFO ][cluster.service ] [Masaryk, Milos] master {new [Masaryk, Milos][c6PfkH6uQ4-ACL4nacXAug][inet[/172.20.118.210:9300]], previous [Vengeance][F3kgDi9sTva4DGlw1whhrQ][inet[/172.20.118.209:9300]]}, removed {[Vengeance][F3kgDi9sTva4DGlw1whhrQ][inet[/172.20.118.209:9300]],}, reason: zen-disco-master_failed ([Vengeance][F3kgDi9sTva4DGlw1whhrQ][inet[/172.20.118.209:9300]])

[2012-07-01 00:40:50,896][WARN ][transport ] [Masaryk, Milos] Received response for a request that has timed out, sent [65851ms] ago, timed out [11206ms] ago, action [discovery/zen/fd/ping], node [[Malus, Karl][mHn3YRK0SkCp88yUp2kqQw][inet[/172.20.118.215:9300]]], id [653041]

[2012-07-01 00:43:33,677][WARN ][transport ] [Masaryk, Milos] Received response for a request that has timed out, sent [173984ms] ago, timed out [139279ms] ago, action [/cluster/nodes/indices/shard/store/n], node [[Malus, Karl][mHn3YRK0SkCp88yUp2kqQw][inet[/172.20.118.215:9300]]], id [653044]

[2012-07-01 00:43:33,677][WARN ][transport ] [Masaryk, Milos] Received response for a request that has timed out, sent [207809ms] ago, timed out [173985ms] ago, action [/gateway/local/started-shards/n], node [[Malus, Karl][mHn3YRK0SkCp88yUp2kqQw][inet[/172.20.118.215:9300]]], id [653042]

[2012-07-01 00:44:37,569][WARN ][index.shard.service ] [Masaryk, Milos] [directory][3] suspect illegal state: trying to move shard from primary mode to backup mode

[2012-07-01 00:44:46,508][WARN ][discovery.zen ] [Masaryk, Milos] master should not receive new cluster state from [[Sunfire][T7DyaW6PSuWONZK4NNw3Bw][inet[/172.20.118.208:9300]]]

[2012-07-01 00:45:40,538][INFO ][cluster.service ] [Masaryk, Milos] removed {[Malus, Karl][mHn3YRK0SkCp88yUp2kqQw][inet[/172.20.118.215:9300]],}, reason: zen-disco-node_failed([Malus, Karl][mHn3YRK0SkCp88yUp2kqQw][inet[/172.20.118.215:9300]]), reason failed to ping, tried [3] times, each with maximum [30s] timeout

[2012-07-01 00:45:48,543][WARN ][discovery.zen ] [Masaryk, Milos] master should not receive new cluster state from [[Sunfire][T7DyaW6PSuWONZK4NNw3Bw][inet[/172.20.118.208:9300]]]

[2012-07-01 00:58:32,684][WARN ][discovery.zen ] [Masaryk, Milos] master should not receive new cluster state from [[Sunfire][T7DyaW6PSuWONZK4NNw3Bw][inet[/172.20.118.208:9300]]]

[2012-07-01 00:58:45,582][WARN ][discovery.zen ] [Masaryk, Milos] master should not receive new cluster state from [[Sunfire][T7DyaW6PSuWONZK4NNw3Bw][inet[/172.20.118.208:9300]]]

[2012-07-01 01:00:01,147][WARN ][discovery.zen ] [Masaryk, Milos] master should not receive new cluster state from [[Sunfire][T7DyaW6PSuWONZK4NNw3Bw][inet[/172.20.118.208:9300]]]

[2012-07-01 01:00:01,338][WARN ][discovery.zen ] [Masaryk, Milos] master should not receive new cluster state from [[Sunfire][T7DyaW6PSuWONZK4NNw3Bw][inet[/172.20.118.208:9300]]]

[2012-07-01 01:00:01,360][WARN ][discovery.zen ] [Masaryk, Milos] master should not receive new cluster state from [[Sunfire][T7DyaW6PSuWONZK4NNw3Bw][inet[/172.20.118.208:9300]]]

[2012-07-01 01:03:54,205][WARN ][discovery.zen ] [Masaryk, Milos] master should not receive new cluster state from [[Vengeance][F3kgDi9sTva4DGlw1whhrQ][inet[/172.20.118.209:9300]]]

[2012-07-01 01:04:45,344][WARN ][discovery.zen ] [Masaryk, Milos] master should not receive new cluster state from [[Vengeance][F3kgDi9sTva4DGlw1whhrQ][inet[/172.20.118.209:9300]]]

[2012-07-01 01:04:45,345][WARN ][discovery.zen ] [Masaryk, Milos] master should not receive new cluster state from [[Vengeance][F3kgDi9sTva4DGlw1whhrQ][inet[/172.20.118.209:9300]]]

[2012-07-01 01:05:07,703][WARN ][monitor.jvm ] [Masaryk, Milos] [gc][ParNew][653683][1422] duration [16.7s], collections [1]/[17.5s], total [16.7s]/[25.5s], memory [102mb]->[87.7mb]/[1019.8mb]

[2012-07-01 01:09:09,634][WARN ][discovery.zen ] [Masaryk, Milos] master should not receive new cluster state from [[Sunfire][T7DyaW6PSuWONZK4NNw3Bw][inet[/172.20.118.208:9300]]]

[2012-07-01 01:18:30,490][WARN ][index.engine.robin ] [Masaryk, Milos] [directory][1] failed to flush after setting shard to inactive
org.elasticsearch.index.engine.EngineClosedException: [directory][1] CurrentState[CLOSED]
at org.elasticsearch.index.engine.robin.RobinEngine.flush(RobinEngine.java:779)
at org.elasticsearch.index.engine.robin.RobinEngine.updateIndexingBufferSize(RobinEngine.java:218)
at org.elasticsearch.indices.memory.IndexingMemoryController$ShardsIndicesStatusChecker.run(IndexingMemoryController.java:178)
at org.elasticsearch.threadpool.ThreadPool$LoggingRunnable.run(ThreadPool.java:279)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
at java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:317)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:150)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(ScheduledThreadPoolExecutor.java:98)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.runPeriodic(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:204)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)

[2012-07-01 01:18:30,491][WARN ][index.engine.robin ] [Masaryk, Milos] [directory][4] failed to flush after setting shard to inactive
org.elasticsearch.index.engine.EngineClosedException: [directory][4] CurrentState[CLOSED]
at org.elasticsearch.index.engine.robin.RobinEngine.flush(RobinEngine.java:779)
at org.elasticsearch.index.engine.robin.RobinEngine.updateIndexingBufferSize(RobinEngine.java:218)
at org.elasticsearch.indices.memory.IndexingMemoryController$ShardsIndicesStatusChecker.run(IndexingMemoryController.java:178)
at org.elasticsearch.threadpool.ThreadPool$LoggingRunnable.run(ThreadPool.java:279)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
at java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:317)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:150)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(ScheduledThreadPoolExecutor.java:98)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.runPeriodic(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:204)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)

[2012-07-01 01:38:02,862][WARN ][monitor.jvm ] [Masaryk, Milos] [gc][ParNew][654868][1423] duration [3.4s], collections [1]/[1s], total [3.4s]/[28.9s], memory [104mb]->[104mb]/[1019.8mb]

[2012-07-01 02:37:16,178][WARN ][monitor.jvm ] [Masaryk, Milos] [gc][ParNew][656829][1425] duration [32.6s], collections [1]/[1.3m], total [32.6s]/[1m], memory [107.1mb]->[91.5mb]/[1019.8mb]

[2012-07-01 03:55:11,063][WARN ][monitor.jvm ] [Masaryk, Milos] [gc][ParNew][659670][1429] duration [12.8s], collections [1]/[37.9s], total [12.8s]/[1.2m], memory [110.4mb]->[96mb]/[1019.8mb]

[2012-07-01 04:55:08,395][WARN ][monitor.jvm ] [Masaryk, Milos] [gc][ParNew][661918][1432] duration [17.8s], collections [1]/[17.9s], total [17.8s]/[1.5m], memory [112.5mb]->[96.6mb]/[1019.8mb]

[2012-07-01 05:11:00,493][WARN ][monitor.jvm ] [Masaryk, Milos] [gc][ParNew][662567][1433] duration [1.5m], collections [1]/[1.5m], total [1.5m]/[3m], memory [113.1mb]->[99.5mb]/[1019.8mb]

[2012-07-01 06:06:28,152][WARN ][monitor.jvm ] [Masaryk, Milos] [gc][ParNew][664137][1436] duration [54.3s], collections [1]/[54.3s], total [54.3s]/[3.9m], memory [119.1mb]->[105.5mb]/[1019.8mb]

I use Elasticsearch 0.19.1 on 4 machines. The weird cluster state can be seen in the elasticsearch-head screenshot attached to this topic.
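
For context, here is a minimal sketch of the discovery and bootstrap settings in play (elasticsearch.yml, 0.19-era setting names). The IPs are taken from the log above; the cluster name, the switch to unicast discovery, and the mlockall hint are illustrative assumptions, not my actual config:

# elasticsearch.yml (sketch; 0.19-era setting names, values illustrative)
cluster.name: mycluster                       # hypothetical cluster name
discovery.zen.ping.multicast.enabled: false   # pin discovery to explicit hosts instead of multicast
discovery.zen.ping.unicast.hosts: ["172.20.118.208", "172.20.118.209", "172.20.118.210", "172.20.118.215"]
discovery.zen.minimum_master_nodes: 3         # majority of 4 master-eligible nodes, to avoid split-brain
bootstrap.mlockall: true                      # lock the heap in RAM; minute-long ParNew pauses like those above often indicate swapping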


(Drew Raines) #2

Sicker wrote:

I found a problem with our Elasticsearch cluster: machine 1 keeps hitting timeouts. The Elasticsearch log is below:

[...]

[2012-07-01 00:34:43,747][INFO ][discovery.zen ] [Masaryk, Milos] master_left [[Sunfire][T7DyaW6PSuWONZK4NNw3Bw][inet[/172.20.118.208:9300]]], reason [failed to ping, tried [3] times, each with maximum [30s] timeout]

[...]

[2012-07-01 00:37:43,467][INFO ][cluster.service ] [Masaryk, Milos] master {new [Vengeance][F3kgDi9sTva4DGlw1whhrQ][inet[/172.20.118.209:9300]], previous [Sunfire][T7DyaW6PSuWONZK4NNw3Bw][inet[/172.20.118.208:9300]]}, removed {[Sunfire][T7DyaW6PSuWONZK4NNw3Bw][inet[/172.20.118.208:9300]],}, reason: zen-disco-master_failed ([Sunfire][T7DyaW6PSuWONZK4NNw3Bw][inet[/172.20.118.208:9300]])

[2012-07-01 00:39:44,042][INFO ][discovery.zen ] [Masaryk, Milos] master_left [[Vengeance][F3kgDi9sTva4DGlw1whhrQ][inet[/172.20.118.209:9300]]], reason [no longer master]

[2012-07-01 00:39:44,043][INFO ][cluster.service ] [Masaryk, Milos] master {new [Masaryk, Milos][c6PfkH6uQ4-ACL4nacXAug][inet[/172.20.118.210:9300]], previous [Vengeance][F3kgDi9sTva4DGlw1whhrQ][inet[/172.20.118.209:9300]]}, removed {[Vengeance][F3kgDi9sTva4DGlw1whhrQ][inet[/172.20.118.209:9300]],}, reason: zen-disco-master_failed ([Vengeance][F3kgDi9sTva4DGlw1whhrQ][inet[/172.20.118.209:9300]])

172.20.118.210 is struggling to communicate with the rest of the nodes. What is the health and load avg on each node?

-Drew
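
A quick way to collect that, assuming each node exposes HTTP on the default port 9200 (both endpoints exist in 0.19):

curl -XGET 'http://localhost:9200/_cluster/health?pretty=true'        # cluster status, node count, unassigned shards
curl -XGET 'http://localhost:9200/_cluster/nodes/stats?pretty=true'   # per-node stats, including JVM heap and GC
uptime                                                                # OS load average on the machine itself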


(Sicker) #3

All nodes were red and didn't see each other; after restarting all the nodes, it's now OK.

I think the network between the nodes was down for too long, and when the network came back the nodes did not try to find the cluster again.
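
If that's what happened, the zen fault-detection settings control how patient each node is before it declares the master gone. A sketch with illustrative values (the setting names are the 0.19-era ones; the log above confirms the defaults of 3 retries with a 30s timeout each):

# elasticsearch.yml (sketch; values illustrative)
discovery.zen.fd.ping_interval: 5s   # default 1s: how often a node pings its master
discovery.zen.fd.ping_timeout: 60s   # default 30s: how long to wait for each ping response
discovery.zen.fd.ping_retries: 6     # default 3: failed pings tolerated before master_left is triggered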

On Tuesday, July 3, 2012 7:50:52 PM UTC+7, Sicker wrote:

[...]


(Shay Banon) #4

In 0.19.5 and above we improved the logic around exactly the failure you are seeing; it might make sense to upgrade to the latest version.
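
As a side note, each node reports its version on the root endpoint, which makes it easy to confirm after an upgrade that every node is actually on the new release (localhost stands in for each node's address):

curl -XGET 'http://localhost:9200/'   # response JSON includes a version.number field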

On Tue, Jul 3, 2012 at 5:13 PM, Sicker sicker27@gmail.com wrote:

All nodes were red and didn't see each other; after restarting all the nodes, it's now OK.

I think the network between the nodes was down for too long, and when the network came back the nodes did not try to find the cluster again.

On Tuesday, July 3, 2012 7:50:52 PM UTC+7, Sicker wrote:

[...]


(Sicker) #5

What is the behaviour of this case in 0.19.5?

On Tuesday, July 3, 2012 11:18:47 PM UTC+7, kimchy wrote:

In 0.19.5 and above we improved the logic around exactly the failure you are seeing; it might make sense to upgrade to the latest version.

[...]


(Siriwat Jithunsa) #6

Is it this one that was fixed in 0.19.5?

On Wed, Jul 4, 2012 at 1:54 PM, Sicker sicker27@gmail.com wrote:

What is the behaviour of this case in 0.19.5?

On Tuesday, July 3, 2012 11:18:47 PM UTC+7, kimchy wrote:

In 0.19.5 and above we improved the logic of exactly the failure you are
seeing, might make sense to upgrade to the latest version.

On Tue, Jul 3, 2012 at 5:13 PM, Sicker sicker27@gmail.com wrote:

All node are red and don't see each other, after restart all node, it's
now ok.

I think the network between each node down to long after network come
back. the node is not try to find the cluster again.

On Tuesday, July 3, 2012 7:50:52 PM UTC+7, Sicker wrote:

I found the problem with elasticSearch cluster on machine 1 found
timeout. the elastic search log are below:

[2012-07-01 00:10:11,876][WARN ][transport ] [Masaryk, Milos] Received response for a request that has timed out, sent [63911ms] ago, timed out [33911ms] ago, action [discovery/zen/fd/masterPing], node [[Sunfire][T7DyaW6PSuWONZK4NNw3Bw][inet[/172.20.118.208:9300]]], id [652733]

[2012-07-01 00:10:11,877][WARN ][transport ] [Masaryk, Milos] Received response for a request that has timed out, sent [33912ms] ago, timed out [0ms] ago, action [discovery/zen/fd/masterPing], node [[Sunfire][T7DyaW6PSuWONZK4NNw3Bw][inet[/172.20.118.208:9300]]], id [652734]

[2012-07-01 00:12:08,150][WARN ][transport ] [Masaryk, Milos] Received response for a request that has timed out, sent [50502ms] ago, timed out [20502ms] ago, action [discovery/zen/fd/masterPing], node [[Sunfire][T7DyaW6PSuWONZK4NNw3Bw][inet[/172.20.118.208:9300]]], id [652782]

[2012-07-01 00:19:59,034][WARN ][transport ] [Masaryk, Milos] Received response for a request that has timed out, sent [61803ms] ago, timed out [31803ms] ago, action [discovery/zen/fd/masterPing], node [[Sunfire][T7DyaW6PSuWONZK4NNw3Bw][inet[/172.20.118.208:9300]]], id [652917]

[2012-07-01 00:19:59,034][WARN ][transport ] [Masaryk, Milos] Received response for a request that has timed out, sent [31803ms] ago, timed out [1803ms] ago, action [discovery/zen/fd/masterPing], node [[Sunfire][T7DyaW6PSuWONZK4NNw3Bw][inet[/172.20.118.208:9300]]], id [652918]

[2012-07-01 00:28:03,589][WARN ][transport ] [Masaryk, Milos] Received response for a request that has timed out, sent [38953ms] ago, timed out [8953ms] ago, action [discovery/zen/fd/masterPing], node [[Sunfire][T7DyaW6PSuWONZK4NNw3Bw][inet[/172.20.118.208:9300]]], id [653028]

[2012-07-01 00:34:13,351][WARN ][transport ] [Masaryk, Milos] Received response for a request that has timed out, sent [68650ms] ago, timed out [38650ms] ago, action [discovery/zen/fd/masterPing], node [[Sunfire][T7DyaW6PSuWONZK4NNw3Bw][inet[/172.20.118.208:9300]]], id [653034]

[2012-07-01 00:34:43,747][INFO ][discovery.zen ] [Masaryk, Milos] master_left [[Sunfire][T7DyaW6PSuWONZK4NNw3Bw][inet[/172.20.118.208:9300]]], reason [failed to ping, tried [3] times, each with maximum [30s] timeout]

[2012-07-01 00:36:34,455][WARN ][transport ] [Masaryk, Milos] Received response for a request that has timed out, sent [38650ms] ago, timed out [8650ms] ago, action [discovery/zen/fd/masterPing], node [[Sunfire][T7DyaW6PSuWONZK4NNw3Bw][inet[/172.20.118.208:9300]]], id [653035]

[2012-07-01 00:37:43,467][WARN ][transport ] [Masaryk, Milos] Received response for a request that has timed out, sent [218766ms] ago, timed out [188766ms] ago, action [discovery/zen/fd/masterPing], node [[Sunfire][T7DyaW6PSuWONZK4NNw3Bw][inet[/172.20.118.208:9300]]], id [653036]

[2012-07-01 00:37:43,467][INFO ][cluster.service ] [Masaryk, Milos] master {new [Vengeance][F3kgDi9sTva4DGlw1whhrQ][inet[/**172.20.118.209:**9300]], previous [Sunfire][T7DyaW6PSuWONZK4NNw3Bw][inet[/172.20.118.208:9300]]}, removed {[Sunfire][T7DyaW6PSuWONZK4NNw3Bw][inet[/172.20.118.208:9300]],}, reason: zen-disco-master_failed ([Sunfire][T7DyaW6PSuWONZK4NNw3Bw][inet[/172.20.118.208:9300]])

[2012-07-01 00:39:44,042][INFO ][discovery.zen ] [Masaryk, Milos] master_left [[Vengeance][F3kgDi9sTva4DGlw1whhrQ][inet[/**172.20.118.209:**9300]]], reason [no longer master]

[2012-07-01 00:39:44,043][INFO ][cluster.service ] [Masaryk, Milos] master {new [Masaryk, Milos][c6PfkH6uQ4-ACL4nacXAug]****[inet[/172.20.118.210:9300]], previous [Vengeance][F3kgDi9sTva4DGlw1whhrQ][inet[/**172.20.118.209:**9300]]}, removed {[Vengeance][F3kgDi9sTva4DGlw1whhrQ][inet[/**172.20.118.209:**9300]],}, reason: zen-disco-master_failed ([Vengeance][F3kgDi9sTva4DGlw1whhrQ][inet[/**172.20.118.209:**9300]])

[2012-07-01 00:40:50,896][WARN ][transport ] [Masaryk, Milos] Received response for a request that has timed out, sent [65851ms] ago, timed out [11206ms] ago, action [discovery/zen/fd/ping], node [[Malus, Karl][mHn3YRK0SkCp88yUp2kqQw][****inet[/172.20.118.215:9300]]], id [653041]

[2012-07-01 00:43:33,677][WARN ][transport ] [Masaryk, Milos] Received response for a request that has timed out, sent [173984ms] ago, timed out [139279ms] ago, action [/cluster/nodes/indices/shard/****store/n], node [[Malus, Karl][mHn3YRK0SkCp88yUp2kqQw][****inet[/172.20.118.215:9300]]], id [653044]

[2012-07-01 00:43:33,677][WARN ][transport ] [Masaryk, Milos] Received response for a request that has timed out, sent [207809ms] ago, timed out [173985ms] ago, action [/gateway/local/started-shards/n], node [[Malus, Karl][mHn3YRK0SkCp88yUp2kqQw][****inet[/172.20.118.215:9300]]], id [653042]

[2012-07-01 00:44:37,569][WARN ][index.shard.service ] [Masaryk, Milos] [directory][3] suspect illegal state: trying to move shard from primary mode to backup mode

[2012-07-01 00:44:46,508][WARN ][discovery.zen ] [Masaryk, Milos] master should not receive new cluster state from [[Sunfire][T7DyaW6PSuWONZK4NNw3Bw][inet[/172.20.118.208:9300]]]

[2012-07-01 00:45:40,538][INFO ][cluster.service ] [Masaryk, Milos] removed {[Malus, Karl][mHn3YRK0SkCp88yUp2kqQw][****inet[/172.20.118.215:9300]],}, reason: zen-disco-node_failed([Malus, Karl][mHn3YRK0SkCp88yUp2kqQw][****inet[/172.20.118.215:9300]]), reason failed to ping, tried [3] times, each with maximum [30s] timeout

[2012-07-01 00:45:48,543][WARN ][discovery.zen ] [Masaryk, Milos] master should not receive new cluster state from [[Sunfire][T7DyaW6PSuWONZK4NNw3Bw][inet[/172.20.118.208:9300]]]

[2012-07-01 00:58:32,684][WARN ][discovery.zen ] [Masaryk, Milos] master should not receive new cluster state from [[Sunfire][T7DyaW6PSuWONZK4NNw3Bw][inet[/172.20.118.208:9300]]]

[2012-07-01 00:58:45,582][WARN ][discovery.zen ] [Masaryk, Milos] master should not receive new cluster state from [[Sunfire][T7DyaW6PSuWONZK4NNw3Bw][inet[/172.20.118.208:9300]]]

[2012-07-01 01:00:01,147][WARN ][discovery.zen ] [Masaryk, Milos] master should not receive new cluster state from [[Sunfire][T7DyaW6PSuWONZK4NNw3Bw][inet[/172.20.118.208:9300]]]

[2012-07-01 01:00:01,338][WARN ][discovery.zen ] [Masaryk, Milos] master should not receive new cluster state from [[Sunfire][T7DyaW6PSuWONZK4NNw3Bw][inet[/172.20.118.208:9300]]]

[2012-07-01 01:00:01,360][WARN ][discovery.zen ] [Masaryk, Milos] master should not receive new cluster state from [[Sunfire][T7DyaW6PSuWONZK4NNw3Bw][inet[/172.20.118.208:9300]]]

[2012-07-01 01:03:54,205][WARN ][discovery.zen ] [Masaryk, Milos] master should not receive new cluster state from [[Vengeance][F3kgDi9sTva4DGlw1whhrQ][inet[/**172.20.118.209:**9300]]]

[2012-07-01 01:04:45,344][WARN ][discovery.zen ] [Masaryk, Milos] master should not receive new cluster state from [[Vengeance][F3kgDi9sTva4DGlw1whhrQ][inet[/**172.20.118.209:**9300]]]

[2012-07-01 01:04:45,345][WARN ][discovery.zen ] [Masaryk, Milos] master should not receive new cluster state from [[Vengeance][F3kgDi9sTva4DGlw1whhrQ][inet[/**172.20.118.209:**9300]]]

[2012-07-01 01:05:07,703][WARN ][monitor.jvm ] [Masaryk, Milos] [gc][ParNew][653683][1422] duration [16.7s], collections [1]/[17.5s], total [16.7s]/[25.5s], memory [102mb]->[87.7mb]/[1019.8mb]

[2012-07-01 01:09:09,634][WARN ][discovery.zen ] [Masaryk, Milos] master should not receive new cluster state from [[Sunfire][T7DyaW6PSuWONZK4NNw3Bw][inet[/172.20.118.208:9300]]]

[2012-07-01 01:18:30,490][WARN ][index.engine.robin ] [Masaryk, Milos] [directory][1] failed to flush after setting shard to inactive
org.elasticsearch.index.engine.EngineClosedException: [directory][1] CurrentState[CLOSED]
at org.elasticsearch.index.engine.robin.RobinEngine.flush(RobinEngine.java:779)
at org.elasticsearch.index.engine.robin.RobinEngine.updateIndexingBufferSize(RobinEngine.java:218)
at org.elasticsearch.indices.memory.IndexingMemoryController$ShardsIndicesStatusChecker.run(IndexingMemoryController.java:178)
at org.elasticsearch.threadpool.ThreadPool$LoggingRunnable.run(ThreadPool.java:279)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
at java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:317)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:150)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(ScheduledThreadPoolExecutor.java:98)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.runPeriodic(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:204)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)

[2012-07-01 01:18:30,491][WARN ][index.engine.robin ] [Masaryk, Milos] [directory][4] failed to flush after setting shard to inactive
org.elasticsearch.index.engine.EngineClosedException: [directory][4] CurrentState[CLOSED]
at org.elasticsearch.index.engine.robin.RobinEngine.flush(RobinEngine.java:779)
at org.elasticsearch.index.engine.robin.RobinEngine.updateIndexingBufferSize(RobinEngine.java:218)
at org.elasticsearch.indices.memory.IndexingMemoryController$ShardsIndicesStatusChecker.run(IndexingMemoryController.java:178)
at org.elasticsearch.threadpool.ThreadPool$LoggingRunnable.run(ThreadPool.java:279)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
at java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:317)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:150)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(ScheduledThreadPoolExecutor.java:98)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.runPeriodic(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:204)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)

[2012-07-01 01:38:02,862][WARN ][monitor.jvm ] [Masaryk, Milos] [gc][ParNew][654868][1423] duration [3.4s], collections [1]/[1s], total [3.4s]/[28.9s], memory [104mb]->[104mb]/[1019.8mb]

[2012-07-01 02:37:16,178][WARN ][monitor.jvm ] [Masaryk, Milos] [gc][ParNew][656829][1425] duration [32.6s], collections [1]/[1.3m], total [32.6s]/[1m], memory [107.1mb]->[91.5mb]/[1019.8mb]

[2012-07-01 03:55:11,063][WARN ][monitor.jvm ] [Masaryk, Milos] [gc][ParNew][659670][1429] duration [12.8s], collections [1]/[37.9s], total [12.8s]/[1.2m], memory [110.4mb]->[96mb]/[1019.8mb]

[2012-07-01 04:55:08,395][WARN ][monitor.jvm ] [Masaryk, Milos] [gc][ParNew][661918][1432] duration [17.8s], collections [1]/[17.9s], total [17.8s]/[1.5m], memory [112.5mb]->[96.6mb]/[1019.8mb]

[2012-07-01 05:11:00,493][WARN ][monitor.jvm ] [Masaryk, Milos] [gc][ParNew][662567][1433] duration [1.5m], collections [1]/[1.5m], total [1.5m]/[3m], memory [113.1mb]->[99.5mb]/[1019.8mb]

[2012-07-01 06:06:28,152][WARN ][monitor.jvm ] [Masaryk, Milos] [gc][ParNew][664137][1436] duration [54.3s], collections [1]/[54.3s], total [54.3s]/[3.9m], memory [119.1mb]->[105.5mb]/[1019.8mb]

I use elasticsearch 0.19.1 on 4 machines. The weird thing I found in elasticsearch-head is attached to this topic.
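For reference, the repeated masterPing timeouts above line up with the zen fault-detection defaults visible in the log itself ("tried [3] times, each with maximum [30s] timeout"), and the multi-second ParNew pauses are long enough to eat into those windows. A minimal elasticsearch.yml sketch for loosening fault detection on a network (or JVM) that stalls this long; the values are illustrative, not recommendations:

# zen fault detection (0.19.x defaults: interval 1s, timeout 30s, retries 3)
discovery.zen.fd.ping_interval: 1s
discovery.zen.fd.ping_timeout: 60s    # illustrative value, not a recommendation
discovery.zen.fd.ping_retries: 6      # illustrative value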


(Sicker) #7

Hi kimchy,

After I upgraded elasticSearch to 0.19.8 and tested this scenario again, the cluster still did not join back together.

I think it is waiting for something to trigger the rejoin process.

What do you think about my case?
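For what it's worth, the repeated "master should not receive new cluster state" warnings in the log suggest that more than one node believed it was master after the partition. A minimal sketch of the usual guard against that split-brain, assuming all 4 machines are master-eligible (so a majority is 3):

# require a majority of master-eligible nodes before electing a master
# (assumes all 4 nodes in this cluster are master-eligible)
discovery.zen.minimum_master_nodes: 3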

On Tuesday, July 3, 2012 11:18:47 PM UTC+7, kimchy wrote:

In 0.19.5 and above we improved the logic around exactly the failure you are seeing; it might make sense to upgrade to the latest version.

On Tue, Jul 3, 2012 at 5:13 PM, Sicker <sick...@gmail.com> wrote:

All nodes are red and don't see each other; after restarting all nodes, it's now OK.

I think the network between the nodes was down for too long; after the network came back, the nodes did not try to find the cluster again.
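One hedged guess: with multicast discovery (the 0.19.x default), a node that dropped out may not aggressively re-ping its old peers once the network recovers. A sketch of switching to unicast discovery with an explicit host list, assuming the four addresses seen in the log are the cluster members:

# host list assumed from the IPs appearing in the log above
discovery.zen.ping.multicast.enabled: false
discovery.zen.ping.unicast.hosts: ["172.20.118.208", "172.20.118.209", "172.20.118.210", "172.20.118.215"]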



(system) #8