Clustering - removing index from second node in cluster leads to bad state / full sync?


(arimus) #1

I've have a 2 node cluster (configured for 1 replica) running for a
couple weeks. Looks like it started to get unhappy, throwing lots of
the messages below in the logs and eventually throwing lots of the
subsequent exceptions. This all ended in the cluster getting into an
unhappy state (red on node1, yellow on node2) . node2 was listed in
the cluster nodes when querying node1, but there were 2 indexes that
showed they had some recovering shards with no apparent progress and 4
more shards that were unassigned.

I was able to track down that one of our crons actually pruned an old
index from the second node in the cluster as part of clean-up
(logstash-2012.05.29). I'm guessing that ES didn't like the fact that
the index was deleted from that second node and subsequently decided
to ignore that nodes participation in the cluster? I ended up
deleting that same index from node1 and then restarting node2, and the
cluster started recovering with node2 basically getting completely
wiped and sync'ng everything from node1 from scratch. Did node1
decide that node2 was in a bad state due to the index deletion issue
and decide to wipe it completely and re-copy all the indexes/shards to
it? That's a rather expensive behavior, so my question is can it be
avoided?

Thoughts? Log files entries below.

==

The first node in the cluster had these logs (lots of the
UnavailableShardsException, followed by the IndexMissingException).

[2012-06-06 20:01:00,041][WARN ][river.rabbitmq ]
[sl02.mail01] [rabbitmq][logstash-n05_mail01] failed to executefailure
in bulk execution:
[0]: index [logstash-2012.06.06], type [linux-syslog], id
[s1aaRxwSTemVxq4aQT0yxw], message
[UnavailableShardsException[[logstash-2012.06.06][3] [2] shardIt, [0]
active : Timeout waiting for [1m], request:
org.elasticsearch.action.bulk.BulkShardRequest@41e32c8e]]
[3]: index [logstash-2012.06.06], type [linux-syslog], id
[m0u7l0fSQbSKNXWh_JrV_w], message
[UnavailableShardsException[[logstash-2012.06.06][2] [2] shardIt, [0]
active : Timeout waiting for [1m], request:
org.elasticsearch.action.bulk.BulkShardRequest@2982406d]]
[8]: index [logstash-2012.06.06], type [linux-syslog], id
[m2meqViiSkeY3sl-1QgCxw], message
[UnavailableShardsException[[logstash-2012.06.06][3] [2] shardIt, [0]
active : Timeout waiting for [1m], request:
org.elasticsearch.action.bulk.BulkShardRequest@41e32c8e]]
[2012-06-06 20:10:01,806][DEBUG][action.admin.indices.status]
[sl02.mail01] [logstash-2012.05.29][4], node[aTLu9iSzQSG3rgKnp5wWRw],
[R], s[STARTED]: Failed to execute
[org.elasticsearch.action.admin.indices.status.IndicesStatusRequest@2f8f4c9f]
org.elasticsearch.transport.RemoteTransportException: [sl03.mail01]
[inet[/192.168.1.113:9300]][indices/status/s]
Caused by: org.elasticsearch.indices.IndexMissingException:
[logstash-2012.05.29] missing
at
org.elasticsearch.indices.InternalIndicesService.indexServiceSafe(InternalIndicesService.java:
243)
at
org.elasticsearch.action.admin.indices.status.TransportIndicesStatusAction.shardOperation(TransportIndicesStatusAction.java:
152)
at
org.elasticsearch.action.admin.indices.status.TransportIndicesStatusAction.shardOperation(TransportIndicesStatusAction.java:
59)
at
org.elasticsearch.action.support.broadcast.TransportBroadcastOperationAction
$ShardTransportHandler.messageReceived(TransportBroadcastOperationAction.java:
398)
at
org.elasticsearch.action.support.broadcast.TransportBroadcastOperationAction
$ShardTransportHandler.messageReceived(TransportBroadcastOperationAction.java:
384)
at org.elasticsearch.transport.netty.MessageChannelHandler
$RequestHandler.run(MessageChannelHandler.java:374)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:
1110)
at java.util.concurrent.ThreadPoolExecutor
$Worker.run(ThreadPoolExecutor.java:603)
at java.lang.Thread.run(Thread.java:679)

The second node in the cluster has the following logs:

[2012-06-06 20:00:00,111][INFO ][cluster.metadata ]
[sl03.mail01] [logstash-2012.06.07] creating index, cause [auto(bulk
api)], shards [5]/[1], mappings []
[2012-06-06 20:00:00,211][WARN ][discovery.zen ]
[sl03.mail01] master should not receive new cluster state from
[[sl02.mail01][6ohKxf-IQj6lYxB8EaKxKg][inet[/192.168.1.112:9300]]]
[2012-06-06 20:00:00,298][WARN ][discovery.zen ]
[sl03.mail01] master should not receive new cluster state from
[[sl02.mail01][6ohKxf-IQj6lYxB8EaKxKg][inet[/192.168.1.112:9300]]]
[2012-06-06 20:00:00,688][WARN ][discovery.zen ]
[sl03.mail01] master should not receive new cluster state from
[[sl02.mail01][6ohKxf-IQj6lYxB8EaKxKg][inet[/192.168.1.112:9300]]]
[2012-06-06 20:00:00,691][WARN ][discovery.zen ]
[sl03.mail01] master should not receive new cluster state from
[[sl02.mail01][6ohKxf-IQj6lYxB8EaKxKg][inet[/192.168.1.112:9300]]]
[2012-06-06 20:00:00,905][INFO ][cluster.metadata ]
[sl03.mail01] [logstash-2012.06.07] update_mapping [linux-syslog]
(dynamic)
[2012-06-06 20:00:02,016][WARN ][discovery.zen ]
[sl03.mail01] master should not receive new cluster state from
[[sl02.mail01][6ohKxf-IQj6lYxB8EaKxKg][inet[/192.168.1.112:9300]]]
[2012-06-06 20:00:02,207][INFO ][cluster.metadata ]
[sl03.mail01] [logstash-2012.05.31] deleting index
[2012-06-07 00:06:33,253][WARN ][monitor.jvm ]
[sl03.mail01] [gc][ParNew][539986][123575] duration [1.5s],
collections [1]/[1.6s], total [1.5s]/[1.2h], memory [11.9gb]->[11.7gb]/
[15.9gb]
[2012-06-07 07:01:47,562][WARN ][monitor.jvm ]
[sl03.mail01] [gc][ParNew][564889][127077] duration [3.6s],
collections [1]/[4.1s], total [3.6s]/[1.2h], memory [11.3gb]->[11.1gb]/
[15.9gb]
[2012-06-07 15:26:59,238][WARN ][monitor.jvm ]
[sl03.mail01] [gc][ParNew][595186][133891] duration [1.2s],
collections [1]/[1.4s], total [1.2s]/[1.3h], memory [10.3gb]->[10.1gb]/
[15.9gb]
[2012-06-07 20:00:00,154][INFO ][cluster.metadata ]
[sl03.mail01] [logstash-2012.06.08] creating index, cause [auto(bulk
api)], shards [5]/[1], mappings []
[2012-06-07 20:00:00,275][WARN ][discovery.zen ]
[sl03.mail01] master should not receive new cluster state from
[[sl02.mail01][6ohKxf-IQj6lYxB8EaKxKg][inet[/192.168.1.112:9300]]]
[2012-06-07 20:00:00,419][WARN ][discovery.zen ]
[sl03.mail01] master should not receive new cluster state from
[[sl02.mail01][6ohKxf-IQj6lYxB8EaKxKg][inet[/192.168.1.112:9300]]]
[2012-06-07 20:00:00,546][WARN ][discovery.zen ]
[sl03.mail01] master should not receive new cluster state from
[[sl02.mail01][6ohKxf-IQj6lYxB8EaKxKg][inet[/192.168.1.112:9300]]]
[2012-06-07 20:00:00,550][WARN ][discovery.zen ]
[sl03.mail01] master should not receive new cluster state from
[[sl02.mail01][6ohKxf-IQj6lYxB8EaKxKg][inet[/192.168.1.112:9300]]]
[2012-06-07 20:00:00,961][INFO ][cluster.metadata ]
[sl03.mail01] [logstash-2012.06.08] update_mapping [linux-syslog]
(dynamic)
[2012-06-07 20:00:01,580][INFO ][cluster.metadata ]
[sl03.mail01] [logstash-2012.06.01] deleting index
[2012-06-07 20:10:01,384][WARN ][discovery.zen ]
[sl03.mail01] master should not receive new cluster state from
[[sl02.mail01][6ohKxf-IQj6lYxB8EaKxKg][inet[/192.168.1.112:9300]]]
[2012-06-08 08:52:42,269][WARN ][monitor.jvm ]
[sl03.mail01] [gc][ParNew][657898][144133] duration [2.8s],
collections [1]/[2.8s], total [2.8s]/[1.4h], memory [10.6gb]->[10.5gb]/
[15.9gb]
[2012-06-08 09:01:21,987][WARN ][monitor.jvm ]
[sl03.mail01] [gc][ParNew][658412][144249] duration [5s], collections
[1]/[5.9s], total [5s]/[1.4h], memory [10.8gb]->[10.7gb]/[15.9gb]
[2012-06-08 09:01:36,775][WARN ][monitor.jvm ]
[sl03.mail01] [gc][ParNew][658426][144250] duration [1s], collections
[1]/[1.7s], total [1s]/[1.4h], memory [10.8gb]->[10.8gb]/[15.9gb]
[2012-06-08 09:16:13,234][WARN ][monitor.jvm ]
[sl03.mail01] [gc][ParNew][659284][144484] duration [18.7s],
collections [1]/[19.1s], total [18.7s]/[1.4h], memory [11.2gb]-

[11.1gb]/[15.9gb]
[2012-06-08 09:26:07,903][WARN ][monitor.jvm ]
[sl03.mail01] [gc][ParNew][659877][144662] duration [1s], collections
[1]/[1.9s], total [1s]/[1.4h], memory [11.6gb]->[11.5gb]/[15.9gb]
[2012-06-08 09:35:05,191][WARN ][monitor.jvm ]
[sl03.mail01] [gc][ParNew][660412][144806] duration [1.7s],
collections [1]/[2.7s], total [1.7s]/[1.4h], memory [11.8gb]->[11.7gb]/
[15.9gb]


(Shay Banon) #2

Which version are you using?

If you delete an index using an API, then it does not really matter which
node you hit, it will be applied in a distributed manner. Also, it seems
like GC logs show slower GCs then expected, can you check if you are not
running out of memory on the nodes?

A sync might be required on full cluster restart, its throttled and all,
there are plans to improve the cases where a full sync is required.

On Tue, Jun 12, 2012 at 1:46 AM, arimus arimus@gmail.com wrote:

I've have a 2 node cluster (configured for 1 replica) running for a
couple weeks. Looks like it started to get unhappy, throwing lots of
the messages below in the logs and eventually throwing lots of the
subsequent exceptions. This all ended in the cluster getting into an
unhappy state (red on node1, yellow on node2) . node2 was listed in
the cluster nodes when querying node1, but there were 2 indexes that
showed they had some recovering shards with no apparent progress and 4
more shards that were unassigned.

I was able to track down that one of our crons actually pruned an old
index from the second node in the cluster as part of clean-up
(logstash-2012.05.29). I'm guessing that ES didn't like the fact that
the index was deleted from that second node and subsequently decided
to ignore that nodes participation in the cluster? I ended up
deleting that same index from node1 and then restarting node2, and the
cluster started recovering with node2 basically getting completely
wiped and sync'ng everything from node1 from scratch. Did node1
decide that node2 was in a bad state due to the index deletion issue
and decide to wipe it completely and re-copy all the indexes/shards to
it? That's a rather expensive behavior, so my question is can it be
avoided?

Thoughts? Log files entries below.

==

The first node in the cluster had these logs (lots of the
UnavailableShardsException, followed by the IndexMissingException).

[2012-06-06 20:01:00,041][WARN ][river.rabbitmq ]
[sl02.mail01] [rabbitmq][logstash-n05_mail01] failed to executefailure
in bulk execution:
[0]: index [logstash-2012.06.06], type [linux-syslog], id
[s1aaRxwSTemVxq4aQT0yxw], message
[UnavailableShardsException[[logstash-2012.06.06][3] [2] shardIt, [0]
active : Timeout waiting for [1m], request:
org.elasticsearch.action.bulk.BulkShardRequest@41e32c8e]]
[3]: index [logstash-2012.06.06], type [linux-syslog], id
[m0u7l0fSQbSKNXWh_JrV_w], message
[UnavailableShardsException[[logstash-2012.06.06][2] [2] shardIt, [0]
active : Timeout waiting for [1m], request:
org.elasticsearch.action.bulk.BulkShardRequest@2982406d]]
[8]: index [logstash-2012.06.06], type [linux-syslog], id
[m2meqViiSkeY3sl-1QgCxw], message
[UnavailableShardsException[[logstash-2012.06.06][3] [2] shardIt, [0]
active : Timeout waiting for [1m], request:
org.elasticsearch.action.bulk.BulkShardRequest@41e32c8e]]
[2012-06-06 20:10:01,806][DEBUG][action.admin.indices.status]
[sl02.mail01] [logstash-2012.05.29][4], node[aTLu9iSzQSG3rgKnp5wWRw],
[R], s[STARTED]: Failed to execute

[org.elasticsearch.action.admin.indices.status.IndicesStatusRequest@2f8f4c9f
]
org.elasticsearch.transport.RemoteTransportException: [sl03.mail01]
[inet[/192.168.1.113:9300]][indices/status/s]
Caused by: org.elasticsearch.indices.IndexMissingException:
[logstash-2012.05.29] missing
at

org.elasticsearch.indices.InternalIndicesService.indexServiceSafe(InternalIndicesService.java:
243)
at

org.elasticsearch.action.admin.indices.status.TransportIndicesStatusAction.shardOperation(TransportIndicesStatusAction.java:
152)
at

org.elasticsearch.action.admin.indices.status.TransportIndicesStatusAction.shardOperation(TransportIndicesStatusAction.java:
59)
at

org.elasticsearch.action.support.broadcast.TransportBroadcastOperationAction

$ShardTransportHandler.messageReceived(TransportBroadcastOperationAction.java:
398)
at

org.elasticsearch.action.support.broadcast.TransportBroadcastOperationAction

$ShardTransportHandler.messageReceived(TransportBroadcastOperationAction.java:
384)
at org.elasticsearch.transport.netty.MessageChannelHandler
$RequestHandler.run(MessageChannelHandler.java:374)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:
1110)
at java.util.concurrent.ThreadPoolExecutor
$Worker.run(ThreadPoolExecutor.java:603)
at java.lang.Thread.run(Thread.java:679)

The second node in the cluster has the following logs:

[2012-06-06 20:00:00,111][INFO ][cluster.metadata ]
[sl03.mail01] [logstash-2012.06.07] creating index, cause [auto(bulk
api)], shards [5]/[1], mappings []
[2012-06-06 20:00:00,211][WARN ][discovery.zen ]
[sl03.mail01] master should not receive new cluster state from
[[sl02.mail01][6ohKxf-IQj6lYxB8EaKxKg][inet[/192.168.1.112:9300]]]
[2012-06-06 20:00:00,298][WARN ][discovery.zen ]
[sl03.mail01] master should not receive new cluster state from
[[sl02.mail01][6ohKxf-IQj6lYxB8EaKxKg][inet[/192.168.1.112:9300]]]
[2012-06-06 20:00:00,688][WARN ][discovery.zen ]
[sl03.mail01] master should not receive new cluster state from
[[sl02.mail01][6ohKxf-IQj6lYxB8EaKxKg][inet[/192.168.1.112:9300]]]
[2012-06-06 20:00:00,691][WARN ][discovery.zen ]
[sl03.mail01] master should not receive new cluster state from
[[sl02.mail01][6ohKxf-IQj6lYxB8EaKxKg][inet[/192.168.1.112:9300]]]
[2012-06-06 20:00:00,905][INFO ][cluster.metadata ]
[sl03.mail01] [logstash-2012.06.07] update_mapping [linux-syslog]
(dynamic)
[2012-06-06 20:00:02,016][WARN ][discovery.zen ]
[sl03.mail01] master should not receive new cluster state from
[[sl02.mail01][6ohKxf-IQj6lYxB8EaKxKg][inet[/192.168.1.112:9300]]]
[2012-06-06 20:00:02,207][INFO ][cluster.metadata ]
[sl03.mail01] [logstash-2012.05.31] deleting index
[2012-06-07 00:06:33,253][WARN ][monitor.jvm ]
[sl03.mail01] [gc][ParNew][539986][123575] duration [1.5s],
collections [1]/[1.6s], total [1.5s]/[1.2h], memory [11.9gb]->[11.7gb]/
[15.9gb]
[2012-06-07 07:01:47,562][WARN ][monitor.jvm ]
[sl03.mail01] [gc][ParNew][564889][127077] duration [3.6s],
collections [1]/[4.1s], total [3.6s]/[1.2h], memory [11.3gb]->[11.1gb]/
[15.9gb]
[2012-06-07 15:26:59,238][WARN ][monitor.jvm ]
[sl03.mail01] [gc][ParNew][595186][133891] duration [1.2s],
collections [1]/[1.4s], total [1.2s]/[1.3h], memory [10.3gb]->[10.1gb]/
[15.9gb]
[2012-06-07 20:00:00,154][INFO ][cluster.metadata ]
[sl03.mail01] [logstash-2012.06.08] creating index, cause [auto(bulk
api)], shards [5]/[1], mappings []
[2012-06-07 20:00:00,275][WARN ][discovery.zen ]
[sl03.mail01] master should not receive new cluster state from
[[sl02.mail01][6ohKxf-IQj6lYxB8EaKxKg][inet[/192.168.1.112:9300]]]
[2012-06-07 20:00:00,419][WARN ][discovery.zen ]
[sl03.mail01] master should not receive new cluster state from
[[sl02.mail01][6ohKxf-IQj6lYxB8EaKxKg][inet[/192.168.1.112:9300]]]
[2012-06-07 20:00:00,546][WARN ][discovery.zen ]
[sl03.mail01] master should not receive new cluster state from
[[sl02.mail01][6ohKxf-IQj6lYxB8EaKxKg][inet[/192.168.1.112:9300]]]
[2012-06-07 20:00:00,550][WARN ][discovery.zen ]
[sl03.mail01] master should not receive new cluster state from
[[sl02.mail01][6ohKxf-IQj6lYxB8EaKxKg][inet[/192.168.1.112:9300]]]
[2012-06-07 20:00:00,961][INFO ][cluster.metadata ]
[sl03.mail01] [logstash-2012.06.08] update_mapping [linux-syslog]
(dynamic)
[2012-06-07 20:00:01,580][INFO ][cluster.metadata ]
[sl03.mail01] [logstash-2012.06.01] deleting index
[2012-06-07 20:10:01,384][WARN ][discovery.zen ]
[sl03.mail01] master should not receive new cluster state from
[[sl02.mail01][6ohKxf-IQj6lYxB8EaKxKg][inet[/192.168.1.112:9300]]]
[2012-06-08 08:52:42,269][WARN ][monitor.jvm ]
[sl03.mail01] [gc][ParNew][657898][144133] duration [2.8s],
collections [1]/[2.8s], total [2.8s]/[1.4h], memory [10.6gb]->[10.5gb]/
[15.9gb]
[2012-06-08 09:01:21,987][WARN ][monitor.jvm ]
[sl03.mail01] [gc][ParNew][658412][144249] duration [5s], collections
[1]/[5.9s], total [5s]/[1.4h], memory [10.8gb]->[10.7gb]/[15.9gb]
[2012-06-08 09:01:36,775][WARN ][monitor.jvm ]
[sl03.mail01] [gc][ParNew][658426][144250] duration [1s], collections
[1]/[1.7s], total [1s]/[1.4h], memory [10.8gb]->[10.8gb]/[15.9gb]
[2012-06-08 09:16:13,234][WARN ][monitor.jvm ]
[sl03.mail01] [gc][ParNew][659284][144484] duration [18.7s],
collections [1]/[19.1s], total [18.7s]/[1.4h], memory [11.2gb]-

[11.1gb]/[15.9gb]
[2012-06-08 09:26:07,903][WARN ][monitor.jvm ]
[sl03.mail01] [gc][ParNew][659877][144662] duration [1s], collections
[1]/[1.9s], total [1s]/[1.4h], memory [11.6gb]->[11.5gb]/[15.9gb]
[2012-06-08 09:35:05,191][WARN ][monitor.jvm ]
[sl03.mail01] [gc][ParNew][660412][144806] duration [1.7s],
collections [1]/[2.7s], total [1.7s]/[1.4h], memory [11.8gb]->[11.7gb]/
[15.9gb]


(arimus) #3

We're using version 0.19.3.

It does appear that there have been some memory-related issues since this
incident, but this one didn't appear to be at the time. We did have an OOM
since this incident and for sure that caused the node with the problem to
get marked as needing a full sync from the secondary. It completed just
fine, but it would definitely be nice to see just a sync of the specific
shards that had issues.

On Wednesday, June 13, 2012 11:42:49 AM UTC-7, kimchy wrote:

Which version are you using?

If you delete an index using an API, then it does not really matter which
node you hit, it will be applied in a distributed manner. Also, it seems
like GC logs show slower GCs then expected, can you check if you are not
running out of memory on the nodes?

A sync might be required on full cluster restart, its throttled and all,
there are plans to improve the cases where a full sync is required.

On Tue, Jun 12, 2012 at 1:46 AM, arimus arimus@gmail.com wrote:

I've have a 2 node cluster (configured for 1 replica) running for a
couple weeks. Looks like it started to get unhappy, throwing lots of
the messages below in the logs and eventually throwing lots of the
subsequent exceptions. This all ended in the cluster getting into an
unhappy state (red on node1, yellow on node2) . node2 was listed in
the cluster nodes when querying node1, but there were 2 indexes that
showed they had some recovering shards with no apparent progress and 4
more shards that were unassigned.

I was able to track down that one of our crons actually pruned an old
index from the second node in the cluster as part of clean-up
(logstash-2012.05.29). I'm guessing that ES didn't like the fact that
the index was deleted from that second node and subsequently decided
to ignore that nodes participation in the cluster? I ended up
deleting that same index from node1 and then restarting node2, and the
cluster started recovering with node2 basically getting completely
wiped and sync'ng everything from node1 from scratch. Did node1
decide that node2 was in a bad state due to the index deletion issue
and decide to wipe it completely and re-copy all the indexes/shards to
it? That's a rather expensive behavior, so my question is can it be
avoided?

Thoughts? Log files entries below.

==

The first node in the cluster had these logs (lots of the
UnavailableShardsException, followed by the IndexMissingException).

[2012-06-06 20:01:00,041][WARN ][river.rabbitmq ]
[sl02.mail01] [rabbitmq][logstash-n05_mail01] failed to executefailure
in bulk execution:
[0]: index [logstash-2012.06.06], type [linux-syslog], id
[s1aaRxwSTemVxq4aQT0yxw], message
[UnavailableShardsException[[logstash-2012.06.06][3] [2] shardIt, [0]
active : Timeout waiting for [1m], request:
org.elasticsearch.action.bulk.BulkShardRequest@41e32c8e]]
[3]: index [logstash-2012.06.06], type [linux-syslog], id
[m0u7l0fSQbSKNXWh_JrV_w], message
[UnavailableShardsException[[logstash-2012.06.06][2] [2] shardIt, [0]
active : Timeout waiting for [1m], request:
org.elasticsearch.action.bulk.BulkShardRequest@2982406d]]
[8]: index [logstash-2012.06.06], type [linux-syslog], id
[m2meqViiSkeY3sl-1QgCxw], message
[UnavailableShardsException[[logstash-2012.06.06][3] [2] shardIt, [0]
active : Timeout waiting for [1m], request:
org.elasticsearch.action.bulk.BulkShardRequest@41e32c8e]]
[2012-06-06 20:10:01,806][DEBUG][action.admin.indices.status]
[sl02.mail01] [logstash-2012.05.29][4], node[aTLu9iSzQSG3rgKnp5wWRw],
[R], s[STARTED]: Failed to execute

[org.elasticsearch.action.admin.indices.status.IndicesStatusRequest@2f8f4c9f]
org.elasticsearch.transport.RemoteTransportException: [sl03.mail01]
[inet[/192.168.1.113:9300]][indices/status/s]
Caused by: org.elasticsearch.indices.IndexMissingException:
[logstash-2012.05.29] missing
at

org.elasticsearch.indices.InternalIndicesService.indexServiceSafe(InternalIndicesService.java:
243)
at

org.elasticsearch.action.admin.indices.status.TransportIndicesStatusAction.shardOperation(TransportIndicesStatusAction.java:
152)
at

org.elasticsearch.action.admin.indices.status.TransportIndicesStatusAction.shardOperation(TransportIndicesStatusAction.java:
59)
at

org.elasticsearch.action.support.broadcast.TransportBroadcastOperationAction

$ShardTransportHandler.messageReceived(TransportBroadcastOperationAction.java:
398)
at

org.elasticsearch.action.support.broadcast.TransportBroadcastOperationAction

$ShardTransportHandler.messageReceived(TransportBroadcastOperationAction.java:
384)
at org.elasticsearch.transport.netty.MessageChannelHandler
$RequestHandler.run(MessageChannelHandler.java:374)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:
1110)
at java.util.concurrent.ThreadPoolExecutor
$Worker.run(ThreadPoolExecutor.java:603)
at java.lang.Thread.run(Thread.java:679)

The second node in the cluster has the following logs:

[2012-06-06 20:00:00,111][INFO ][cluster.metadata ]
[sl03.mail01] [logstash-2012.06.07] creating index, cause [auto(bulk
api)], shards [5]/[1], mappings []
[2012-06-06 20:00:00,211][WARN ][discovery.zen ]
[sl03.mail01] master should not receive new cluster state from
[[sl02.mail01][6ohKxf-IQj6lYxB8EaKxKg][inet[/192.168.1.112:9300]]]
[2012-06-06 20:00:00,298][WARN ][discovery.zen ]
[sl03.mail01] master should not receive new cluster state from
[[sl02.mail01][6ohKxf-IQj6lYxB8EaKxKg][inet[/192.168.1.112:9300]]]
[2012-06-06 20:00:00,688][WARN ][discovery.zen ]
[sl03.mail01] master should not receive new cluster state from
[[sl02.mail01][6ohKxf-IQj6lYxB8EaKxKg][inet[/192.168.1.112:9300]]]
[2012-06-06 20:00:00,691][WARN ][discovery.zen ]
[sl03.mail01] master should not receive new cluster state from
[[sl02.mail01][6ohKxf-IQj6lYxB8EaKxKg][inet[/192.168.1.112:9300]]]
[2012-06-06 20:00:00,905][INFO ][cluster.metadata ]
[sl03.mail01] [logstash-2012.06.07] update_mapping [linux-syslog]
(dynamic)
[2012-06-06 20:00:02,016][WARN ][discovery.zen ]
[sl03.mail01] master should not receive new cluster state from
[[sl02.mail01][6ohKxf-IQj6lYxB8EaKxKg][inet[/192.168.1.112:9300]]]
[2012-06-06 20:00:02,207][INFO ][cluster.metadata ]
[sl03.mail01] [logstash-2012.05.31] deleting index
[2012-06-07 00:06:33,253][WARN ][monitor.jvm ]
[sl03.mail01] [gc][ParNew][539986][123575] duration [1.5s],
collections [1]/[1.6s], total [1.5s]/[1.2h], memory [11.9gb]->[11.7gb]/
[15.9gb]
[2012-06-07 07:01:47,562][WARN ][monitor.jvm ]
[sl03.mail01] [gc][ParNew][564889][127077] duration [3.6s],
collections [1]/[4.1s], total [3.6s]/[1.2h], memory [11.3gb]->[11.1gb]/
[15.9gb]
[2012-06-07 15:26:59,238][WARN ][monitor.jvm ]
[sl03.mail01] [gc][ParNew][595186][133891] duration [1.2s],
collections [1]/[1.4s], total [1.2s]/[1.3h], memory [10.3gb]->[10.1gb]/
[15.9gb]
[2012-06-07 20:00:00,154][INFO ][cluster.metadata ]
[sl03.mail01] [logstash-2012.06.08] creating index, cause [auto(bulk
api)], shards [5]/[1], mappings []
[2012-06-07 20:00:00,275][WARN ][discovery.zen ]
[sl03.mail01] master should not receive new cluster state from
[[sl02.mail01][6ohKxf-IQj6lYxB8EaKxKg][inet[/192.168.1.112:9300]]]
[2012-06-07 20:00:00,419][WARN ][discovery.zen ]
[sl03.mail01] master should not receive new cluster state from
[[sl02.mail01][6ohKxf-IQj6lYxB8EaKxKg][inet[/192.168.1.112:9300]]]
[2012-06-07 20:00:00,546][WARN ][discovery.zen ]
[sl03.mail01] master should not receive new cluster state from
[[sl02.mail01][6ohKxf-IQj6lYxB8EaKxKg][inet[/192.168.1.112:9300]]]
[2012-06-07 20:00:00,550][WARN ][discovery.zen ]
[sl03.mail01] master should not receive new cluster state from
[[sl02.mail01][6ohKxf-IQj6lYxB8EaKxKg][inet[/192.168.1.112:9300]]]
[2012-06-07 20:00:00,961][INFO ][cluster.metadata ]
[sl03.mail01] [logstash-2012.06.08] update_mapping [linux-syslog]
(dynamic)
[2012-06-07 20:00:01,580][INFO ][cluster.metadata ]
[sl03.mail01] [logstash-2012.06.01] deleting index
[2012-06-07 20:10:01,384][WARN ][discovery.zen ]
[sl03.mail01] master should not receive new cluster state from
[[sl02.mail01][6ohKxf-IQj6lYxB8EaKxKg][inet[/192.168.1.112:9300]]]
[2012-06-08 08:52:42,269][WARN ][monitor.jvm ]
[sl03.mail01] [gc][ParNew][657898][144133] duration [2.8s],
collections [1]/[2.8s], total [2.8s]/[1.4h], memory [10.6gb]->[10.5gb]/
[15.9gb]
[2012-06-08 09:01:21,987][WARN ][monitor.jvm ]
[sl03.mail01] [gc][ParNew][658412][144249] duration [5s], collections
[1]/[5.9s], total [5s]/[1.4h], memory [10.8gb]->[10.7gb]/[15.9gb]
[2012-06-08 09:01:36,775][WARN ][monitor.jvm ]
[sl03.mail01] [gc][ParNew][658426][144250] duration [1s], collections
[1]/[1.7s], total [1s]/[1.4h], memory [10.8gb]->[10.8gb]/[15.9gb]
[2012-06-08 09:16:13,234][WARN ][monitor.jvm ]
[sl03.mail01] [gc][ParNew][659284][144484] duration [18.7s],
collections [1]/[19.1s], total [18.7s]/[1.4h], memory [11.2gb]-

[11.1gb]/[15.9gb]
[2012-06-08 09:26:07,903][WARN ][monitor.jvm ]
[sl03.mail01] [gc][ParNew][659877][144662] duration [1s], collections
[1]/[1.9s], total [1s]/[1.4h], memory [11.6gb]->[11.5gb]/[15.9gb]
[2012-06-08 09:35:05,191][WARN ][monitor.jvm ]
[sl03.mail01] [gc][ParNew][660412][144806] duration [1.7s],
collections [1]/[2.7s], total [1.7s]/[1.4h], memory [11.8gb]->[11.7gb]/
[15.9gb]


(system) #4