River does not start correctly after full cluster shutdown

Michel_Conrad · May 4, 2011, 2:41pm

Hi,

When I shutdown the entire cluster, the river fails to start on the
next startup.
I am using the head version of the 0.16 branch and a local gateway.

The following warning is logged:
[2011-05-04 16:19:39,401][WARN ][river ]
[Moonstone] failed to get _meta from [testcluster]/[testriver]
org.elasticsearch.action.NoShardAvailableActionException: [_river][0]
No shard available for [testriver#_meta]
at org.elasticsearch.action.support.single.shard.TransportShardSingleOperationAction$AsyncSingleAction.perform(TransportShardSingleOperationAction.java:193)
at org.elasticsearch.action.support.single.shard.TransportShardSingleOperationAction$AsyncSingleAction.performFirst(TransportShardSingleOperationAction.java:159)
at org.elasticsearch.action.support.single.shard.TransportShardSingleOperationAction$AsyncSingleAction.start(TransportShardSingleOperationAction.java:116)
at org.elasticsearch.action.support.single.shard.TransportShardSingleOperationAction.doExecute(TransportShardSingleOperationAction.java:69)
at org.elasticsearch.action.support.single.shard.TransportShardSingleOperationAction.doExecute(TransportShardSingleOperationAction.java:45)
at org.elasticsearch.action.support.BaseAction.execute(BaseAction.java:61)
at org.elasticsearch.client.node.NodeClient.get(NodeClient.java:152)
at org.elasticsearch.client.action.get.GetRequestBuilder.doExecute(GetRequestBuilder.java:119)
at org.elasticsearch.client.action.support.BaseRequestBuilder.execute(BaseRequestBuilder.java:56)
at org.elasticsearch.river.RiversService$ApplyRivers.riverClusterChanged(RiversService.java:217)
at org.elasticsearch.river.cluster.RiverClusterService$1.run(RiverClusterService.java:126)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)

It seems that the riverservice tries to get the _meta before the state
has been restored and never retries afterwards.
If I manually kill the node issuing the warning, the river gets
correctly started on another node.

Best,
Michel

kimchy · May 4, 2011, 5:02pm

Opened an issue here: Possible failure to start a river after cluster restart · Issue #902 · elastic/elasticsearch · GitHub. I can't recreate it here, can you recreate it on your end? I will push a fix, but without being able to recreate it, you will need to verify it....
On Wednesday, May 4, 2011 at 5:41 PM, Michel Conrad wrote:

Hi,

When I shutdown the entire cluster, the river fails to start on the
next startup.
I am using the head version of the 0.16 branch and a local gateway.

The following warning is logged:
[2011-05-04 16:19:39,401][WARN ][river ]
[Moonstone] failed to get _meta from [testcluster]/[testriver]
org.elasticsearch.action.NoShardAvailableActionException: [_river][0]
No shard available for [testriver#_meta]
at org.elasticsearch.action.support.single.shard.TransportShardSingleOperationAction$AsyncSingleAction.perform(TransportShardSingleOperationAction.java:193)
at org.elasticsearch.action.support.single.shard.TransportShardSingleOperationAction$AsyncSingleAction.performFirst(TransportShardSingleOperationAction.java:159)
at org.elasticsearch.action.support.single.shard.TransportShardSingleOperationAction$AsyncSingleAction.start(TransportShardSingleOperationAction.java:116)
at org.elasticsearch.action.support.single.shard.TransportShardSingleOperationAction.doExecute(TransportShardSingleOperationAction.java:69)
at org.elasticsearch.action.support.single.shard.TransportShardSingleOperationAction.doExecute(TransportShardSingleOperationAction.java:45)
at org.elasticsearch.action.support.BaseAction.execute(BaseAction.java:61)
at org.elasticsearch.client.node.NodeClient.get(NodeClient.java:152)
at org.elasticsearch.client.action.get.GetRequestBuilder.doExecute(GetRequestBuilder.java:119)
at org.elasticsearch.client.action.support.BaseRequestBuilder.execute(BaseRequestBuilder.java:56)
at org.elasticsearch.river.RiversService$ApplyRivers.riverClusterChanged(RiversService.java:217)
at org.elasticsearch.river.cluster.RiverClusterService$1.run(RiverClusterService.java:126)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)

It seems that the riverservice tries to get the _meta before the state
has been restored and never retries afterwards.
If I manually kill the node issuing the warning, the river gets
correctly started on another node.

Best,
Michel

Michel_Conrad · May 4, 2011, 5:14pm

I will try it out as soon as you push the fix.
What might be of help, I am currently using 5 nodes with 2 replicas.
It does not happen every time, but nearly every time.
Usually it happens if the river is being started on one of the first
nodes (in the order in which the nodes are started).

On Wed, May 4, 2011 at 7:02 PM, Shay Banon shay.banon@elasticsearch.com wrote:

Opened an issue
here: Possible failure to start a river after cluster restart · Issue #902 · elastic/elasticsearch · GitHub. I can't
recreate it here, can you recreate it on your end? I will push a fix, but
without being able to recreate it, you will need to verify it....

On Wednesday, May 4, 2011 at 5:41 PM, Michel Conrad wrote:

Hi,

When I shutdown the entire cluster, the river fails to start on the
next startup.
I am using the head version of the 0.16 branch and a local gateway.

The following warning is logged:
[2011-05-04 16:19:39,401][WARN ][river ]
[Moonstone] failed to get _meta from [testcluster]/[testriver]
org.elasticsearch.action.NoShardAvailableActionException: [_river][0]
No shard available for [testriver#_meta]
at
org.elasticsearch.action.support.single.shard.TransportShardSingleOperationAction$AsyncSingleAction.perform(TransportShardSingleOperationAction.java:193)
at
org.elasticsearch.action.support.single.shard.TransportShardSingleOperationAction$AsyncSingleAction.performFirst(TransportShardSingleOperationAction.java:159)
at
org.elasticsearch.action.support.single.shard.TransportShardSingleOperationAction$AsyncSingleAction.start(TransportShardSingleOperationAction.java:116)
at
org.elasticsearch.action.support.single.shard.TransportShardSingleOperationAction.doExecute(TransportShardSingleOperationAction.java:69)
at
org.elasticsearch.action.support.single.shard.TransportShardSingleOperationAction.doExecute(TransportShardSingleOperationAction.java:45)
at org.elasticsearch.action.support.BaseAction.execute(BaseAction.java:61)
at org.elasticsearch.client.node.NodeClient.get(NodeClient.java:152)
at
org.elasticsearch.client.action.get.GetRequestBuilder.doExecute(GetRequestBuilder.java:119)
at
org.elasticsearch.client.action.support.BaseRequestBuilder.execute(BaseRequestBuilder.java:56)
at
org.elasticsearch.river.RiversService$ApplyRivers.riverClusterChanged(RiversService.java:217)
at
org.elasticsearch.river.cluster.RiverClusterService$1.run(RiverClusterService.java:126)
at
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)

It seems that the riverservice tries to get the _meta before the state
has been restored and never retries afterwards.
If I manually kill the node issuing the warning, the river gets
correctly started on another node.

Best,
Michel

Michel_Conrad · June 16, 2011, 8:25am

Hi,

I got some problems with rivers not starting correctly after a full
cluster shutdown. This time is is not a
NoShardAvailableActionException, but a IndexMissingException, where
the river will not be started. The river is only started correctly on
another node if I kill the node getting the exception. I think the
retry logic in RiversService.java might also retry the get when
an IndexMissingException occurs.

Best,
Michel

[2011-06-15 18:19:41,298][ESC[33mWARN
ESC[0m][discovery.zen.ping.multicast] [Thing] received ping response
with no matching id [1]
[2011-06-15 18:19:41,298][ESC[33mWARN
ESC[0m][discovery.zen.ping.multicast] [Thing] received ping response
with no matching id [1]
[2011-06-15 18:19:44,440][ESC[33mWARN ESC[0m][river
] [Thing] failed to get _meta from [trendiction]/[dsearch_en]
org.elasticsearch.indices.IndexMissingException: [_river] missing
at org.elasticsearch.cluster.metadata.MetaData.concreteIndex(MetaData.java:211)
at org.elasticsearch.action.support.single.shard.TransportShardSingleOperationAction$AsyncSingleAction.(TransportShardSingleOperationAction.java:107)
at org.elasticsearch.action.support.single.shard.TransportShardSingleOperationAction$AsyncSingleAction.(TransportShardSingleOperationAction.java:88)
at org.elasticsearch.action.support.single.shard.TransportShardSingleOperationAction.doExecute(TransportShardSingleOperationAction.java:69)
at org.elasticsearch.action.support.single.shard.TransportShardSingleOperationAction.doExecute(TransportShardSingleOperationAction.java:45)
at org.elasticsearch.action.support.BaseAction.execute(BaseAction.java:61)
at org.elasticsearch.client.node.NodeClient.get(NodeClient.java:152)
at org.elasticsearch.client.action.get.GetRequestBuilder.doExecute(GetRequestBuilder.java:119)
at org.elasticsearch.client.action.support.BaseRequestBuilder.execute(BaseRequestBuilder.java:56)
at org.elasticsearch.river.RiversService$ApplyRivers.riverClusterChanged(RiversService.java:220)
at org.elasticsearch.river.cluster.RiverClusterService$1.run(RiverClusterService.java:126)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)
[2011-06-15 18:19:45,173][ESC[33mWARN ESC[0m][discovery
] [Thing] waited for 30s and no initial state was set by the discovery

On Wed, May 4, 2011 at 7:14 PM, Michel Conrad
michel.conrad@trendiction.com wrote:

I will try it out as soon as you push the fix.
What might be of help, I am currently using 5 nodes with 2 replicas.
It does not happen every time, but nearly every time.
Usually it happens if the river is being started on one of the first
nodes (in the order in which the nodes are started).

On Wed, May 4, 2011 at 7:02 PM, Shay Banon shay.banon@elasticsearch.com wrote:

Opened an issue
here: Possible failure to start a river after cluster restart · Issue #902 · elastic/elasticsearch · GitHub. I can't
recreate it here, can you recreate it on your end? I will push a fix, but
without being able to recreate it, you will need to verify it....

On Wednesday, May 4, 2011 at 5:41 PM, Michel Conrad wrote:

Hi,

When I shutdown the entire cluster, the river fails to start on the
next startup.
I am using the head version of the 0.16 branch and a local gateway.

The following warning is logged:
[2011-05-04 16:19:39,401][WARN ][river ]
[Moonstone] failed to get _meta from [testcluster]/[testriver]
org.elasticsearch.action.NoShardAvailableActionException: [_river][0]
No shard available for [testriver#_meta]
at
org.elasticsearch.action.support.single.shard.TransportShardSingleOperationAction$AsyncSingleAction.perform(TransportShardSingleOperationAction.java:193)
at
org.elasticsearch.action.support.single.shard.TransportShardSingleOperationAction$AsyncSingleAction.performFirst(TransportShardSingleOperationAction.java:159)
at
org.elasticsearch.action.support.single.shard.TransportShardSingleOperationAction$AsyncSingleAction.start(TransportShardSingleOperationAction.java:116)
at
org.elasticsearch.action.support.single.shard.TransportShardSingleOperationAction.doExecute(TransportShardSingleOperationAction.java:69)
at
org.elasticsearch.action.support.single.shard.TransportShardSingleOperationAction.doExecute(TransportShardSingleOperationAction.java:45)
at org.elasticsearch.action.support.BaseAction.execute(BaseAction.java:61)
at org.elasticsearch.client.node.NodeClient.get(NodeClient.java:152)
at
org.elasticsearch.client.action.get.GetRequestBuilder.doExecute(GetRequestBuilder.java:119)
at
org.elasticsearch.client.action.support.BaseRequestBuilder.execute(BaseRequestBuilder.java:56)
at
org.elasticsearch.river.RiversService$ApplyRivers.riverClusterChanged(RiversService.java:217)
at
org.elasticsearch.river.cluster.RiverClusterService$1.run(RiverClusterService.java:126)
at
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)

It seems that the riverservice tries to get the _meta before the state
has been restored and never retries afterwards.
If I manually kill the node issuing the warning, the river gets
correctly started on another node.

Best,
Michel

kimchy · June 16, 2011, 11:06am

Yea, index missing should also be handled, opened an issue: River might not start properly after cluster shutdown (timing) · Issue #1038 · elastic/elasticsearch · GitHub.

On Thursday, June 16, 2011 at 11:25 AM, Michel Conrad wrote:

Hi,

I got some problems with rivers not starting correctly after a full
cluster shutdown. This time is is not a
NoShardAvailableActionException, but a IndexMissingException, where
the river will not be started. The river is only started correctly on
another node if I kill the node getting the exception. I think the
retry logic in RiversService.java might also retry the get when
an IndexMissingException occurs.

Best,
Michel

[2011-06-15 18:19:41,298][ESC[33mWARN
ESC[0m][discovery.zen.ping.multicast] [Thing] received ping response
with no matching id [1]
[2011-06-15 18:19:41,298][ESC[33mWARN
ESC[0m][discovery.zen.ping.multicast] [Thing] received ping response
with no matching id [1]
[2011-06-15 18:19:44,440][ESC[33mWARN ESC[0m][river
] [Thing] failed to get _meta from [trendiction]/[dsearch_en]
org.elasticsearch.indices.IndexMissingException: [_river] missing
at org.elasticsearch.cluster.metadata.MetaData.concreteIndex(MetaData.java:211)
at org.elasticsearch.action.support.single.shard.TransportShardSingleOperationAction$AsyncSingleAction.(TransportShardSingleOperationAction.java:107)
at org.elasticsearch.action.support.single.shard.TransportShardSingleOperationAction$AsyncSingleAction.(TransportShardSingleOperationAction.java:88)
at org.elasticsearch.action.support.single.shard.TransportShardSingleOperationAction.doExecute(TransportShardSingleOperationAction.java:69)
at org.elasticsearch.action.support.single.shard.TransportShardSingleOperationAction.doExecute(TransportShardSingleOperationAction.java:45)
at org.elasticsearch.action.support.BaseAction.execute(BaseAction.java:61)
at org.elasticsearch.client.node.NodeClient.get(NodeClient.java:152)
at org.elasticsearch.client.action.get.GetRequestBuilder.doExecute(GGetRequestBuilder.java:119)
at org.elasticsearch.client.action.support.BaseRequestBuilder.execute(BaseRequestBuilder.java:56)
at org.elasticsearch.river.RiversService$$ApplyRivers.riverClusterChanged(RiversService.java:220)
at org.elasticsearch.river.cluster.RiverClusterService$1.run(RiverClusterService.java:126)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)
[2011-06-15 18:19:45,173][ESC[33mWARN ESC[0m][discovery
] [Thing] waited for 30s and no initial state was set by the discovery

On Wed, May 4, 2011 at 7:14 PM, Michel Conrad
<michel.conrad@trendiction.com (mailto:michel.conrad@trendiction.com)> wrote:

I will try it out as soon as you push the fix.
What might be of help, I am currently using 5 nodes with 2 replicas.
It does not happen every time, but nearly every time.
Usually it happens if the river is being started on one of the first
nodes (in the order in which the nodes are started).

On Wed, May 4, 2011 at 7:02 PM, Shay Banon <shay.banon@elasticsearch.com (mailto:shay.banon@elasticsearch.com)> wrote:

Opened an issue
here: Possible failure to start a river after cluster restart · Issue #902 · elastic/elasticsearch · GitHub. I can't
recreate it here, can you recreate it on your end? I will push a fix, but
without being able to recreate it, you will need to verify it....

On Wednesday, May 4, 2011 at 5:41 PM, Michel Conrad wrote:

Hi,

When I shutdown the entire cluster, the river fails to start on the
next startup.
I am using the head version of the 0.16 branch and a local gateway.

The following warning is logged:
[2011-05-04 16:19:39,401][WARN ][river ]
[Moonstone] failed to get _meta from [testcluster]/[testriver]
org.elasticsearch.action.NoShardAvailableActionException: [_river][0]
No shard available for [testriver#_meta]
at
org.elasticsearch.action.support.single.shard.TransportShardSingleOperationAction$AsyncSingleAction.perform(TransportShardSingleOperationAction.java:193)
at
org.elasticsearch.action.support.single.shard.TransportShardSingleOperationAction$AsyncSingleAction.performFirst(TransportShardSingleOperationAction.java:159)
at
org.elasticsearch.action.support.single.shard.TransportShardSingleOperationAction$AsyncSingleAction.start(TransportShardSingleOperationAction.java:116)
at
org.elasticsearch.action.support.single.shard.TransportShardSingleOperationAction.doExecute(TransportShardSingleOperationAction.java:69)
at
org.elasticsearch.action.support.single.shard.TransportShardSingleOperationAction.doExecute(TransportShardSingleOperationAction.java:45)
at org.elasticsearch.action.support.BaseAction.execute(BaseAction.java:61)
at org.elasticsearch.client.node.NodeClient.get(NodeClient.java:152)
at
org.elasticsearch.client.action.get.GetRequestBuilder.doExecute(GetRequestBuilder.java:119)
at
org.elasticsearch.client.action.support.BaseRequestBuilder.execute(BaseRequestBuilder.java:56)
at
org.elasticsearch.river.RiversService$ApplyRivers.riverClusterChanged(RiversService.java:217)
at
org.elasticsearch.river.cluster.RiverClusterService$1.run(RiverClusterService.java:126)
at
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)

It seems that the riverservice tries to get the _meta before the state
has been restored and never retries afterwards.
If I manually kill the node issuing the warning, the river gets
correctly started on another node.

Best,
Michel

Topic		Replies	Views
River Failover Elasticsearch	15	465	July 6, 2017
Newly created rivers are not being started Elasticsearch	5	387	July 6, 2017
Unstable cluster Elasticsearch	11	2089	July 6, 2017
IllegalIndexShardStateException: [_river][0] CurrentState[RECOVERING] Read operations only allowed when started/relocated Elasticsearch	2	1875	July 6, 2017
River not being migrated to new node on service restart (ES 0.15.0) Elasticsearch	4	383	July 6, 2017

River does not start correctly after full cluster shutdown

Related topics