Hi,
I am facing a problem that I could use some help with. The documentation does
not provide any insight, and the threads here don't really touch on this problem.
I have a very large index. My setup is one node with data set to false that
runs the RabbitMQ river (river-rabbitmq) and bulks messages, with 3 data
nodes behind it that do the processing and indexing. Everything works great
until I get to around 3-4 million docs, at about 12 GB. At that point the
river starts reporting failures to ACK messages with RabbitMQ. The only way
to make it work again is to restart the entire cluster. Once the cluster
comes back online, it starts spitting out these messages:
, master=true, river=freqbeta} marked shard as started, but shard have not
been created, mark shard as failed]
[2012-08-15 20:40:44,401][WARN ][cluster.action.shard ] [riverland]
received shard failed for [item][4], node[Ag3EX-aDR-eEuLOAuvyicw], [P],
s[STARTED], reason [master
[riverland][gxUoQ0ByQDaKm_JHSwms-w][inet[/10.218.29.213:9300]]{data=false,
master=true, river=freqbeta} marked shard as started, but shard have not
been created, mark shard as failed]
[2012-08-15 20:40:42,060][WARN ][cluster.action.shard ] [riverland]
received shard failed for [item][3], node[cD2LK8_1SgOwBVmztcqJ0g], [P],
s[STARTED], reason [master
[riverland][gxUoQ0ByQDaKm_JHSwms-w][inet[/10.218.29.213:9300]]{data=false,
master=true, river=freqbeta} marked shard as started, but shard have not
been created, mark shard as failed]
[2012-08-15 20:40:44,402][WARN ][cluster.action.shard ] [riverland]
received shard failed for [_river][0], node[cWAYW_XtSsG1CYcXYry51w], [P],
s[STARTED], reason [master
[riverland][gxUoQ0ByQDaKm_JHSwms-w][inet[/10.218.29.213:9300]]{data=false,
master=true, river=freqbeta} marked shard as started, but shard have not
been created, mark shard as failed]
[2012-08-15 20:40:44,766][WARN ][cluster.action.shard ] [riverland]
received shard failed for [_river][0], node[cWAYW_XtSsG1CYcXYry51w], [P],
s[STARTED], reason [master
[riverland][gxUoQ0ByQDaKm_JHSwms-w][inet[/10.218.29.213:9300]]{data=false,
master=true, river=freqbeta} marked shard as started, but shard have not
been created, mark shard as failed]
[2012-08-15 20:40:48,766][DEBUG][action.admin.indices.status] [riverland]
[_river][0], node[cWAYW_XtSsG1CYcXYry51w], [P], s[STARTED]: Failed to
execute
[org.elasticsearch.action.admin.indices.status.IndicesStatusRequest@1e8843f5]
org.elasticsearch.transport.RemoteTransportException: [bashful][inet[/10.227.6.136:9300]][indices/status/s]
Caused by: org.elasticsearch.indices.IndexMissingException: [_river] missing
    at org.elasticsearch.indices.InternalIndicesService.indexServiceSafe(InternalIndicesService.java:244)
    at org.elasticsearch.action.admin.indices.status.TransportIndicesStatusAction.shardOperation(TransportIndicesStatusAction.java:152)
    at org.elasticsearch.action.admin.indices.status.TransportIndicesStatusAction.shardOperation(TransportIndicesStatusAction.java:59)
    at org.elasticsearch.action.support.broadcast.TransportBroadcastOperationAction$ShardTransportHandler.messageReceived(TransportBroadcastOperationAction.java:398)
    at org.elasticsearch.action.support.broadcast.TransportBroadcastOperationAction$ShardTransportHandler.messageReceived(TransportBroadcastOperationAction.java:384)
    at org.elasticsearch.transport.netty.MessageChannelHandler$RequestHandler.run(MessageChannelHandler.java:400)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:662)
I cannot recover from this; the only option is to blow away the index.
What can I do? This is a production machine, and it's causing issues.
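For reference, a RabbitMQ river like the one described is registered by PUTting a _meta document into the _river index. Below is a minimal sketch of that registration. The host, credentials, queue name and bulk settings are hypothetical placeholders, not my actual config, and the field names follow the elasticsearch-river-rabbitmq README as I remember it, so they should be checked against the installed plugin version.

```python
import json
import urllib.request


def build_river_meta(host="localhost", port=5672, queue="elasticsearch",
                     bulk_size=100, bulk_timeout="10ms"):
    """Return a _meta document for a RabbitMQ river.

    All default values are hypothetical placeholders; field names are
    assumed from the plugin README and may differ between versions.
    """
    return {
        "type": "rabbitmq",
        "rabbitmq": {
            "host": host,
            "port": port,
            "user": "guest",   # placeholder credentials
            "pass": "guest",
            "vhost": "/",
            "queue": queue,
            "exchange": queue,
            "routing_key": queue,
        },
        "index": {
            "bulk_size": bulk_size,        # max messages grouped into one bulk request
            "bulk_timeout": bulk_timeout,  # how long to wait for more messages to fill a bulk
            "ordered": False,
        },
    }


def register_river(es_url, river_name, meta):
    """PUT the river definition to <es_url>/_river/<river_name>/_meta."""
    req = urllib.request.Request(
        f"{es_url}/_river/{river_name}/_meta",
        data=json.dumps(meta).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))


if __name__ == "__main__":
    # Only prints the document; no request is sent to a cluster here.
    print(json.dumps(build_river_meta(), indent=2))
```

Against a live cluster, the call would look something like `register_river("http://localhost:9200", "freqbeta", build_river_meta())`, where the river name is the one that appears in the logs above.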
Thanks
Jon
--