Disk filled up and the shards never seemed to recover?

While we were indexing, which we do every 30 minutes, the disk filled up. The BulkProcessor (Java) hung: it never finished indexing, and it never threw an exception either.

The disk problem was fixed, but the indexing process still seemed to be locked.
Elasticsearch logged NoShardAvailableActionException.
The BulkProcessor remained locked and the indices were stuck in red status.

What I did:
I deleted the indices and restarted the service that indexes into Elasticsearch.

Is there any better way to handle this?

$ curl localhost:12400/_cluster/health | python -mjson.tool
{
"active_primary_shards": 0,
"active_shards": 0,
"active_shards_percent_as_number": 0.0,
"cluster_name": "oppdragsregister",
"delayed_unassigned_shards": 0,
"initializing_shards": 0,
"number_of_data_nodes": 2,
"number_of_in_flight_fetch": 0,
"number_of_nodes": 4,
"number_of_pending_tasks": 0,
"relocating_shards": 0,
"status": "red",
"task_max_waiting_in_queue_millis": 0,
"timed_out": false,
"unassigned_shards": 20
}

$ curl localhost:12400/_cluster/allocation/explain | python -mjson.tool
{
"allocate_explanation": "can allocate the shard",
"can_allocate": "yes",
"current_state": "unassigned",
"index": "blabla:2017-06-21t07:07:58.601",
"node_allocation_decisions": [
{
"node_decision": "yes",
"node_id": "NbWEPxmBQn-GVvQcIlZMDg",
"node_name": "blabla-data",
"transport_address": "10...:12500",
"weight_ranking": 1
},
{
"node_decision": "yes",
"node_id": "zr6vKM7-RJmTYTiqjtQ7vw",
"node_name": "blabla-data",
"transport_address": "10...:12500",
"weight_ranking": 2
}
],
"primary": true,
"shard": 1,
"target_node": {
"id": "NbWEPxmBQn-GVvQcIlZMDg",
"name": "blabla-data",
"transport_address": "...:12500"
},
"unassigned_info": {
"at": "2017-06-21T05:07:58.612Z",
"last_allocation_status": "no",
"reason": "INDEX_CREATED"
}
}

[2017-06-21T09:52:27,820][DEBUG][o.e.a.s.TransportSearchAction] [nodedata] All shards failed for phase: [query]
org.elasticsearch.action.NoShardAvailableActionException: null
at org.elasticsearch.action.search.AbstractSearchAsyncAction.start(AbstractSearchAsyncAction.java:115) [elasticsearch-5.2.0.jar:5.2.0]
at org.elasticsearch.action.search.TransportSearchAction.doExecute(TransportSearchAction.java:154) [elasticsearch-5.2.0.jar:5.2.0]
at org.elasticsearch.action.search.TransportSearchAction.doExecute(TransportSearchAction.java:53) [elasticsearch-5.2.0.jar:5.2.0]
at org.elasticsearch.action.support.TransportAction$RequestFilterChain.proceed(TransportAction.java:173) [elasticsearch-5.2.0.jar:5.2.0]
at org.elasticsearch.action.support.TransportAction.execute(TransportAction.java:145) [elasticsearch-5.2.0.jar:5.2.0]
at org.elasticsearch.action.support.HandledTransportAction$TransportHandler.messageReceived(HandledTransportAction.java:64) [elasticsearch-5.2.0.jar:5.2.0]
at org.elasticsearch.action.support.HandledTransportAction$TransportHandler.messageReceived(HandledTransportAction.java:54) [elasticsearch-5.2.0.jar:5.2.0]
at org.elasticsearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:69) [elasticsearch-5.2.0.jar:5.2.0]
at org.elasticsearch.transport.TcpTransport$RequestHandler.doRun(TcpTransport.java:1488) [elasticsearch-5.2.0.jar:5.2.0]
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) [elasticsearch-5.2.0.jar:5.2.0]
at org.elasticsearch.common.util.concurrent.EsExecutors$1.execute(EsExecutors.java:109) [elasticsearch-5.2.0.jar:5.2.0]
at org.elasticsearch.transport.TcpTransport.handleRequest(TcpTransport.java:1445) [elasticsearch-5.2.0.jar:5.2.0]
at org.elasticsearch.transport.TcpTransport.messageReceived(TcpTransport.java:1329) [elasticsearch-5.2.0.jar:5.2.0]
at org.elasticsearch.transport.netty4.Netty4MessageChannelHandler.channelRead(Netty4MessageChannelHandler.java:74) [transport-netty4-5.2.0.jar:5.2.0]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:363) [netty-transport-4.1.7.Final.jar:4.1.7.Final]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:349) [netty-transport-4.1.7.Final.jar:4.1.7.Final]
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:341) [netty-transport-4.1.7.Final.jar:4.1.7.Final]
at io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:293) [netty-codec-4.1.7.Final.jar:4.1.7.Final]
at io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:280) [netty-codec-4.1.7.Final.jar:4.1.7.Final]
at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:396) [netty-codec-4.1.7.Final.jar:4.1.7.Final]
at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:248) [netty-codec-4.1.7.Final.jar:4.1.7.Final]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:363) [netty-transport-4.1.7.Final.jar:4.1.7.Final]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:349) [netty-transport-4.1.7.Final.jar:4.1.7.Final]
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:341) [netty-transport-4.1.7.Final.jar:4.1.7.Final]
at io.netty.channel.ChannelInboundHandlerAdapter.channelRead(ChannelInboundHandlerAdapter.java:86) [netty-transport-4.1.7.Final.jar:4.1.7.Final]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:363) [netty-transport-4.1.7.Final.jar:4.1.7.Final]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:349) [netty-transport-4.1.7.Final.jar:4.1.7.Final]
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:341)

Hello:

I guess so, but could you confirm whether you still get the same message after making a reroute API call with retry_failed?

https://www.elastic.co/guide/en/elasticsearch/reference/current/cluster-reroute.html
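
For reference, a call like that could look roughly like the sketch below. It is only a sketch: the host and port are taken from the curl commands earlier in the thread, and the helper names reroute_url and retry_failed_shards, along with the ES_HOST variable, are made up for illustration. Sourcing it only defines the functions and sends nothing to the cluster.

```shell
#!/bin/sh
# Assumption: the cluster answers on the host:port used earlier in this thread.
ES_HOST="${ES_HOST:-localhost:12400}"

# Hypothetical helper: build the reroute URL with retry_failed switched on.
reroute_url() {
    printf 'http://%s/_cluster/reroute?retry_failed=true\n' "$1"
}

# Hypothetical helper: ask the cluster to retry shard allocations
# that were given up on after repeated failures.
retry_failed_shards() {
    curl -s -XPOST "$(reroute_url "$1")"
}

# Usage (run by hand, nothing is executed when this file is sourced):
#   retry_failed_shards "$ES_HOST" | python -mjson.tool
#   curl -s "http://$ES_HOST/_cluster/allocation/explain" | python -mjson.tool
```

Running the allocation explain call again afterwards should show whether the shards are still unassigned and, if so, whether the reason has changed.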
