Disk got full and the shards never seemed to be recovered?

blommis · June 21, 2017, 8:09am

So, while we were indexing, which we do every 30 minutes, the disk got full. The BulkProcessor (Java) hung: it never finished indexing and never threw an exception.

The disk got fixed, but the index process still seemed to be locked.
Elasticsearch logged NoShardAvailableActionException.
The bulkprocessor remained locked and the indices were stuck at red.

What I did:
I deleted the indices and restarted the service that indexes into Elasticsearch.

Is there any better way to handle this?

$ curl localhost:12400/_cluster/health | python -mjson.tool
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
129 389 129 389 0 0 70650 0 --:--:-- --:--:-- --:--:-- 379k
{
"active_primary_shards": 0,
"active_shards": 0,
"active_shards_percent_as_number": 0.0,
"cluster_name": "oppdragsregister",
"delayed_unassigned_shards": 0,
"initializing_shards": 0,
"number_of_data_nodes": 2,
"number_of_in_flight_fetch": 0,
"number_of_nodes": 4,
"number_of_pending_tasks": 0,
"relocating_shards": 0,
"status": "red",
"task_max_waiting_in_queue_millis": 0,
"timed_out": false,
"unassigned_shards": 20
}

$ curl localhost:12400/_cluster/allocation/explain | python -mjson.tool
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
106 746 106 746 0 0 10917 0 --:--:-- --:--:-- --:--:-- 11134
{
"allocate_explanation": "can allocate the shard",
"can_allocate": "yes",
"current_state": "unassigned",
"index": "blabla:2017-06-21t07:07:58.601",
"node_allocation_decisions": [
{
"node_decision": "yes",
"node_id": "NbWEPxmBQn-GVvQcIlZMDg",
"node_name": "blabla-data",
"transport_address": "10...:12500",
"weight_ranking": 1
},
{
"node_decision": "yes",
"node_id": "zr6vKM7-RJmTYTiqjtQ7vw",
"node_name": "blabla-data",
"transport_address": "10...:12500",
"weight_ranking": 2
}
],
"primary": true,
"shard": 1,
"target_node": {
"id": "NbWEPxmBQn-GVvQcIlZMDg",
"name": "blabla-data",
"transport_address": "...:12500"
},
"unassigned_info": {
"at": "2017-06-21T05:07:58.612Z",
"last_allocation_status": "no",
"reason": "INDEX_CREATED"
}
}

[2017-06-21T09:52:27,820][DEBUG][o.e.a.s.TransportSearchAction] [nodedata] All shards failed for phase: [query]
org.elasticsearch.action.NoShardAvailableActionException: null
at org.elasticsearch.action.search.AbstractSearchAsyncAction.start(AbstractSearchAsyncAction.java:115) [elasticsearch-5.2.0.jar:5.2.0]
at org.elasticsearch.action.search.TransportSearchAction.doExecute(TransportSearchAction.java:154) [elasticsearch-5.2.0.jar:5.2.0]
at org.elasticsearch.action.search.TransportSearchAction.doExecute(TransportSearchAction.java:53) [elasticsearch-5.2.0.jar:5.2.0]
at org.elasticsearch.action.support.TransportAction$RequestFilterChain.proceed(TransportAction.java:173) [elasticsearch-5.2.0.jar:5.2.0]
at org.elasticsearch.action.support.TransportAction.execute(TransportAction.java:145) [elasticsearch-5.2.0.jar:5.2.0]
at org.elasticsearch.action.support.HandledTransportAction$TransportHandler.messageReceived(HandledTransportAction.java:64) [elasticsearch-5.2.0.jar:5.2.0]
at org.elasticsearch.action.support.HandledTransportAction$TransportHandler.messageReceived(HandledTransportAction.java:54) [elasticsearch-5.2.0.jar:5.2.0]
at org.elasticsearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:69) [elasticsearch-5.2.0.jar:5.2.0]
at org.elasticsearch.transport.TcpTransport$RequestHandler.doRun(TcpTransport.java:1488) [elasticsearch-5.2.0.jar:5.2.0]
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) [elasticsearch-5.2.0.jar:5.2.0]
at org.elasticsearch.common.util.concurrent.EsExecutors$1.execute(EsExecutors.java:109) [elasticsearch-5.2.0.jar:5.2.0]
at org.elasticsearch.transport.TcpTransport.handleRequest(TcpTransport.java:1445) [elasticsearch-5.2.0.jar:5.2.0]
at org.elasticsearch.transport.TcpTransport.messageReceived(TcpTransport.java:1329) [elasticsearch-5.2.0.jar:5.2.0]
at org.elasticsearch.transport.netty4.Netty4MessageChannelHandler.channelRead(Netty4MessageChannelHandler.java:74) [transport-netty4-5.2.0.jar:5.2.0]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:363) [netty-transport-4.1.7.Final.jar:4.1.7.Final]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:349) [netty-transport-4.1.7.Final.jar:4.1.7.Final]
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:341) [netty-transport-4.1.7.Final.jar:4.1.7.Final]
at io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:293) [netty-codec-4.1.7.Final.jar:4.1.7.Final]
at io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:280) [netty-codec-4.1.7.Final.jar:4.1.7.Final]
at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:396) [netty-codec-4.1.7.Final.jar:4.1.7.Final]
at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:248) [netty-codec-4.1.7.Final.jar:4.1.7.Final]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:363) [netty-transport-4.1.7.Final.jar:4.1.7.Final]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:349) [netty-transport-4.1.7.Final.jar:4.1.7.Final]
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:341) [netty-transport-4.1.7.Final.jar:4.1.7.Final]
at io.netty.channel.ChannelInboundHandlerAdapter.channelRead(ChannelInboundHandlerAdapter.java:86) [netty-transport-4.1.7.Final.jar:4.1.7.Final]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:363) [netty-transport-4.1.7.Final.jar:4.1.7.Final]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:349) [netty-transport-4.1.7.Final.jar:4.1.7.Final]
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:341)

Xavy · June 21, 2017, 8:19am

Hello:

I guess so, but could you check whether you still get the same message after making a reroute API call with retry_failed?

https://www.elastic.co/guide/en/elasticsearch/reference/current/cluster-reroute.html
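
For reference, here is a minimal sketch of what that call could look like from Python, using only the standard library. The host and port (localhost:12400) are taken from the curl commands in the original post; the function names are made up for illustration and are not part of any Elasticsearch client. It assumes the cluster accepts a POST with an empty body on `/_cluster/reroute?retry_failed=true`, as described on the reroute page linked above.

```python
import json
import urllib.request


def build_retry_failed_request(host="localhost", port=12400):
    """Build (but do not send) a POST to /_cluster/reroute?retry_failed=true.

    The host and port defaults are assumptions copied from the curl
    commands in the original post; adjust them for your own cluster.
    """
    url = "http://{}:{}/_cluster/reroute?retry_failed=true".format(host, port)
    # No request body: this only asks the cluster to retry allocations
    # that previously failed, without issuing explicit reroute commands.
    return urllib.request.Request(url, method="POST")


def retry_failed_allocations(host="localhost", port=12400, timeout=30):
    """Send the reroute request and return the decoded JSON response."""
    request = build_retry_failed_request(host, port)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `retry_failed_allocations()` against a running cluster and then re-running the `_cluster/health` and `_cluster/allocation/explain` requests from the first post should show whether the shards get assigned or whether the same explanation comes back.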

system · July 19, 2017, 8:19am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
