Shard stuck in STORE TRANSLOG stage

My topbeat index has two shards, 0 and 1, which have both been stuck at the TRANSLOG recovery stage for hours. Cluster state is red. Please help me troubleshoot this problem; here is the recovery status for shard 1:

"id": 1,
  "type": "STORE",
  "stage": "TRANSLOG",
  "primary": true,
  "start_time": "2015-12-29T13:03:04.160Z",
  "start_time_in_millis": 1451394184160,
  "total_time": "1.9h",
  "total_time_in_millis": 6847526,
  "source": {
    "id": "RWxvylvLQcSF4T3RM6xG8A",
    "host": "10.35.132.142",
    "transport_address": "10.35.132.142:9300",
    "ip": "10.35.132.142",
    "name": "ec-dyl09026app03"
  },
  "target": {
    "id": "RWxvylvLQcSF4T3RM6xG8A",
    "host": "10.35.132.142",
    "transport_address": "10.35.132.142:9300",
    "ip": "10.35.132.142",
    "name": "ec-dyl09026app03"
  },
  "index": {
    "size": {
      "total": "296.9mb",
      "total_in_bytes": 311398655,
      "reused": "296.9mb",
      "reused_in_bytes": 311398655,
      "recovered": "0b",
      "recovered_in_bytes": 0,
      "percent": "100.0%"
    },
    "files": {
      "total": 88,
      "reused": 88,
      "recovered": 0,
      "percent": "100.0%"
    },
    "total_time": "26ms",
    "total_time_in_millis": 26,
    "source_throttle_time": "-1",
    "source_throttle_time_in_millis": 0,
    "target_throttle_time": "-1",
    "target_throttle_time_in_millis": 0
  },
  "translog": {
    "recovered": 379391,
    "total": -1,
    "percent": "-1.0%",
    "total_on_start": -1,
    "total_time": "1.9h",
    "total_time_in_millis": 6847500
  },
  "verify_index": {
    "check_index_time": "0s",
    "check_index_time_in_millis": 0,
    "total_time": "0s",
    "total_time_in_millis": 0
  }
}
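For reference, output like the above comes from the indices recovery API; this is roughly how it was pulled (the index name is a placeholder, not the real one):

  # Per-shard recovery detail (the JSON above); "?human" renders durations like "1.9h"
  curl -s 'localhost:9200/<topbeat-index>/_recovery?human&pretty'

  # Compact one-line-per-shard view of recoveries across the cluster
  curl -s 'localhost:9200/_cat/recovery?v'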

Check your ES logs.
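For example (a minimal sketch assuming a default package install; the log path and cluster name are assumptions):

  # Look for recovery/allocation errors in the node log; adjust the path and cluster name
  grep -iE 'exception|recover|allocat' /var/log/elasticsearch/<cluster_name>.log | tail -n 100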

What version are you on?

ES 2.1.0.

I tried many things:

  • Tried closing the index and reopening it; no luck.
  • Tried deleting the index and recreating it; no luck. As soon as the new shards are created, they get stuck.
  • Various shards went unassigned and were never allocated again, even after many hours.
  • I noticed the EC processes consumed 100% CPU continuously for hours, so they must have gone into some infinite loop (see the command sketch after this list).
  • Shards of newly created indices also got stuck in STORE recovery.
  • Tried a rolling restart of the EC nodes; it did not help.
  • Tried stopping all EC nodes and starting them all in one shot; that did not help either.
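A rough sketch of the commands behind the steps above (index and node names are placeholders, not taken from this cluster):

  # Close and reopen the index
  curl -XPOST 'localhost:9200/<topbeat-index>/_close'
  curl -XPOST 'localhost:9200/<topbeat-index>/_open'

  # List shards that are stuck UNASSIGNED
  curl -s 'localhost:9200/_cat/shards?v' | grep UNASSIGNED

  # See what the busy threads are doing while CPU sits at 100%
  curl -s 'localhost:9200/_nodes/hot_threads?threads=5'

  # ES 2.x: try to force-allocate a stuck primary (allow_primary can lose data)
  curl -XPOST 'localhost:9200/_cluster/reroute' -d '{
    "commands": [
      { "allocate": { "index": "<topbeat-index>", "shard": 0,
                      "node": "<node-name>", "allow_primary": true } }
    ]
  }'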

All of this started happening when I tried to close a couple of old indices.
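For context, the close was along these lines (a hedged sketch; the actual index pattern is an assumption):

  # Bulk-closing old daily indices by wildcard -- roughly what preceded the problem
  curl -XPOST 'localhost:9200/topbeat-2015.11.*/_close'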

After many hours of struggle, I finally deleted the data folder. Now everything is running fine.
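For anyone following the same path, this is roughly what that amounted to (the data path is an assumption; check path.data in elasticsearch.yml, and note it wipes every local shard copy):

  # Stop the node, remove the local data directory, start it again
  sudo service elasticsearch stop
  sudo rm -rf /var/lib/elasticsearch/*
  sudo service elasticsearch start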

If you are interested, I can give you the logs. They should reveal how bulk-closing indices can put the cluster into an irrecoverable state.

If you kept track of everything, it may be worth raising an issue on GitHub.