Upgrade gone bad


(bigfootxxl) #1

I upgraded from 1.5.2 to 2.4.1 and now the cluster is red. The debug log keeps looping over what seems to be three shards. This is one of them:

[2016-11-01 16:26:09,264][WARN ][cluster.action.shard ] [company] [logstash-2015.05.06][3] received shard failed for target shard [[logstash-2015.05.06][3], node[fbFMhKcSQUSPVdKv40FxfA], [P], v[3484], s[INITIALIZING], a[id=1JoG3SCMQcCQZWX_j-KsUQ], unassigned_info[[reason=ALLOCATION_FAILED], at[2016-11-01T14:26:08.684Z], details[failed recovery, failure IndexShardRecoveryException[failed recovery]; nested: IllegalStateException[translog.ckp file already present, translog is already upgraded]; ]]], indexUUID [YV93FwQTTk-pEFhsz2GgCg], message [failed recovery], failure [IndexShardRecoveryException[failed recovery]; nested: IllegalStateException[translog.ckp file already present, translog is already upgraded]; ]
[logstash-2015.05.06][[logstash-2015.05.06][3]] IndexShardRecoveryException[failed recovery]; nested: IllegalStateException[translog.ckp file already present, translog is already upgraded];
at org.elasticsearch.index.shard.StoreRecoveryService$1.run(StoreRecoveryService.java:179)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.IllegalStateException: translog.ckp file already present, translog is already upgraded

Can I fix it?


(Mark Walkom) #2

You can try moving the file it refers to aside, but there's no guarantee the shard will recover.
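
A minimal sketch of "moving the file aside", assuming the node is stopped first and that translog.ckp sits in a translog directory under the shard's data path. The helper name move_aside and the example path in the comment are illustrative assumptions, not something confirmed in this thread. The file is renamed rather than deleted so it can be put back.

```python
import shutil
from pathlib import Path


def move_aside(path, suffix=".bak"):
    """Rename a file to <name><suffix> in the same directory and return the new path.

    Hypothetical helper: it refuses to overwrite an existing backup.
    """
    src = Path(path)
    dst = src.with_name(src.name + suffix)
    if not src.is_file():
        raise FileNotFoundError(f"nothing to move: {src}")
    if dst.exists():
        raise FileExistsError(f"backup already exists: {dst}")
    shutil.move(str(src), str(dst))
    return dst


# Example call (assumed location, adjust to the shard path on your node):
# move_aside("/var/lib/elasticsearch/elasticsearch/nodes/0/indices/"
#            "logstash-2015.05.06/3/translog/translog.ckp")
```

Renaming the backup back to its original name undoes the change if recovery still fails.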


(bigfootxxl) #3

No, that did not help:

[2016-11-02 11:42:26,740][DEBUG][index.shard ] [hostname] [logstash-2015.05.06][3] state: [CREATED]->[RECOVERING], reason [from store]
[2016-11-02 11:42:26,752][DEBUG][index.shard ] [hostname] [logstash-2015.05.06][3] updateBufferSize: engine is closed; skipping
[2016-11-02 11:42:26,756][DEBUG][index.shard ] [hostname] [logstash-2015.05.06][3] starting recovery from shard_store ...
[2016-11-02 11:42:26,764][DEBUG][index.shard ] [hostname] [logstash-2015.05.06][3] updateBufferSize: engine is closed; skipping
[2016-11-02 11:42:26,769][DEBUG][index.engine ] [hostname] [logstash-2015.05.06][3] upgrading translog - no checkpoint found
[2016-11-02 11:42:26,783][DEBUG][index ] [hostname] [logstash-2015.05.06] [3] closing... (reason: [failed recovery])
[2016-11-02 11:42:26,783][DEBUG][index.shard ] [hostname] [logstash-2015.05.06][3] state: [RECOVERING]->[CLOSED], reason [failed recovery]
[2016-11-02 11:42:26,783][DEBUG][index.shard ] [hostname] [logstash-2015.05.06][3] operations counter reached 0, will not accept any further writes
[2016-11-02 11:42:26,783][DEBUG][index.store ] [hostname] [logstash-2015.05.06][3] store reference count on close: 0
[2016-11-02 11:42:26,783][DEBUG][index ] [hostname] [logstash-2015.05.06] [3] closed (reason: [failed recovery])
[2016-11-02 11:42:26,783][WARN ][indices.cluster ] [hostname] [[logstash-2015.05.06][3]] marking and sending shard failed due to [failed recovery]
[logstash-2015.05.06][[logstash-2015.05.06][3]] IndexShardRecoveryException[failed recovery]; nested: IllegalStateException[non-legacy translog file [translog-1430859600187.tlog] found on a translog that wasn't upgraded yet];
[2016-11-02 11:42:26,783][WARN ][cluster.action.shard ] [hostname] [logstash-2015.05.06][3] received shard failed for target shard [[logstash-2015.05.06][3], node[fbFMhKcSQUSPVdKv40FxfA], [P], v[61485], s[INITIALIZING], a[id=G8c6D02LS8u4VCpajK7o7Q], unassigned_info[[reason=ALLOCATION_FAILED], at[2016-11-02T09:42:25.259Z], details[failed recovery, failure IndexShardRecoveryException[failed recovery]; nested: IllegalStateException[non-legacy translog file [translog-1430859600187.tlog] found on a translog that wasn't upgraded yet]; ]]], indexUUID [YV93FwQTTk-pEFhsz2GgCg], message [failed recovery], failure [IndexShardRecoveryException[failed recovery]; nested: IllegalStateException[non-legacy translog file [translog-1430859600187.tlog] found on a translog that wasn't upgraded yet]; ]
[logstash-2015.05.06][[logstash-2015.05.06][3]] IndexShardRecoveryException[failed recovery]; nested: IllegalStateException[non-legacy translog file [translog-1430859600187.tlog] found on a translog that wasn't upgraded yet];
[2016-11-02 11:42:26,822][DEBUG][cluster.service ] [hostname] processing [shard-failed ([logstash-2015.05.04][2], node[fbFMhKcSQUSPVdKv40FxfA], [P], v[61485], s[INITIALIZING], a[id=S2rKXaqIS9Ogiepeb3Ui3A], unassigned_info[[reason=ALLOCATION_FAILED], at[2016-11-02T09:42:25.259Z], details[failed recovery, failure IndexShardRecoveryException[failed recovery]; nested: IllegalStateException[non-legacy translog file [translog-1430686800112.tlog] found on a translog that wasn't upgraded yet]; ]]), message [failed recovery],shard-failed ([logstash-2015.05.06][3], node[fbFMhKcSQUSPVdKv40FxfA], [P], v[61485], s[INITIALIZING], a[id=G8c6D02LS8u4VCpajK7o7Q], unassigned_info[[reason=ALLOCATION_FAILED], at[2016-11-02T09:42:25.259Z], details[failed recovery, failure IndexShardRecoveryException[failed recovery]; nested: IllegalStateException[non-legacy translog file [translog-1430859600187.tlog] found on a translog that wasn't upgraded yet]; ]]), message [failed recovery]]: execute
[2016-11-02 11:42:26,829][DEBUG][cluster.routing.allocation] [hostname] [logstash-2015.05.06][3] failed shard [logstash-2015.05.06][3], node[fbFMhKcSQUSPVdKv40FxfA], [P], v[61485], s[INITIALIZING], a[id=G8c6D02LS8u4VCpajK7o7Q], unassigned_info[[reason=ALLOCATION_FAILED], at[2016-11-02T09:42:25.259Z], details[failed recovery, failure IndexShardRecoveryException[failed recovery]; nested: IllegalStateException[non-legacy translog file [translog-1430859600187.tlog] found on a translog that wasn't upgraded yet]; ]] found in routingNodes, failing it ([reason=ALLOCATION_FAILED], at[2016-11-02T09:42:26.829Z], details[failed recovery, failure IndexShardRecoveryException[failed recovery]; nested: IllegalStateException[non-legacy translog file [translog-1430859600187.tlog] found on a translog that wasn't upgraded yet]; ])
[2016-11-02 11:42:26,927][DEBUG][gateway ] [hostname] [logstash-2015.05.06][3] loaded data path [/var/lib/elasticsearch/elasticsearch/nodes/0/indices/logstash-2015.05.06/3], state path [/var/lib/elasticsearch/elasticsearch/nodes/0/indices/logstash-2015.05.06/3]


(Mark Walkom) #4

How important is the data in the index?


(bigfootxxl) #5

Well... I can afford losing that piece ([logstash-2015.05.06][3]). It would be nice to retain the other 4 out of 5 that are OK. Is that possible?


(bigfootxxl) #6

For everybody who reads this: in the end the shards were not corrupt. With kopf I was able to see that they were unassigned, and assigning them to my only server fixed everything!
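
For anyone without kopf, roughly the same thing can be sketched against the REST API. This assumes Elasticsearch 2.x, where the cluster reroute API accepts an allocate command, and the text output of GET _cat/shards with its default columns. The sample text, the node name node-1, and the function names are made up for illustration. Note that allow_primary forces a primary onto the node and can lose data if that node does not hold a good copy of the shard.

```python
import json


def unassigned_shards(cat_shards_text):
    """Return (index, shard, prirep) for every UNASSIGNED row of _cat/shards text."""
    found = []
    for line in cat_shards_text.splitlines():
        cols = line.split()
        # Default columns start with: index shard prirep state
        if len(cols) >= 4 and cols[3] == "UNASSIGNED":
            found.append((cols[0], int(cols[1]), cols[2]))
    return found


def reroute_body(shards, node):
    """Build a 2.x-style reroute body that allocates each unassigned primary to `node`."""
    commands = [
        {"allocate": {"index": index, "shard": shard, "node": node, "allow_primary": True}}
        for index, shard, prirep in shards
        if prirep == "p"  # replicas are skipped; only primaries are forced
    ]
    return {"commands": commands}


# Illustrative sample, not output from the cluster in this thread.
SAMPLE = """\
logstash-2015.05.06 0 p STARTED 1200 1.1mb 127.0.0.1 node-1
logstash-2015.05.06 3 p UNASSIGNED
logstash-2015.05.06 3 r UNASSIGNED
"""

body = reroute_body(unassigned_shards(SAMPLE), node="node-1")
print(json.dumps(body, indent=2))
```

The printed JSON is what would be sent as the body of POST _cluster/reroute. Replicas are left out because a single-node cluster has no second node to hold them.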

