Had a power outage that took down a small, locally hosted Elastic Stack (one node, two shards, two replicas). I was running sebp/elk 7.1 at the time and have since upgraded to 7.2.
Bringing the stack back online resulted in Kibana complaining that it couldn't load any data from my index. Digging around, I found that both of my primaries were UNASSIGNED with the reason ALLOCATION_FAILED.
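For reference, this is roughly how I confirmed the shard state (commands reconstructed from memory, so the exact parameters may differ):

curl -s 'localhost:9200/_cat/shards/environment?v&h=index,shard,prirep,state,unassigned.reason'
curl -s 'localhost:9200/_cluster/allocation/explain?pretty'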
I used the reroute API with allocate_stale_primary, but was unsuccessful.
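Reconstructed from memory (the node name here is a placeholder for my actual node), the reroute call looked something like this:

curl -s -X POST 'localhost:9200/_cluster/reroute?pretty' -H 'Content-Type: application/json' -d '
{
  "commands": [
    {
      "allocate_stale_primary": {
        "index": "environment",
        "shard": 1,
        "node": "MY_NODE_NAME",
        "accept_data_loss": true
      }
    }
  ]
}'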
I tried running the elasticsearch-shard tool inside the container, but it gave me a NullPointerException, so on the advice of Dimitrios Liappis from the Elasticsearch repo, I tried again with the official Elasticsearch Docker image.
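For completeness, this is roughly how I started the official image against my existing data directory (the host path below is a placeholder for my actual data volume):

docker run --rm -it \
  -v /path/to/elk-data/elasticsearch:/usr/share/elasticsearch/data \
  --entrypoint /bin/bash \
  docker.elastic.co/elasticsearch/elasticsearch:7.2.0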
This time, I got the following output:
[elasticsearch@51a35514f16e ~]$ bin/elasticsearch-shard remove-corrupted-data --index environment --shard-id 1
ERROR StatusLogger No Log4j 2 configuration file found. Using default configuration (logging only errors to the console), or user programmatically provided configurations. Set system property 'log4j2.debug' to show Log4j 2 internal initialization logging. See https://logging.apache.org/log4j/2.x/manual/configuration.html for instructions on how to configure Log4j 2
-----------------------------------------------------------------------
WARNING: Elasticsearch MUST be stopped before running this tool.
Please make a complete backup of your index before using this tool.
-----------------------------------------------------------------------
Opening Lucene index at /usr/share/elasticsearch/data/nodes/0/indices/8WitgzPDQvCiYJcnnli1sQ/1/index
>> Lucene index is clean at /usr/share/elasticsearch/data/nodes/0/indices/8WitgzPDQvCiYJcnnli1sQ/1/index
Opening translog at /usr/share/elasticsearch/data/nodes/0/indices/8WitgzPDQvCiYJcnnli1sQ/1/translog
read past EOF. pos [51026] length: [4] end: [51026]
Exception in thread "main" java.io.EOFException: read past EOF. pos [51026] length: [4] end: [51026]
at org.elasticsearch.common.io.Channels.readFromFileChannelWithEofException(Channels.java:103)
at org.elasticsearch.index.translog.TranslogSnapshot.readBytes(TranslogSnapshot.java:104)
at org.elasticsearch.index.translog.BaseTranslogReader.readSize(BaseTranslogReader.java:79)
at org.elasticsearch.index.translog.TranslogSnapshot.readOperation(TranslogSnapshot.java:80)
at org.elasticsearch.index.translog.TranslogSnapshot.next(TranslogSnapshot.java:70)
at org.elasticsearch.index.translog.MultiSnapshot.next(MultiSnapshot.java:69)
at org.elasticsearch.index.translog.TruncateTranslogAction.isTranslogClean(TruncateTranslogAction.java:187)
at org.elasticsearch.index.translog.TruncateTranslogAction.getCleanStatus(TruncateTranslogAction.java:86)
at org.elasticsearch.index.shard.RemoveCorruptedShardDataCommand.lambda$execute$1(RemoveCorruptedShardDataCommand.java:329)
at org.elasticsearch.index.shard.RemoveCorruptedShardDataCommand.findAndProcessShardPath(RemoveCorruptedShardDataCommand.java:202)
at org.elasticsearch.index.shard.RemoveCorruptedShardDataCommand.execute(RemoveCorruptedShardDataCommand.java:282)
at org.elasticsearch.cli.EnvironmentAwareCommand.execute(EnvironmentAwareCommand.java:86)
at org.elasticsearch.cli.Command.mainWithoutErrorHandling(Command.java:124)
at org.elasticsearch.cli.MultiCommand.execute(MultiCommand.java:77)
at org.elasticsearch.cli.Command.mainWithoutErrorHandling(Command.java:124)
at org.elasticsearch.cli.Command.main(Command.java:90)
at org.elasticsearch.index.shard.ShardToolCli.main(ShardToolCli.java:35)
I'm not overly concerned about losing some incremental data, but I would like to recover the 12 months of data already stored in the cluster AND begin capturing data again.
Unfortunately, my snapshots were misconfigured, and the last one was taken the first time I restarted the cluster (about a year ago).
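(Once I'm back up, I plan to fix that properly, most likely with a simple shared-filesystem snapshot repository along these lines; the repository name and path are just placeholders, and the location also has to be listed under path.repo in elasticsearch.yml:)

curl -s -X PUT 'localhost:9200/_snapshot/nightly_backups?pretty' -H 'Content-Type: application/json' -d '
{
  "type": "fs",
  "settings": {
    "location": "/mnt/backups/elasticsearch"
  }
}'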
So... Does anyone have a suggestion as to how I can get back up and running again?
Thanks so much!
- Craig