Hi all,
We've encountered data corruption in our main production Elasticsearch cluster (5.3.0). We started getting the following response when querying the affected index:
{"error":{"root_cause":[{"type":"no_shard_available_action_exception","reason":"No shard available for [get [hotels3][points][4185]: routing [null]]"}],"type":"no_shard_available_action_exception","reason":"No shard available for [get [hotels3][points][4185]: routing [null]]"},"status":503}
_cat/shards returns the following state (we didn't change any settings):
hotels3 0 p UNASSIGNED
hotels3 0 r UNASSIGNED
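In case it helps with the diagnosis, this is roughly what we are running to find out why the shards are unassigned (the allocation explain API should be available on 5.3 as far as I know; localhost:9200 is just our local node, adjust as needed). I can post the output if useful:

# Show shard state together with the unassigned reason
curl -XGET 'localhost:9200/_cat/shards/hotels3?v&h=index,shard,prirep,state,unassigned.reason'

# Ask the cluster why shard 0 of hotels3 is not being allocated
curl -XGET 'localhost:9200/_cluster/allocation/explain?pretty' -H 'Content-Type: application/json' -d '
{
  "index": "hotels3",
  "shard": 0,
  "primary": true
}'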
Our replicated cluster instances ended up in the same state and lost all their data as well, which was another big hit.
Restarting Elasticsearch did not recover the indices.
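For completeness, here is how we are checking recovery progress and shard-level health after the restart (again assuming localhost:9200; both are standard cat/health APIs on 5.x as far as I know):

# Recovery status of the affected index
curl -XGET 'localhost:9200/_cat/recovery/hotels3?v'

# Shard-level health for the affected index
curl -XGET 'localhost:9200/_cluster/health/hotels3?level=shards&pretty'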
I saw many topics with the same error, but none described a cause that applied to us or a solution that worked for us. Our questions:
- What could cause this issue, and how can we prevent it?
- How can we fix the indices? (we've sketched below what we're considering)
- How can we prevent the replicated cluster nodes from getting corrupted as well?
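For the second point, what we are considering (but have not run yet) is a manual reroute with allocate_stale_primary, assuming a stale copy of the shard still exists on disk on one of the nodes; escl01 below is just our node name from the logs. Since this explicitly accepts data loss, we would like to confirm it is the right approach before trying it:

# Force-allocate the stale primary copy of shard 0 on node escl01
# (accept_data_loss acknowledges that any writes newer than this copy are lost)
curl -XPOST 'localhost:9200/_cluster/reroute?pretty' -H 'Content-Type: application/json' -d '
{
  "commands": [
    {
      "allocate_stale_primary": {
        "index": "hotels3",
        "shard": 0,
        "node": "escl01",
        "accept_data_loss": true
      }
    }
  ]
}'

For prevention, we assume regular snapshots to a shared repository are the recommended safety net, but we would appreciate confirmation.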
Any help will be greatly appreciated.
Thank you,
Ami
Cluster logs:
[2017-10-23T09:57:22,773][DEBUG][o.e.a.s.TransportSearchAction] [escl01] All shards failed for phase: [query]
org.elasticsearch.action.NoShardAvailableActionException: null
at org.elasticsearch.action.search.AbstractSearchAsyncAction.start(AbstractSearchAsyncAction.java:122) ~[elasticsearch-5.3.0.jar:5.3.0]
at org.elasticsearch.action.search.TransportSearchAction.executeSearch(TransportSearchAction.java:240) ~[elasticsearch-5.3.0.jar:5.3.0]
at org.elasticsearch.action.search.TransportSearchAction.doExecute(TransportSearchAction.java:146) ~[elasticsearch-5.3.0.jar:5.3.0]
at org.elasticsearch.action.search.TransportSearchAction.doExecute(TransportSearchAction.java:67) ~
...
...
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:441) [netty-transport-4.1.7.Final.jar:4.1.7.Final]
at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:858) [netty-common-4.1.7.Final.jar:4.1.7.Final]
at java.lang.Thread.run(Thread.java:745) [?:1.8.0_121]
[2017-10-23T09:57:22,777][WARN ][r.suppressed ] path: /hotels3/_count, params: {index=hotels3}
org.elasticsearch.action.search.SearchPhaseExecutionException: all shards failed
at org.elasticsearch.action.search.AbstractSearchAsyncAction.onInitialPhaseResult(AbstractSearchAsyncAction.java:223) ~[elasticsearch-5.3.0.jar:5.3.0]
at org.elasticsearch.action.search.AbstractSearchAsyncAction.start(AbstractSearchAsyncAction.java:122) ~[elasticsearch-5.3.0.jar:5.3.0]
at org.elasticsearch.action.search.TransportSearchAction.executeSearch(TransportSearchAction.java:240) ~[elasticsearch-5.3.0.jar:5.3.0]
at org.elasticsearch.action.search.TransportSearchAction.doExecute(TransportSearchAction.java:146) ~[elasticsearch-5.3.0.jar:5.3.0]
at org.elasticsearch.action.search.TransportSearchAction.doExecute(TransportSearchAction.java:67) ~[elasticsearch-5.3.0.jar:5.3.0]
at org.elasticsearch.action.support.TransportAction$RequestFilterChain.proceed(TransportAction.java:170) ~[elasticsearch-5.3.0.jar:5.3.0]
...
...
at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:481) [netty-transport-4.1.7.Final.jar:4.1.7.Final]
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:441) [netty-transport-4.1.7.Final.jar:4.1.7.Final]
at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:858) [netty-common-4.1.7.Final.jar:4.1.7.Final]
at java.lang.Thread.run(Thread.java:745) [?:1.8.0_121]
Caused by: org.elasticsearch.action.NoShardAvailableActionException
... 58 more