Failed shards + loss of Kibana data

Hi all,

I ran into a very annoying problem.
Elasticsearch crashed and all shards ended up in the UNASSIGNED state.

Errors in the log:

[2016-01-27 13:23:27,101][DEBUG][action.search.type] [elk-ID1] All shards failed for phase: [query]
RemoteTransportException[[elk-ID1][127.0.0.1:9300][indices:data/read/search[phase/query]]]; nested: IllegalIndexShardStateException[CurrentState[RECOVERING] operations only allowed when shard state is one of [POST_RECOVERY, STARTED, RELOCATED]];
Caused by: [logstash-2016.01.27][[logstash-2016.01.27][3]] IllegalIndexShardStateException[CurrentState[RECOVERING] operations only allowed when shard state is one of [POST_RECOVERY, STARTED, RELOCATED]]
    at org.elasticsearch.index.shard.IndexShard.readAllowed(IndexShard.java:974)
    at org.elasticsearch.index.shard.IndexShard.acquireSearcher(IndexShard.java:808)
    at org.elasticsearch.search.SearchService.createContext(SearchService.java:640)
    at org.elasticsearch.search.SearchService.createAndPutContext(SearchService.java:617)
    at org.elasticsearch.search.SearchService.executeQueryPhase(SearchService.java:368)
    at org.elasticsearch.search.action.SearchServiceTransportAction$SearchQueryTransportHandler.messageReceived(SearchServiceTransportAction.java:368)
    at org.elasticsearch.search.action.SearchServiceTransportAction$SearchQueryTransportHandler.messageReceived(SearchServiceTransportAction.java:365)
    at org.elasticsearch.transport.TransportService$4.doRun(TransportService.java:350)
    at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)

I recovered the shards by posting this query (with the index name and shard number filled in):

curl -XPOST 'localhost:9200/_cluster/reroute' -d '{
  "commands" : [ {
    "allocate" : {
      "index" : "index",
      "shard" : shard,
      "node" : "127.0.0.1",
      "allow_primary" : true
    }
  } ]
}'
sleep 3

All shards seem to have recovered, but I unexpectedly lost all data in Kibana, and the .kibana shard is still UNASSIGNED. This has happened twice in the last day.

Is this a well-known issue?

BR,
Sergey

Forcing primary shard allocation will cause data loss, see here.
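
Before forcing allocation, it may be worth checking which shards are really unassigned and which are simply still recovering. Below is a minimal sketch, assuming Elasticsearch answers on localhost:9200 as in the reroute command above. `list_shards` and `filter_unassigned` are hypothetical helper names used only for illustration, not part of any tool.

```shell
#!/usr/bin/env bash
# Assumption: Elasticsearch is reachable at localhost:9200, as in the reroute command.
ES_URL="${ES_URL:-localhost:9200}"

# Print index, shard number, primary/replica flag and state for every shard.
list_shards() {
  curl -s "$ES_URL/_cat/shards?h=index,shard,prirep,state"
}

# Keep only the UNASSIGNED rows; reads _cat/shards output on stdin
# and prints "index shard prirep" for each matching row.
filter_unassigned() {
  awk '$4 == "UNASSIGNED" { print $1, $2, $3 }'
}

# Usage (needs a running cluster):
#   list_shards | filter_unassigned
#   curl -s "$ES_URL/_cat/recovery?v"   # shards that are still recovering
```

A shard in the INITIALIZING or RECOVERING state may come back on its own once recovery finishes, so only the rows that stay UNASSIGNED are candidates for a manual reroute.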

What version of ES are you on?

Version : 2.1.1
Release : 1

Elasticsearch failed again today: today's index failed and Kibana lost its dashboards again.

[2016-01-29 09:40:54,742][DEBUG][action.admin.indices.stats] [elk-ID1] [indices:monitor/stats] failed to execute operation for shard [[logstash-2016.01.29][3], node[hFTc1KGEQOO3lZMAYIOIaA], [P], v[3], s[INITIALIZING], a[id=LNjkbhGXSU-DwizjhEi0aA], unassigned_info[[reason=CLUSTER_RECOVERED], at[2016-01-29T08:37:49.831Z]]]
[logstash-2016.01.29][[logstash-2016.01.29][3]] BroadcastShardOperationFailedException[operation indices:monitor/stats failed]; nested: IllegalIndexShardStateException[CurrentState[RECOVERING] operations only allowed when shard state is one of [POST_RECOVERY, STARTED, RELOCATED]];
at org.elasticsearch.action.support.broadcast.node.TransportBroadcastByNodeAction$BroadcastByNodeTransportRequestHandler.onShardOperation(TransportBroadcastByNodeAction.java:405)
at org.elasticsearch.action.support.broadcast.node.TransportBroadcastByNodeAction$BroadcastByNodeTransportRequestHandler.messageReceived(TransportBroadcastByNodeAction.java:382)
at org.elasticsearch.action.support.broadcast.node.TransportBroadcastByNodeAction$BroadcastByNodeTransportRequestHandler.messageReceived(TransportBroadcastByNodeAction.java:371)
at org.elasticsearch.transport.TransportService$4.doRun(TransportService.java:350)
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: [logstash-2016.01.29][[logstash-2016.01.29][3]] IllegalIndexShardStateException[CurrentState[RECOVERING] operations only allowed when shard state is one of [POST_RECOVERY, STARTED, RELOCATED]]
at org.elasticsearch.index.shard.IndexShard.readAllowed(IndexShard.java:974)
at org.elasticsearch.index.shard.IndexShard.acquireSearcher(IndexShard.java:808)
at org.elasticsearch.index.shard.IndexShard.docStats(IndexShard.java:628)
at org.elasticsearch.action.admin.indices.stats.CommonStats.<init>(CommonStats.java:131)
at org.elasticsearch.action.admin.indices.stats.TransportIndicesStatsAction.shardOperation(TransportIndicesStatsAction.java:165)
at org.elasticsearch.action.admin.indices.stats.TransportIndicesStatsAction.shardOperation(TransportIndicesStatsAction.java:47)
at org.elasticsearch.action.support.broadcast.node.TransportBroadcastByNodeAction$BroadcastByNodeTransportRequestHandler.onShardOperation(TransportBroadcastByNodeAction.java:401)
... 7 more

Another question: how can I back up the Kibana dashboards at the filesystem level, to be safe in case of such failures?

Thanks

Unfortunately I was forced to remove the .kibana index because it failed to start. I lost all my dashboards and I'm not sure I can avoid this issue in the future.
Could you please give me some advice on what is going on with Elasticsearch in my case?

BR
Sergey

What I did in that case: I opened the .kibana index with an older version of Elasticsearch and used the elasticsearch-knapsack plugin to export the .kibana docs to disk. Then I started a completely new instance of Elasticsearch 2.1.1, started Kibana, and imported the .kibana index again from disk.

Not sure if it's ideal, but at least I was able to get my dashboards back.

Oh, I misread the thread. I was not hitting the same issue as you; it was a mapping issue in my case.

Feel free to ignore my comment...

So, any ideas on this? The indexes keep crashing, which is pretty annoying.

Thanks
Sergey

I'd suggest you upgrade to the latest 2.1 and see if that helps.

So you mean downgrade, since I'm running 2.2.0?

Ahh, well, you mentioned 2.1.1 previously :)

Ah yes, sorry, but it was automatically upgraded to 2.2.0.
So I was forced to recreate all the indexes; only test data is stored there for now.

I would also like to understand how to back up the Kibana objects, namely searches, visualizations and dashboards.

You can use snapshot + restore, or just export everything manually from Kibana.
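
The snapshot route could look roughly like the sketch below. It assumes a shared filesystem repository whose directory is already listed under `path.repo` in elasticsearch.yml. The repository name `kibana_backup`, the path `/mnt/es_backups` and the snapshot names are placeholders, and the function names are hypothetical helpers rather than anything Elasticsearch ships.

```shell
#!/usr/bin/env bash
# Assumptions: Elasticsearch on localhost:9200; repository name is a placeholder.
ES_URL="${ES_URL:-localhost:9200}"
REPO="${REPO:-kibana_backup}"

# JSON body that registers a filesystem ("fs") repository at the given path.
repo_body() {
  printf '{"type":"fs","settings":{"location":"%s"}}' "$1"
}

# JSON body that limits a snapshot (or restore) to the .kibana index only.
snapshot_body() {
  printf '{"indices":".kibana","include_global_state":false}'
}

# Register the repository; $1 is a directory listed under path.repo.
register_repo() {
  curl -s -XPUT "$ES_URL/_snapshot/$REPO" -d "$(repo_body "$1")"
}

# Take a snapshot named $1 and wait until it has finished.
snapshot_kibana() {
  curl -s -XPUT "$ES_URL/_snapshot/$REPO/$1?wait_for_completion=true" -d "$(snapshot_body)"
}

# Restore snapshot $1; an existing .kibana index must be closed or deleted first.
restore_kibana() {
  curl -s -XPOST "$ES_URL/_snapshot/$REPO/$1/_restore" -d "$(snapshot_body)"
}
```

With a running cluster this would be used as `register_repo /mnt/es_backups` once, then `snapshot_kibana kibana-20160129` whenever the dashboards change, and `restore_kibana kibana-20160129` after a failure.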