Failed shards + loss of Kibana data


(szemlyanoy) #1

Hi all,

I've run into a very annoying problem.
Elasticsearch crashed, and all shards ended up in the UNASSIGNED state.

Errors in the log:

[2016-01-27 13:23:27,101][DEBUG][action.search.type] [elk-ID1] All shards failed for phase: [query]
RemoteTransportException[[elk-ID1][127.0.0.1:9300][indices:data/read/search[phase/query]]]; nested: IllegalIndexShardStateException[CurrentState[RECOVERING] operations only allowed when shard state is one of [POST_RECOVERY, STARTED, RELOCATED]];
Caused by: [logstash-2016.01.27][[logstash-2016.01.27][3]] IllegalIndexShardStateException[CurrentState[RECOVERING] operations only allowed when shard state is one of [POST_RECOVERY, STARTED, RELOCATED]]
    at org.elasticsearch.index.shard.IndexShard.readAllowed(IndexShard.java:974)
    at org.elasticsearch.index.shard.IndexShard.acquireSearcher(IndexShard.java:808)
    at org.elasticsearch.search.SearchService.createContext(SearchService.java:640)
    at org.elasticsearch.search.SearchService.createAndPutContext(SearchService.java:617)
    at org.elasticsearch.search.SearchService.executeQueryPhase(SearchService.java:368)
    at org.elasticsearch.search.action.SearchServiceTransportAction$SearchQueryTransportHandler.messageReceived(SearchServiceTransportAction.java:368)
    at org.elasticsearch.search.action.SearchServiceTransportAction$SearchQueryTransportHandler.messageReceived(SearchServiceTransportAction.java:365)
    at org.elasticsearch.transport.TransportService$4.doRun(TransportService.java:350)
    at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)

I recovered the shards by posting the following reroute request, with the actual index name and shard number filled in:

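# Example values taken from the log above; substitute the index/shard to allocate
INDEX="logstash-2016.01.27"
SHARD=3
curl -XPOST 'localhost:9200/_cluster/reroute' -d '{
  "commands" : [ {
    "allocate" : {
      "index" : "'"$INDEX"'",
      "shard" : '"$SHARD"',
      "node" : "127.0.0.1",
      "allow_primary" : true
    }
  } ]
}'
sleep 3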
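To know which index/shard pairs a reroute like the one above would need (and to see whether .kibana is among them), the unassigned shards can be listed with the _cat/shards API. This is a minimal sketch: ES_URL is an assumed placeholder for the node address, and the function name is hypothetical.

```shell
#!/bin/sh
# Sketch: list shards in the UNASSIGNED state via the _cat/shards API.
# ES_URL is an assumption; point it at your own node.
ES_URL="${ES_URL:-http://localhost:9200}"

# Hypothetical helper: print "index shard prirep" for every UNASSIGNED shard.
# h= selects which columns _cat/shards prints, in this order.
unassigned_shards() {
  curl -s "$ES_URL/_cat/shards?h=index,shard,prirep,state" |
    awk '$4 == "UNASSIGNED" { print $1, $2, $3 }'
}
```

Running `unassigned_shards` prints one line per unassigned shard copy, where `p` marks a primary and `r` a replica.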

All shards seem to have recovered, but I unexpectedly lost all my data in Kibana, and the .kibana shard is still UNASSIGNED. This has happened twice in the last day.

Is this a known issue?

BR,
Sergey


(Mark Walkom) #2

Forcing primary shard allocation with allow_primary will cause data loss, because the shard is allocated as a new, empty primary.
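The log above shows the shards in the RECOVERING state, so one alternative to forcing allocation is simply to wait for normal recovery to finish. A minimal sketch, assuming ES_URL and the 5m default timeout as placeholders (the function name is hypothetical): it waits for at least yellow health, which means all primaries are assigned. On a single-node cluster like the one here, replicas can never be assigned, so yellow is the best status it can reach.

```shell
#!/bin/sh
# Sketch: wait for primaries to recover instead of forcing allocation.
# ES_URL and the default timeout are assumptions, not values from this thread.
ES_URL="${ES_URL:-http://localhost:9200}"

# Hypothetical helper: block until the cluster is at least yellow (all
# primaries assigned) or the timeout expires. Returns 0 on success, 1 on timeout.
wait_for_primaries() {
  wait_time="${1:-5m}"
  resp="$(curl -s "$ES_URL/_cluster/health?wait_for_status=yellow&timeout=$wait_time")"
  case "$resp" in
    *'"timed_out":false'*) return 0 ;;
    *) return 1 ;;
  esac
}
```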

What version of ES are you on?


(szemlyanoy) #3

Version : 2.1.1
Release : 1

Elasticsearch failed again today: today's index failed, and Kibana lost its dashboards again.

[2016-01-29 09:40:54,742][DEBUG][action.admin.indices.stats] [elk-ID1] [indices:monitor/stats] failed to execute operation for shard [[logstash-2016.01.29][3], node[hFTc1KGEQOO3lZMAYIOIaA], [P], v[3], s[INITIALIZING], a[id=LNjkbhGXSU-DwizjhEi0aA], unassigned_info[[reason=CLUSTER_RECOVERED], at[2016-01-29T08:37:49.831Z]]]
[logstash-2016.01.29][[logstash-2016.01.29][3]] BroadcastShardOperationFailedException[operation indices:monitor/stats failed]; nested: IllegalIndexShardStateException[CurrentState[RECOVERING] operations only allowed when shard state is one of [POST_RECOVERY, STARTED, RELOCATED]];
at org.elasticsearch.action.support.broadcast.node.TransportBroadcastByNodeAction$BroadcastByNodeTransportRequestHandler.onShardOperation(TransportBroadcastByNodeAction.java:405)
at org.elasticsearch.action.support.broadcast.node.TransportBroadcastByNodeAction$BroadcastByNodeTransportRequestHandler.messageReceived(TransportBroadcastByNodeAction.java:382)
at org.elasticsearch.action.support.broadcast.node.TransportBroadcastByNodeAction$BroadcastByNodeTransportRequestHandler.messageReceived(TransportBroadcastByNodeAction.java:371)
at org.elasticsearch.transport.TransportService$4.doRun(TransportService.java:350)
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: [logstash-2016.01.29][[logstash-2016.01.29][3]] IllegalIndexShardStateException[CurrentState[RECOVERING] operations only allowed when shard state is one of [POST_RECOVERY, STARTED, RELOCATED]]
at org.elasticsearch.index.shard.IndexShard.readAllowed(IndexShard.java:974)
at org.elasticsearch.index.shard.IndexShard.acquireSearcher(IndexShard.java:808)
at org.elasticsearch.index.shard.IndexShard.docStats(IndexShard.java:628)
at org.elasticsearch.action.admin.indices.stats.CommonStats.<init>(CommonStats.java:131)
at org.elasticsearch.action.admin.indices.stats.TransportIndicesStatsAction.shardOperation(TransportIndicesStatsAction.java:165)
at org.elasticsearch.action.admin.indices.stats.TransportIndicesStatsAction.shardOperation(TransportIndicesStatsAction.java:47)
at org.elasticsearch.action.support.broadcast.node.TransportBroadcastByNodeAction$BroadcastByNodeTransportRequestHandler.onShardOperation(TransportBroadcastByNodeAction.java:401)
... 7 more


(szemlyanoy) #4

Another question: how can I back up Kibana dashboards and related objects at the filesystem level, to be safe in case of failures like this?
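A low-tech way to keep a copy on disk is to dump the documents of the .kibana index to a JSON file. This is a minimal sketch, assuming fewer than 1000 saved objects; ES_URL, the output file name and the function name are hypothetical placeholders.

```shell
#!/bin/sh
# Sketch: copy the Kibana saved objects (.kibana index) to a JSON file.
# Assumptions: fewer than 1000 objects; ES_URL and file name are placeholders.
ES_URL="${ES_URL:-http://localhost:9200}"

# Hypothetical helper: write every .kibana document (dashboards,
# visualizations, searches, ...) to the given file. Fails if curl fails
# or the resulting file is empty.
dump_kibana() {
  out="${1:-kibana-backup.json}"
  curl -sf "$ES_URL/.kibana/_search?size=1000" > "$out" && [ -s "$out" ]
}
```

Restoring from such a file is not automatic: each hit's `_source` has to be indexed back into .kibana under its original `_type` and `_id`.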

Thanks


(szemlyanoy) #5

Unfortunately, I had to delete the .kibana index because it failed to start. I lost all my dashboards and related objects, and I'm not sure I can avoid this issue in the future.
Could you please advise on what is going on with Elasticsearch in my case?

BR
Sergey


(David Pilato) #6

In that case, what I did was open the .kibana index with an older version of Elasticsearch, use the elasticsearch-knapsack plugin to export the .kibana docs to disk, start a completely new Elasticsearch 2.1.1 instance, start Kibana, and then import the .kibana index again from disk.

Not sure if it's ideal but at least I was able to get back my dashboards.


(David Pilato) #7

Oh, I misread the thread. I wasn't hitting the same issue as you; in my case it was a mapping issue.

Feel free to ignore my comment...


(szemlyanoy) #8

So, any ideas on this? Indexes keep crashing, which is pretty annoying.

Thanks
Sergey


(Mark Walkom) #9

I'd suggest upgrading to the latest 2.1 release and seeing if that helps.


(szemlyanoy) #10

So you mean downgrade, since I'm running 2.2.0?


(Mark Walkom) #11

Ahh, well, you mentioned 2.1.1 previously :)


(szemlyanoy) #12

Ah yes, sorry, it was automatically upgraded to 2.2.0.
So I recreated all the indexes, which was possible because only test data was stored there.
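Since the installed package and the running node can disagree after an automatic upgrade, it can help to ask the node itself which version it runs. A minimal sketch, assuming ES_URL as a placeholder; the function name is hypothetical. It reads `version.number` from the root endpoint's JSON response.

```shell
#!/bin/sh
# Sketch: print the version of the Elasticsearch node that is actually running.
# ES_URL is an assumption; point it at your own node.
ES_URL="${ES_URL:-http://localhost:9200}"

# Hypothetical helper: extract "number" from the "version" object returned
# by the root endpoint (e.g. "number" : "2.2.0").
running_version() {
  curl -s "$ES_URL/" | sed -n 's/.*"number" *: *"\([^"]*\)".*/\1/p'
}
```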

I'd also like to understand how to back up Kibana objects, namely searches, visualizations, and dashboards.


(Mark Walkom) #13

You can use snapshot and restore, or just export everything manually via Kibana (KB).
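The snapshot/restore route for just the .kibana index could look roughly like this. It is a minimal sketch under assumptions: the repository name `kibana_backup`, the path `/mnt/es_backups` and the function names are hypothetical, and for a shared-filesystem (`fs`) repository the path must be listed under `path.repo` in elasticsearch.yml.

```shell
#!/bin/sh
# Sketch: back up and restore the .kibana index with snapshot/restore.
# Assumptions: repo name and path are placeholders; REPO_PATH must be
# listed under path.repo in elasticsearch.yml.
ES_URL="${ES_URL:-http://localhost:9200}"
REPO="kibana_backup"
REPO_PATH="/mnt/es_backups"

# One-time: register a shared-filesystem repository.
register_repo() {
  curl -s -XPUT "$ES_URL/_snapshot/$REPO" -d \
    '{"type": "fs", "settings": {"location": "'"$REPO_PATH"'"}}'
}

# Snapshot only the .kibana index; $1 is the snapshot name.
snapshot_kibana() {
  curl -s -XPUT "$ES_URL/_snapshot/$REPO/$1?wait_for_completion=true" -d \
    '{"indices": ".kibana"}'
}

# Restore .kibana from snapshot $1. An existing open .kibana index must be
# closed or deleted first, otherwise the restore is rejected.
restore_kibana() {
  curl -s -XPOST "$ES_URL/_snapshot/$REPO/$1/_restore" -d \
    '{"indices": ".kibana"}'
}
```

A typical use would be `register_repo` once, then `snapshot_kibana "kibana-$(date +%Y%m%d)"` from a daily cron job.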

