ES version : 5.6.5
OS: CentOS 7
Suddenly, shards stopped being allocated automatically when an index is created.
The log contains no ERROR entries, only WARN.
[2017-12-14T15:45:27,714][WARN ][rest.suppressed ] path: /test-a/test/_search, params: {index=test-a, type=test}
org.elasticsearch.action.search.SearchPhaseExecutionException: all shards failed
at org.elasticsearch.action.search.AbstractSearchAsyncAction.onPhaseFailure(AbstractSearchAsyncAction.java:272) ~[elasticsearch-5.6.5.jar:5.6.5]
at org.elasticsearch.action.search.AbstractSearchAsyncAction.executeNextPhase(AbstractSearchAsyncAction.java:130) ~[elasticsearch-5.6.5.jar:5.6.5]
at org.elasticsearch.action.search.AbstractSearchAsyncAction.onPhaseDone(AbstractSearchAsyncAction.java:241) ~[elasticsearch-5.6.5.jar:5.6.5]
at org.elasticsearch.action.search.InitialSearchPhase.onShardFailure(InitialSearchPhase.java:107) ~[elasticsearch-5.6.5.jar:5.6.5]
at org.elasticsearch.action.search.InitialSearchPhase.lambda$performPhaseOnShard$4(InitialSearchPhase.java:205) ~[elasticsearch-5.6.5.jar:5.6.5]
at org.elasticsearch.action.search.InitialSearchPhase$1.doRun(InitialSearchPhase.java:184) [elasticsearch-5.6.5.jar:5.6.5]
at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:638) [elasticsearch-5.6.5.jar:5.6.5]
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) [elasticsearch-5.6.5.jar:5.6.5]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_144]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_144]
at java.lang.Thread.run(Thread.java:748) [?:1.8.0_144].
[2017-12-14T15:47:07,168][WARN ][rest.suppressed ] path: /test-a/_optimize, params: {index=test-a, type=_optimize}
org.elasticsearch.action.UnavailableShardsException: [test-a][4] primary shard is not active Timeout: [1m], request: [BulkShardRequest [[test-a][4]] containing [index {[test-a][_optimize][AWBTxehA5yvZ8feaMwFL], source[_na_]}]]
at org.elasticsearch.action.support.replication.TransportReplicationAction$ReroutePhase.retryBecauseUnavailable(TransportReplicationAction.java:892) [elasticsearch-5.6.5.jar:5.6.5]
at org.elasticsearch.action.support.replication.TransportReplicationAction$ReroutePhase.retryIfUnavailable(TransportReplicationAction.java:728) [elasticsearch-5.6.5.jar:5.6.5]
at org.elasticsearch.action.support.replication.TransportReplicationAction$ReroutePhase.doRun(TransportReplicationAction.java:681) [elasticsearch-5.6.5.jar:5.6.5]
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) [elasticsearch-5.6.5.jar:5.6.5]
at org.elasticsearch.action.support.replication.TransportReplicationAction$ReroutePhase$2.onTimeout(TransportReplicationAction.java:846) [elasticsearch-5.6.5.jar:5.6.5]
at org.elasticsearch.cluster.ClusterStateObserver$ContextPreservingListener.onTimeout(ClusterStateObserver.java:311) [elasticsearch-5.6.5.jar:5.6.5]
at org.elasticsearch.cluster.ClusterStateObserver$ObserverClusterStateListener.onTimeout(ClusterStateObserver.java:238) [elasticsearch-5.6.5.jar:5.6.5]
at org.elasticsearch.cluster.service.ClusterService$NotifyTimeout.run(ClusterService.java:1056) [elasticsearch-5.6.5.jar:5.6.5]
at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:569) [elasticsearch-5.6.5.jar:5.6.5]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_144]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_144]
at java.lang.Thread.run(Thread.java:748) [?:1.8.0_144]
After a restart, allocation works again, but shard 4 alone remains unallocated.
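To see why a single shard such as [test-a][4] stays unassigned, the cluster allocation explain API (available in 5.x) can be queried. The following is only a minimal sketch: it assumes the node answers on localhost:9200, and the helper names explain_body and explain_shard as well as the ES_HOST variable are made up for illustration.

```shell
# Assumption: the node is reachable here; override ES_HOST if it is not.
ES_HOST="${ES_HOST:-localhost:9200}"

# Build the JSON body for the allocation explain request.
# $1 = index name, $2 = shard number; always asks about the primary.
explain_body() {
  printf '{"index":"%s","shard":%s,"primary":true}' "$1" "$2"
}

# Ask the cluster why the given primary shard is (un)assigned.
explain_shard() {
  curl -s -XGET "$ES_HOST/_cluster/allocation/explain?pretty" \
    -H 'Content-Type: application/json' \
    -d "$(explain_body "$1" "$2")"
}
```

Calling `explain_shard test-a 4` returns the allocation decision for that shard, including the decider that blocks it when allocation is disabled.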
The following INFO messages were logged during the restart.
[2017-12-14T15:53:03,946][INFO ][org.elasticsearch.node.Node] initialized
[2017-12-14T15:53:03,947][INFO ][org.elasticsearch.node.Node] starting ...
[2017-12-14T15:53:04,081][INFO ][org.elasticsearch.transport.TransportService] publish_address {xxx.xxx.xxx.xxx:9300}, bound_addresses {0.0.0.0:9300}
[2017-12-14T15:53:04,096][INFO ][org.elasticsearch.bootstrap.BootstrapChecks] bound or publishing to a non-loopback or non-link-local address, enforcing bootstrap checks
[2017-12-14T15:53:07,147][INFO ][org.elasticsearch.cluster.service.ClusterService] new_master {Trv01-9300}{jyBEH2kqT_C0tpHHxIbWvg}{TK_KjL8GS4itkGcANep3GQ}{xxx.xxx.xxx.xxx}{xxx.xxx.xxx.xxx:9300}, reason: zen-disco-elected-as-master ([0] nodes joined)
[2017-12-14T15:53:07,161][INFO ][org.elasticsearch.http.netty4.Netty4HttpServerTransport] publish_address {xxx.xxx.xxx.xxx:9200}, bound_addresses {0.0.0.0:9200}
[2017-12-14T15:53:07,161][INFO ][org.elasticsearch.node.Node] started
[2017-12-14T15:53:07,302][INFO ][org.elasticsearch.common.settings.ClusterSettings] updating [cluster.routing.allocation.enable] from [ALL] to [none] <-- ★ For some reason ES itself applied this update, and no later INFO entry shows it being set back to all.
[2017-12-14T15:53:07,362][INFO ][org.elasticsearch.gateway.GatewayService] recovered [2] indices into cluster_state
Manually setting it back to ALL restored normal behavior, but this behavior seems wrong to me.
For reference, there was no problem on 5.6.3.
elasticsearch% curl -XPUT 'localhost:9200/_cluster/settings?pretty' -H 'Content-Type: application/json' -d' { "transient": { "cluster.routing.allocation.enable": "all" } } '
elasticsearch% curl -XPUT 'localhost:9200/_cluster/settings?pretty' -H 'Content-Type: application/json' -d' { "transient": { "cluster.routing.rebalance.enable": "all" } } '
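One thing that may be worth checking (an assumption, not something the logs above confirm) is whether `none` is stored as a persistent cluster setting, for example from an earlier maintenance step. Persistent settings are restored from the cluster state on restart, whereas transient settings are lost on a full cluster restart, so a transient `all` would not survive the next one. The sketch below lists the stored settings and clears one key by setting it to null; ES_HOST and the helper names are hypothetical.

```shell
# Assumption: the node is reachable here; override ES_HOST if it is not.
ES_HOST="${ES_HOST:-localhost:9200}"

# Show which settings are stored under "persistent" and "transient".
show_cluster_settings() {
  curl -s -XGET "$ES_HOST/_cluster/settings?pretty"
}

# Build a body that resets one setting to its default by assigning null.
# $1 = scope ("persistent" or "transient"), $2 = setting key.
reset_body() {
  printf '{"%s":{"%s":null}}' "$1" "$2"
}

# Send the reset request for the given scope and key.
reset_cluster_setting() {
  curl -s -XPUT "$ES_HOST/_cluster/settings?pretty" \
    -H 'Content-Type: application/json' \
    -d "$(reset_body "$1" "$2")"
}
```

If `show_cluster_settings` lists the key under "persistent", `reset_cluster_setting persistent cluster.routing.allocation.enable` removes it so the default (all) applies again after restarts.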