Hello!
I have a three-node cluster. It runs stably, but every time I restart one of the three nodes, I get the following in that node's logs:
[2017-02-07T10:47:46,977][INFO ][o.e.n.Node ] [Shinra tensei] initializing ...
[2017-02-07T10:47:47,199][INFO ][o.e.e.NodeEnvironment ] [Shinra tensei] using [1] data paths, mounts [[/var/lib/elasticsearch (/dev/sdd1)]], net usable_space [35.6gb], net total_space [366.6gb], spins? [no], types [ext4]
[2017-02-07T10:47:47,199][INFO ][o.e.e.NodeEnvironment ] [Shinra tensei] heap size [15.8gb], compressed ordinary object pointers [true]
[2017-02-07T10:47:49,498][INFO ][o.e.n.Node ] [Shinra tensei] node name [Shinra tensei], node ID [Wk61JWECRhuqia2Cjv7I-w]
[2017-02-07T10:47:49,514][INFO ][o.e.n.Node ] [Shinra tensei] version[5.2.0], pid[10338], build[24e05b9/2017-01-24T19:52:35.800Z], OS[Linux/3.10.0-514.6.1.el7.x86_64/amd64], JVM[Oracle Corporation/OpenJDK 64-Bit Server VM/1.8.0_121/25.121-b13]
[2017-02-07T10:47:50,999][INFO ][o.e.p.PluginsService ] [Shinra tensei] loaded module [aggs-matrix-stats]
[2017-02-07T10:47:50,999][INFO ][o.e.p.PluginsService ] [Shinra tensei] loaded module [ingest-common]
[2017-02-07T10:47:50,999][INFO ][o.e.p.PluginsService ] [Shinra tensei] loaded module [lang-expression]
[2017-02-07T10:47:50,999][INFO ][o.e.p.PluginsService ] [Shinra tensei] loaded module [lang-groovy]
[2017-02-07T10:47:50,999][INFO ][o.e.p.PluginsService ] [Shinra tensei] loaded module [lang-mustache]
[2017-02-07T10:47:50,999][INFO ][o.e.p.PluginsService ] [Shinra tensei] loaded module [lang-painless]
[2017-02-07T10:47:51,000][INFO ][o.e.p.PluginsService ] [Shinra tensei] loaded module [percolator]
[2017-02-07T10:47:51,000][INFO ][o.e.p.PluginsService ] [Shinra tensei] loaded module [reindex]
[2017-02-07T10:47:51,000][INFO ][o.e.p.PluginsService ] [Shinra tensei] loaded module [transport-netty3]
[2017-02-07T10:47:51,000][INFO ][o.e.p.PluginsService ] [Shinra tensei] loaded module [transport-netty4]
[2017-02-07T10:47:51,001][INFO ][o.e.p.PluginsService ] [Shinra tensei] no plugins loaded
[2017-02-07T10:47:58,061][INFO ][o.e.n.Node ] [Shinra tensei] initialized
[2017-02-07T10:47:58,061][INFO ][o.e.n.Node ] [Shinra tensei] starting ...
[2017-02-07T10:47:58,352][INFO ][o.e.t.TransportService ] [Shinra tensei] publish_address {10.1.20.2:9300}, bound_addresses {10.1.20.2:9300}
[2017-02-07T10:47:58,360][INFO ][o.e.b.BootstrapChecks ] [Shinra tensei] bound or publishing to a non-loopback or non-link-local address, enforcing bootstrap checks
[2017-02-07T10:48:10,987][INFO ][o.e.c.s.ClusterService ] [Shinra tensei] detected_master {Amaterasu}{88brOFyATkO-8l9-6kIXbQ}{1j7DNoG2QUi80yubyf_h1g}{10.2.20.2}{10.2.20.2:9300}, added {{Amaterasu}{88brOFyATkO-8l9-6kIXbQ}{1j7DNoG2QUi80yubyf_h1g}{10.2.20.2}{10.2.20.2:9300},}, reason: zen-disco-receive(from master [master {Amaterasu}{88brOFyATkO-8l9-6kIXbQ}{1j7DNoG2QUi80yubyf_h1g}{10.2.20.2}{10.2.20.2:9300} committed version [21]])
[2017-02-07T10:48:10,992][INFO ][o.e.c.s.ClusterSettings ] [Shinra tensei] updating [indices.breaker.fielddata.limit] from [60%] to [80%]
[2017-02-07T10:48:28,394][WARN ][o.e.n.Node ] [Shinra tensei] timed out while waiting for initial discovery state - timeout: 30s
[2017-02-07T10:48:28,406][INFO ][o.e.h.HttpServer ] [Shinra tensei] publish_address {10.1.20.2:9200}, bound_addresses {10.1.20.2:9200}
[2017-02-07T10:48:28,407][INFO ][o.e.n.Node ] [Shinra tensei] started
[2017-02-07T10:48:33,914][DEBUG][o.e.a.a.i.c.TransportCreateIndexAction] [Shinra tensei] no known master node, scheduling a retry
[2017-02-07T10:48:33,915][DEBUG][o.e.a.a.i.c.TransportCreateIndexAction] [Shinra tensei] no known master node, scheduling a retry
[2017-02-07T10:48:33,915][DEBUG][o.e.a.a.i.c.TransportCreateIndexAction] [Shinra tensei] no known master node, scheduling a retry
[2017-02-07T10:48:33,915][DEBUG][o.e.a.a.i.c.TransportCreateIndexAction] [Shinra tensei] no known master node, scheduling a retry
[2017-02-07T10:48:33,915][DEBUG][o.e.a.a.i.c.TransportCreateIndexAction] [Shinra tensei] no known master node, scheduling a retry
[2017-02-07T10:48:35,678][INFO ][o.e.d.z.ZenDiscovery ] [Shinra tensei] failed to send join request to master [{Amaterasu}{88brOFyATkO-8l9-6kIXbQ}{1j7DNoG2QUi80yubyf_h1g}{10.2.20.2}{10.2.20.2:9300}], reason [NodeDisconnectedException[[Amaterasu][10.2.20.2:9300][internal:discovery/zen/join] disconnected]]
[2017-02-07T10:48:35,682][INFO ][o.e.d.z.ZenDiscovery ] [Shinra tensei] master_left [{Amaterasu}{88brOFyATkO-8l9-6kIXbQ}{1j7DNoG2QUi80yubyf_h1g}{10.2.20.2}{10.2.20.2:9300}], reason [transport disconnected]
[2017-02-07T10:48:35,781][DEBUG][o.e.a.a.i.c.TransportCreateIndexAction] [Shinra tensei] no known master node, scheduling a retry
[2017-02-07T10:48:35,781][DEBUG][o.e.a.a.i.c.TransportCreateIndexAction] [Shinra tensei] no known master node, scheduling a retry
[2017-02-07T10:48:35,782][DEBUG][o.e.a.a.i.c.TransportCreateIndexAction] [Shinra tensei] no known master node, scheduling a retry
[2017-02-07T10:48:35,782][DEBUG][o.e.a.a.i.c.TransportCreateIndexAction] [Shinra tensei] no known master node, scheduling a retry
[2017-02-07T10:48:35,782][DEBUG][o.e.a.a.i.c.TransportCreateIndexAction] [Shinra tensei] no known master node, scheduling a retry
[2017-02-07T10:48:38,084][WARN ][o.e.c.NodeConnectionsService] [Shinra tensei] failed to connect to node {Amaterasu}{88brOFyATkO-8l9-6kIXbQ}{1j7DNoG2QUi80yubyf_h1g}{10.2.20.2}{10.2.20.2:9300} (tried [1] times)
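To make the timeline easier to read, the log can be reduced to the discovery-related events only. The following is a minimal sketch, assuming the default Elasticsearch 5.x log line layout shown above; the names `parse_line` and `discovery_events` and the keyword list are hypothetical helpers, not part of Elasticsearch.

```python
import re
from datetime import datetime

# Layout assumed from the log above:
# [timestamp][LEVEL][logger] [node name] message
LINE_RE = re.compile(
    r"^\[(?P<ts>[^\]]+)\]"
    r"\[(?P<level>[A-Z]+)\s*\]"
    r"\[(?P<logger>[^\]]+?)\s*\]"
    r"\s*\[(?P<node>[^\]]+)\]\s*(?P<msg>.*)$"
)


def parse_line(line):
    """Split one log line into its fields; return None if it does not match."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    rec = m.groupdict()
    # The timestamp uses a comma before the milliseconds.
    rec["ts"] = datetime.strptime(rec["ts"], "%Y-%m-%dT%H:%M:%S,%f")
    return rec


def discovery_events(lines, keywords=("detected_master", "master_left",
                                      "timed out", "failed to")):
    """Keep only the lines whose message mentions one of the keywords."""
    events = []
    for line in lines:
        rec = parse_line(line)
        if rec is not None and any(k in rec["msg"] for k in keywords):
            events.append(rec)
    return events
```

Applied to the log above, this leaves the `detected_master` line at 10:48:10,987 and the `master_left` line at 10:48:35,682, so about 24.7 seconds pass between the node seeing the master and losing the connection to it.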
The only thing that helps is restarting all the nodes, in any order.
All nodes run version 5.2. The OS is CentOS 7.3.
The config looks like this on every node (only the node names and the IP addresses in the discovery settings differ):
cluster.name: nocstat
node.name: "Edo tensei"
path.data: /var/lib/elasticsearch
path.logs: /var/log/elasticsearch/
bootstrap.memory_lock: true
network.host: "_eno1:ipv4_"
http.port: 9200
discovery.zen.ping.unicast.hosts: [ "10.2.20.2", "10.1.20.2"]
discovery.zen.minimum_master_nodes: 2
gateway.recover_after_nodes: 2
node.max_local_storage_nodes: 1
action.destructive_requires_name: false
thread_pool.search.size: 25
thread_pool.search.queue_size: 3000
thread_pool.index.size: 16
thread_pool.index.queue_size: 200
thread_pool.bulk.queue_size: 10000
indices.memory.index_buffer_size: 20%
indices.store.throttle.max_bytes_per_sec: 150mb
cluster.routing.allocation.node_initial_primaries_recoveries: 140
cluster.routing.allocation.node_concurrent_recoveries: 400
indices.recovery.max_bytes_per_sec: 100mb
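For reference, the `discovery.zen.minimum_master_nodes` value above follows the usual majority rule, floor(N / 2) + 1 master-eligible nodes. A minimal sketch of that check, assuming all three nodes are master-eligible (`node.master` is not set in the config, so I assume the default applies); the function names are hypothetical:

```python
def quorum(master_eligible: int) -> int:
    """Smallest majority of master-eligible nodes: floor(n / 2) + 1."""
    if master_eligible < 1:
        raise ValueError("need at least one master-eligible node")
    return master_eligible // 2 + 1


def check_minimum_master_nodes(master_eligible: int, configured: int) -> str:
    """Compare a configured minimum_master_nodes value against the majority rule."""
    needed = quorum(master_eligible)
    if configured > master_eligible:
        return "too high: a master can never be elected"
    if configured < needed:
        return "too low: split brain is possible"
    return "ok"


# Three master-eligible nodes (assumed) and the configured value of 2.
print(check_minimum_master_nodes(3, 2))
```

With three nodes the majority is 2, so this setting matches the rule and is probably not the cause of the problem.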
Could you please tell me what I am doing wrong? Which direction should I dig in, and what should I check?
Thank you!