Hi, my environment is two physical machines, each running an Elasticsearch 5.6.4 container started with docker-compose. I use Open vSwitch and pipework to network the containers across the two hosts, so the Elasticsearch cluster has two nodes, one per machine.
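For reference, each node's elasticsearch.yml is roughly along these lines (a simplified sketch of node1; node2 mirrors it with its own name and address, and my actual files may differ slightly):
cluster.name: prod_es_cluster                 # matches the cluster_name in the health output
node.name: prod_es_node1
network.host: 192.168.2.11                    # pipework-assigned address on the OVS network
http.host: 0.0.0.0                            # HTTP is bound to 0.0.0.0:9200 per the logs
discovery.zen.ping.unicast.hosts: ["192.168.2.11", "192.168.2.12"]
discovery.zen.minimum_master_nodes: 2         # "needed [2]" in the warning below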
When the Docker containers start, the cluster status is green:
curl localhost:9200/_cluster/health?pretty
{
"cluster_name" : "prod_es_cluster",
"status" : "green",
"timed_out" : false,
"number_of_nodes" : 2,
"number_of_data_nodes" : 2,
"active_primary_shards" : 0,
"active_shards" : 0,
"relocating_shards" : 0,
"initializing_shards" : 0,
"unassigned_shards" : 0,
"delayed_unassigned_shards" : 0,
"number_of_pending_tasks" : 0,
"number_of_in_flight_fetch" : 0,
"task_max_waiting_in_queue_millis" : 0,
"active_shards_percent_as_number" : 100.0
}
But when I index some test data like this:
curl -XPUT localhost:9200/1/2/3 -d '{"test":"data"}'
the cluster health goes to yellow:
curl localhost:9200/_cluster/health?pretty
{
"cluster_name" : "prod_es_cluster",
"status" : "yellow",
"timed_out" : false,
"number_of_nodes" : 2,
"number_of_data_nodes" : 2,
"active_primary_shards" : 5,
"active_shards" : 5,
"relocating_shards" : 0,
"initializing_shards" : 4,
"unassigned_shards" : 1,
"delayed_unassigned_shards" : 0,
"number_of_pending_tasks" : 0,
"number_of_in_flight_fetch" : 0,
"task_max_waiting_in_queue_millis" : 0,
"active_shards_percent_as_number" : 50.0
}
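To see which shard copies are stuck, I can also check the cat shards API (I can post that output if it helps):
curl localhost:9200/_cat/shards?v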
I also see some [WARN] messages in the Elasticsearch log:
[2017-11-21T02:10:16,560][INFO ][o.e.n.Node ] [prod_es_node1] starting ...
[2017-11-21T02:10:16,762][INFO ][o.e.t.TransportService ] [prod_es_node1] publish_address {192.168.2.11:9300}, bound_addresses {192.168.2.11:9300}
[2017-11-21T02:10:16,777][INFO ][o.e.b.BootstrapChecks ] [prod_es_node1] bound or publishing to a non-loopback or non-link-local address, enforcing bootstrap checks
[2017-11-21T02:10:46,805][WARN ][o.e.n.Node ] [prod_es_node1] timed out while waiting for initial discovery state - timeout: 30s
[2017-11-21T02:10:46,813][INFO ][o.e.h.n.Netty4HttpServerTransport] [prod_es_node1] publish_address {192.168.2.11:9200}, bound_addresses {0.0.0.0:9200}
[2017-11-21T02:10:46,815][INFO ][o.e.n.Node ] [prod_es_node1] started
[2017-11-21T02:10:49,106][INFO ][o.e.c.s.ClusterService ] [prod_es_node1] new_master {prod_es_node1}{53lFMf5gRw66S77NKzfixA}{ucq1nvnjTCu6PYVfVCmV9A}{192.168.2.11}{192.168.2.11:9300}, added {{prod_es_node2}{6PF4DNK0TfOUDUQP3tmprA}{2U490KQlRreTMvA-Boeqxw}{192.168.2.12}{192.168.2.12:9300},}, reason: zen-disco-elected-as-master ([1] nodes joined)[{prod_es_node2}{6PF4DNK0TfOUDUQP3tmprA}{2U490KQlRreTMvA-Boeqxw}{192.168.2.12}{192.168.2.12:9300}]
[2017-11-21T02:10:49,200][INFO ][o.e.g.GatewayService ] [prod_es_node1] recovered [0] indices into cluster_state
[2017-11-21T02:22:38,722][WARN ][o.e.d.r.RestController ] Content type detection for rest requests is deprecated. Specify the content type using the [Content-Type] header.
[2017-11-21T02:22:38,804][INFO ][o.e.c.m.MetaDataCreateIndexService] [prod_es_node1] [1] creating index, cause [auto(bulk api)], templates [], shards [5]/[1], mappings []
[2017-11-21T02:22:39,213][INFO ][o.e.c.m.MetaDataMappingService] [prod_es_node1] [1/D_7-Uj-aSWuXnl76P9Vgnw] create_mapping [2]
[2017-11-21T02:22:41,953][WARN ][o.e.d.r.RestController ] Content type detection for rest requests is deprecated. Specify the content type using the [Content-Type] header.
[2017-11-21T02:22:49,578][WARN ][o.e.d.r.RestController ] Content type detection for rest requests is deprecated. Specify the content type using the [Content-Type] header.
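(I assume the Content-Type deprecation warnings just come from my curl calls not sending a Content-Type header; adding the header, e.g.
curl -XPUT localhost:9200/1/2/3 -H 'Content-Type: application/json' -d '{"test":"data"}'
should silence them, so I don't think they are related to what follows.)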
After a few minutes, the two nodes lose contact with each other and the cluster falls apart:
[2017-11-21T02:54:49,519][WARN ][o.e.d.z.ZenDiscovery ] [prod_es_node1] not enough master nodes (has [1], but needed [2]), current nodes: nodes:
{prod_es_node2}{6PF4DNK0TfOUDUQP3tmprA}{2U490KQlRreTMvA-Boeqxw}{192.168.2.12}{192.168.2.12:9300}
{prod_es_node1}{53lFMf5gRw66S77NKzfixA}{ucq1nvnjTCu6PYVfVCmV9A}{192.168.2.11}{192.168.2.11:9300}, local, master
[2017-11-21T02:55:47,033][WARN ][o.e.d.z.UnicastZenPing ] [prod_es_node1] failed to send ping to [{prod_es_node2}{6PF4DNK0TfOUDUQP3tmprA}{2U490KQlRreTMvA-Boeqxw}{192.168.2.12}{192.168.2.12:9300}]
org.elasticsearch.transport.ReceiveTimeoutTransportException: [prod_es_node2][192.168.2.12:9300][internal:discovery/zen/unicast] request_id [2973] timed out after [37500ms]
at org.elasticsearch.transport.TransportService$TimeoutHandler.run(TransportService.java:961) [elasticsearch-5.6.4.jar:5.6.4]
at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:569) [elasticsearch-5.6.4.jar:5.6.4]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_151]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_151]
at java.lang.Thread.run(Thread.java:748) [?:1.8.0_151]
[2017-11-21T02:56:20,102][WARN ][o.e.d.z.ZenDiscovery ] [prod_es_node1] failed to validate incoming join request from node [{prod_es_node2}{6PF4DNK0TfOUDUQP3tmprA}{2U490KQlRreTMvA-Boeqxw}{192.168.2.12}{192.168.2.12:9300}]
org.elasticsearch.ElasticsearchTimeoutException: java.util.concurrent.TimeoutException: Timeout waiting for task.
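If it helps, I can run connectivity checks between the containers on the transport port while this is happening, something like the following (the container name here is just a placeholder, and this assumes nc is available in the image):
docker exec -it es_node1 nc -zv 192.168.2.12 9300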
How do I fix it?