How to fix an Elasticsearch cluster network connection with Docker

Hi, my environment is two physical machines, both running docker-compose.

I use Open vSwitch and pipework to build a two-node Elasticsearch cluster with Docker.

When the Docker containers start, the cluster status is green:

curl localhost:9200/_cluster/health?pretty
{
  "cluster_name" : "prod_es_cluster",
  "status" : "green",
  "timed_out" : false,
  "number_of_nodes" : 2,
  "number_of_data_nodes" : 2,
  "active_primary_shards" : 0,
  "active_shards" : 0,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 0,
  "delayed_unassigned_shards" : 0,
  "number_of_pending_tasks" : 0,
  "number_of_in_flight_fetch" : 0,
  "task_max_waiting_in_queue_millis" : 0,
  "active_shards_percent_as_number" : 100.0
}

But when I put some test data like this:

curl -XPUT localhost:9200/1/2/3 -d '{"test":"data"}'

the cluster health status goes to yellow:

curl localhost:9200/_cluster/health?pretty
{
  "cluster_name" : "prod_es_cluster",
  "status" : "yellow",
  "timed_out" : false,
  "number_of_nodes" : 2,
  "number_of_data_nodes" : 2,
  "active_primary_shards" : 5,
  "active_shards" : 5,
  "relocating_shards" : 0,
  "initializing_shards" : 4,
  "unassigned_shards" : 1,
  "delayed_unassigned_shards" : 0,
  "number_of_pending_tasks" : 0,
  "number_of_in_flight_fetch" : 0,
  "task_max_waiting_in_queue_millis" : 0,
  "active_shards_percent_as_number" : 50.0
}

I got some [WARN] messages:

[2017-11-21T02:10:16,560][INFO ][o.e.n.Node               ] [prod_es_node1] starting ...
[2017-11-21T02:10:16,762][INFO ][o.e.t.TransportService   ] [prod_es_node1] publish_address {192.168.2.11:9300}, bound_addresses {192.168.2.11:9300}
[2017-11-21T02:10:16,777][INFO ][o.e.b.BootstrapChecks    ] [prod_es_node1] bound or publishing to a non-loopback or non-link-local address, enforcing bootstrap checks
[2017-11-21T02:10:46,805][WARN ][o.e.n.Node               ] [prod_es_node1] timed out while waiting for initial discovery state - timeout: 30s
[2017-11-21T02:10:46,813][INFO ][o.e.h.n.Netty4HttpServerTransport] [prod_es_node1] publish_address {192.168.2.11:9200}, bound_addresses {0.0.0.0:9200}
[2017-11-21T02:10:46,815][INFO ][o.e.n.Node               ] [prod_es_node1] started
[2017-11-21T02:10:49,106][INFO ][o.e.c.s.ClusterService   ] [prod_es_node1] new_master {prod_es_node1}{53lFMf5gRw66S77NKzfixA}{ucq1nvnjTCu6PYVfVCmV9A}{192.168.2.11}{192.168.2.11:9300}, added {{prod_es_node2}{6PF4DNK0TfOUDUQP3tmprA}{2U490KQlRreTMvA-Boeqxw}{192.168.2.12}{192.168.2.12:9300},}, reason: zen-disco-elected-as-master ([1] nodes joined)[{prod_es_node2}{6PF4DNK0TfOUDUQP3tmprA}{2U490KQlRreTMvA-Boeqxw}{192.168.2.12}{192.168.2.12:9300}]
[2017-11-21T02:10:49,200][INFO ][o.e.g.GatewayService     ] [prod_es_node1] recovered [0] indices into cluster_state
[2017-11-21T02:22:38,722][WARN ][o.e.d.r.RestController   ] Content type detection for rest requests is deprecated. Specify the content
type using the [Content-Type] header.
[2017-11-21T02:22:38,804][INFO ][o.e.c.m.MetaDataCreateIndexService] [prod_es_node1] [1] creating index, cause [auto(bulk api)], templates [], shards [5]/[1], mappings []
[2017-11-21T02:22:39,213][INFO ][o.e.c.m.MetaDataMappingService] [prod_es_node1] [1/D_7-Uj-aSWuXnl76P9Vgnw] create_mapping [2]
[2017-11-21T02:22:41,953][WARN ][o.e.d.r.RestController   ] Content type detection for rest requests is deprecated. Specify the content
type using the [Content-Type] header.
[2017-11-21T02:22:49,578][WARN ][o.e.d.r.RestController   ] Content type detection for rest requests is deprecated. Specify the content
type using the [Content-Type] header.

After a few minutes, the cluster disconnected:

[2017-11-21T02:54:49,519][WARN ][o.e.d.z.ZenDiscovery     ] [prod_es_node1] not enough master nodes (has [1], but needed [2]), current nodes: nodes:
   {prod_es_node2}{6PF4DNK0TfOUDUQP3tmprA}{2U490KQlRreTMvA-Boeqxw}{192.168.2.12}{192.168.2.12:9300}
   {prod_es_node1}{53lFMf5gRw66S77NKzfixA}{ucq1nvnjTCu6PYVfVCmV9A}{192.168.2.11}{192.168.2.11:9300}, local, master

[2017-11-21T02:55:47,033][WARN ][o.e.d.z.UnicastZenPing   ] [prod_es_node1] failed to send ping to [{prod_es_node2}{6PF4DNK0TfOUDUQP3tmprA}{2U490KQlRreTMvA-Boeqxw}{192.168.2.12}{192.168.2.12:9300}]
org.elasticsearch.transport.ReceiveTimeoutTransportException: [prod_es_node2][192.168.2.12:9300][internal:discovery/zen/unicast] request_id [2973] timed out after [37500ms]
        at org.elasticsearch.transport.TransportService$TimeoutHandler.run(TransportService.java:961) [elasticsearch-5.6.4.jar:5.6.4]
        at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:569) [elasticsearch-5.6.4.jar:5.6.4]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_151]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_151]
        at java.lang.Thread.run(Thread.java:748) [?:1.8.0_151]
[2017-11-21T02:56:20,102][WARN ][o.e.d.z.ZenDiscovery     ] [prod_es_node1] failed to validate incoming join request from node [{prod_es_node2}{6PF4DNK0TfOUDUQP3tmprA}{2U490KQlRreTMvA-Boeqxw}{192.168.2.12}{192.168.2.12:9300}]
org.elasticsearch.ElasticsearchTimeoutException: java.util.concurrent.TimeoutException: Timeout waiting for task.

How do I fix it?

Here is my config.

ovs:

sudo brctl addbr br0
sudo ip link set dev br0 up
sudo ovs-vsctl add-br ovs0
sudo ovs-vsctl set bridge ovs0 stp_enable=true
sudo ovs-vsctl add-port ovs0 br0
sudo ovs-vsctl add-port ovs0 gre0 -- set interface gre0 type=gre options:remote_ip=10.251.34.60

pipework:

sudo pipework-master/pipework ovs0 -i eth1 es1 192.168.2.11/24 @100

docker-compose.yml

version: '2'

services:
  elasticsearch:
    image: elasticsearch:latest
    hostname: es1
    container_name: es1
    user: elasticsearch
    working_dir: /usr/share/elasticsearch
    ports:
    - 9200:9200
    - 9300:9300
    networks:
      esbridge:
        ipv4_address: 192.168.2.11
    command: /usr/share/elasticsearch/bin/elasticsearch
    environment:
    - ES_JAVA_OPTS=-Xms2g -Xmx2g
    - bootstrap.memory_lock=true
    volumes:
    - /home/rd6-admin/elasticsearch.yml:/usr/share/elasticsearch/config/elasticsearch.yml
    ulimits:
      nofile:
        soft: 65536
        hard: 65536
      memlock:
        soft: -1
        hard: -1

networks:
  esbridge:
    driver: bridge
    ipam:
      config:
        - subnet: 192.168.0.0/16
          gateway: 192.168.0.1

elasticsearch.yml

#ES1
http.host: 0.0.0.0
transport.host: 192.168.2.11
transport.tcp.port: 9300
discovery.zen.minimum_master_nodes: 1

cluster.name: prod_es_cluster
node.name: prod_es_node1
node.master: true
node.data: true
discovery.zen.ping_timeout: 30s
discovery.zen.join_timeout: 60s
discovery.zen.ping.unicast.hosts: ["192.168.2.11:9300","192.168.2.12:9300"]

network.host: 192.168.2.11
network.bind_host: 0.0.0.0
network.publish_host: 0.0.0.0
bootstrap.memory_lock: true

http.port: 9200
http.enabled: true

http.cors.enabled: true
http.cors.allow-origin: "*"

I’m a bit surprised by the error message, which seems to tell us that you set

discovery.zen.minimum_master_nodes: 2

But you said you have

discovery.zen.minimum_master_nodes: 1

Could you double check it?
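For reference, the usual guidance is to set this to a quorum of master-eligible nodes, which for two master-eligible nodes works out to 2. A sketch of the standard quorum formula:

```yaml
# quorum = (master_eligible_nodes / 2) + 1
# with 2 master-eligible nodes: (2 / 2) + 1 = 2
discovery.zen.minimum_master_nodes: 2
```

The trade-off is that with only two master-eligible nodes, a quorum of 2 means no master can be elected while either node is unreachable, which is one reason three master-eligible nodes are usually recommended.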

The config was indeed set to

discovery.zen.minimum_master_nodes: 2

Now I have changed it to:

#ES1-master

http.host: 0.0.0.0
transport.host: 192.168.2.11
transport.tcp.port: 9300
discovery.zen.minimum_master_nodes: 1

cluster.name: prod_es_cluster
node.name: prod_es_node1
node.master: true
node.data: false
discovery.zen.ping_timeout: 30s
discovery.zen.join_timeout: 60s
discovery.zen.ping.unicast.hosts: ["192.168.2.11:9300"]
network.host: 192.168.2.11
network.bind_host: 0.0.0.0
network.publish_host: 192.168.2.11
bootstrap.memory_lock: true

http.port: 9200
http.enabled: true

http.cors.enabled: true
http.cors.allow-origin: "*"

#ES2-datanode

http.host: 0.0.0.0
transport.host: 192.168.2.12
transport.tcp.port: 9300
discovery.zen.minimum_master_nodes: 1

cluster.name: prod_es_cluster
node.name: prod_es_node2
node.master: false
node.data: true
discovery.zen.ping_timeout: 30s
discovery.zen.join_timeout: 60s
discovery.zen.ping.unicast.hosts: ["192.168.2.11:9300"]
network.host: 192.168.2.12
network.bind_host: 0.0.0.0
network.publish_host: 192.168.2.12
bootstrap.memory_lock: true

http.port: 9200
http.enabled: true
   
http.cors.enabled: true
http.cors.allow-origin: "*"

And the logs:

#ES1-master

[2017-11-21T09:14:03,563][INFO ][o.e.c.s.ClusterService   ] [prod_es_node1] new_master {prod_es_node1}{X9o4IsgnQF6YzjVE6pnXdw}{PW9pZn9GRk-6MU_hO8PaSA}{192.168.2.11}{192.168.2.11:9300}, added {{prod_es_node2}{00oN7Z4ERPmyGpijcskegw}{1a_RjJbCTiCxB5Mbk7Mkig}{192.168.2.12}{192.168.2.12:9300},}, reason: zen-disco-elected-as-master ([1] nodes joined)[{prod_es_node2}{00oN7Z4ERPmyGpijcskegw}{1a_RjJbCTiCxB5Mbk7Mkig}{192.168.2.12}{192.168.2.12:9300}]
[2017-11-21T09:14:03,644][INFO ][o.e.g.GatewayService     ] [prod_es_node1] recovered [0] indices into cluster_state
[2017-11-21T09:14:28,434][INFO ][o.e.c.m.MetaDataCreateIndexService] [prod_es_node1] [1] creating index, cause [auto(bulk api)], templates [], shards [5]/[1], mappings []
[2017-11-21T09:14:28,810][INFO ][o.e.c.m.MetaDataMappingService] [prod_es_node1] [1/6S9xvLAPSM2vaN9CRqajeg] create_mapping [2]
[2017-11-21T09:17:44,900][INFO ][o.e.c.m.MetaDataCreateIndexService] [prod_es_node1] [11] creating index, cause [auto(bulk api)], templates [], shards [5]/[1], mappings []
[2017-11-21T09:17:44,973][INFO ][o.e.c.m.MetaDataMappingService] [prod_es_node1] [11/2wd4TM9nSB6KXKPtIPss6w] create_mapping [22]

#ES2-datanode

[2017-11-21T09:14:03,589][INFO ][o.e.c.s.ClusterService   ] [prod_es_node2] detected_master {prod_es_node1}{X9o4IsgnQF6YzjVE6pnXdw}{PW9pZn9GRk-6MU_hO8PaSA}{192.168.2.11}{192.168.2.11:9300}, added {{prod_es_node1}{X9o4IsgnQF6YzjVE6pnXdw}{PW9pZn9GRk-6MU_hO8PaSA}{192.168.2.11}{192.168.2.11:9300},}, reason: zen-disco-receive(from master [master {prod_es_node1}{X9o4IsgnQF6YzjVE6pnXdw}{PW9pZn9GRk-6MU_hO8PaSA}{192.168.2.11}{192.168.2.11:9300} committed version [1]])
[2017-11-21T09:14:28,340][WARN ][o.e.d.r.RestController   ] Content type detection for rest requests is deprecated. Specify the content
type using the [Content-Type] header.

I got similar logs on node2.

curl localhost:9200/_cluster/health?pretty
{
  "cluster_name" : "prod_es_cluster",
  "status" : "yellow",
  "timed_out" : false,
  "number_of_nodes" : 2,
  "number_of_data_nodes" : 1,
  "active_primary_shards" : 10,
  "active_shards" : 10,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 10,
  "delayed_unassigned_shards" : 0,
  "number_of_pending_tasks" : 0,
  "number_of_in_flight_fetch" : 0,
  "task_max_waiting_in_queue_millis" : 0,
  "active_shards_percent_as_number" : 50.0
}

When I put some data into the Elasticsearch cluster, health goes to yellow.

After a few minutes, the cluster disconnected and health changed to RED:

#ES1

[2017-11-21T09:46:48,779][INFO ][o.e.c.r.a.AllocationService] [prod_es_node1] Cluster health status changed from [YELLOW] to [RED] (reason: [{prod_es_node2}{00oN7Z4ERPmyGpijcskegw}{1a_RjJbCTiCxB5Mbk7Mkig}{192.168.2.12}{192.168.2.12:9300} transport disconnected]).
[2017-11-21T09:46:48,779][INFO ][o.e.c.s.ClusterService   ] [prod_es_node1] removed {{prod_es_node2}{00oN7Z4ERPmyGpijcskegw}{1a_RjJbCTiCxB5Mbk7Mkig}{192.168.2.12}{192.168.2.12:9300},}, reason: zen-disco-node-failed({prod_es_node2}{00oN7Z4ERPmyGpijcskegw}{1a_RjJbCTiCxB5Mbk7Mkig}{192.168.2.12}{192.168.2.12:9300}), reason(transport disconnected)[{prod_es_node2}{00oN7Z4ERPmyGpijcskegw}{1a_RjJbCTiCxB5Mbk7Mkig}{192.168.2.12}{192.168.2.12:9300} transport disconnected]
[2017-11-21T09:46:48,785][INFO ][o.e.c.r.DelayedAllocationService] [prod_es_node1] scheduling reroute for delayed shards in [59.9s] (10 delayed shards)
[2017-11-21T09:48:18,960][WARN ][o.e.d.z.ZenDiscovery     ] [prod_es_node1] failed to validate incoming join request from node [{prod_es_node2}{00oN7Z4ERPmyGpijcskegw}{1a_RjJbCTiCxB5Mbk7Mkig}{192.168.2.12}{192.168.2.12:9300}]
org.elasticsearch.ElasticsearchTimeoutException: java.util.concurrent.TimeoutException: Timeout waiting for task.

#ES2

[2017-11-21T09:17:44,893][WARN ][o.e.d.r.RestController   ] Content type detection for rest requests is deprecated. Specify the content
type using the [Content-Type] header.
[2017-11-21T09:46:48,943][INFO ][o.e.d.z.ZenDiscovery     ] [prod_es_node2] master_left [{prod_es_node1}{X9o4IsgnQF6YzjVE6pnXdw}{PW9pZn9GRk-6MU_hO8PaSA}{192.168.2.11}{192.168.2.11:9300}], reason [failed to ping, tried [3] times, each with  maximum [30s] timeout]
[2017-11-21T09:46:48,944][WARN ][o.e.d.z.ZenDiscovery     ] [prod_es_node2] master left (reason = failed to ping, tried [3] times, each
with  maximum [30s] timeout), current nodes: nodes:
   {prod_es_node1}{X9o4IsgnQF6YzjVE6pnXdw}{PW9pZn9GRk-6MU_hO8PaSA}{192.168.2.11}{192.168.2.11:9300}, master
   {prod_es_node2}{00oN7Z4ERPmyGpijcskegw}{1a_RjJbCTiCxB5Mbk7Mkig}{192.168.2.12}{192.168.2.12:9300}, local

[2017-11-21T09:48:18,949][INFO ][o.e.d.z.ZenDiscovery     ] [prod_es_node2] failed to send join request to master [{prod_es_node1}{X9o4IsgnQF6YzjVE6pnXdw}{PW9pZn9GRk-6MU_hO8PaSA}{192.168.2.11}{192.168.2.11:9300}], reason [ElasticsearchTimeoutException[java.util.concurrent.TimeoutException: Timeout waiting for task.]; nested: TimeoutException[Timeout waiting for task.]; ]

This is expected as you have only one data node.
Replicas can’t be allocated.
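Each index here was created with the 5.x defaults of 5 primary shards and 1 replica, and a replica is never allocated on the same node as its primary. If you want a single-data-node cluster to report green, one option is to drop replicas to 0 via the index settings API (a sketch; the index name is a placeholder):

```
PUT /your_index/_settings
{
  "index": {
    "number_of_replicas": 0
  }
}
```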

Hi dadoonet, thanks for your reply.

I understand that replicas can’t be allocated.

But I would like to know what causes the cluster’s join requests to fail validation and time out.

[WARN ][o.e.d.r.RestController   ] Content type detection for rest requests is deprecated. Specify the content type using the [Content-Type] header.

I have tried setting both nodes’ elasticsearch.yml to

node.master: true
node.data: true

then

curl localhost:9200/_cluster/health?pretty
{
  "cluster_name" : "prod_es_cluster",
  "status" : "yellow",
  "timed_out" : false,
  "number_of_nodes" : 2,
  "number_of_data_nodes" : 2,
  "active_primary_shards" : 10,
  "active_shards" : 10,
  "relocating_shards" : 0,
  "initializing_shards" : 4,
  "unassigned_shards" : 6,
  "delayed_unassigned_shards" : 0,
  "number_of_pending_tasks" : 0,
  "number_of_in_flight_fetch" : 0,
  "task_max_waiting_in_queue_millis" : 0,
  "active_shards_percent_as_number" : 50.0
}

Did I set something wrong in the config?

Apparently you changed the settings as you now have 2 data nodes?
And it seems that replicas are in the process of being allocated.

There does not seem to be anything wrong.
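The numbers in your health output also add up: two indices with 5 primaries and 1 replica each give 20 shards in total, of which the 10 primaries are active. A quick sanity check of that arithmetic (a sketch; the counts are taken from your output):

```shell
# Two indices, 5 primaries each, 1 replica per primary (the 5.x defaults)
indices=2
primaries_per_index=5
replicas_per_primary=1

total=$(( indices * primaries_per_index * (1 + replicas_per_primary) ))
active=10   # the active primary shards from your health output

echo "total_shards=$total active_percent=$(( active * 100 / total ))"
```

That matches the reported `"active_shards_percent_as_number" : 50.0`.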

Content type detection for rest requests is deprecated. Specify the content type using the [Content-Type] header.

This is just because you are sending requests without providing the content type. Not related to discovery or whatever.
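Adding an explicit header to your test writes makes the warning go away. A minimal sketch, exercised against a throwaway local HTTP server since I can’t reach your cluster (with your setup the URL would be localhost:9200):

```shell
# Stand-in for the cluster: a throwaway local server (assumes python3 on PATH).
python3 -m http.server 9299 --bind 127.0.0.1 >/dev/null 2>&1 &
srv=$!
sleep 1

# The fix: always send an explicit Content-Type with the request body.
# curl -v echoes the outgoing request headers as "> ..." lines.
sent_headers=$(curl -sv -XPUT 'http://127.0.0.1:9299/1/2/3' \
  -H 'Content-Type: application/json' \
  -d '{"test":"data"}' 2>&1 | grep '^> ')
kill $srv

echo "$sent_headers"
```

The printed request headers include `Content-Type: application/json`, which is what ES 5.x wants instead of sniffing the body.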

Did I set something wrong in the config?

I don’t think so.

Thanks, dadoonet!

I will try to find out what causes the network connection failures and timeouts.

What is the final config file you have?
Do you still see disconnections?
Could you share the full logs?

Sure!

Two physical servers, both running Docker.
Host names:
10.xx.xx.50 web
10.xx.xx.60 log

Container IPs created by docker-compose:
node1: 192.168.2.11
node2: 192.168.2.12

Here is my config.

Node1

#ES1
http.host: 0.0.0.0
transport.host: 192.168.2.11
transport.tcp.port: 9300
discovery.zen.minimum_master_nodes: 1

cluster.name: prod_es_cluster
node.name: prod_es_node1
node.master: true
node.data: true
discovery.zen.ping_timeout: 30s
discovery.zen.join_timeout: 60s
discovery.zen.ping.unicast.hosts: ["192.168.2.11:9300","192.168.2.12:9300"]

network.host: 192.168.2.11
network.bind_host: 0.0.0.0

network.publish_host: 192.168.2.11
bootstrap.memory_lock: true

http.port: 9200
http.enabled: true

http.cors.enabled: true
http.cors.allow-origin: "*"

Node2

#ES2
http.host: 0.0.0.0
transport.host: 192.168.2.12
transport.tcp.port: 9300
discovery.zen.minimum_master_nodes: 1

cluster.name: prod_es_cluster
node.name: prod_es_node2
node.master: true
node.data: true
discovery.zen.ping_timeout: 30s
discovery.zen.join_timeout: 60s
discovery.zen.ping.unicast.hosts: ["192.168.2.11:9300","192.168.2.12:9300"]

network.host: 192.168.2.12
network.bind_host: 0.0.0.0

network.publish_host: 192.168.2.12
bootstrap.memory_lock: true

http.port: 9200
http.enabled: true

http.cors.enabled: true
http.cors.allow-origin: "*"

curl test

curl web:9200/_cluster/health?pretty
{
  "cluster_name" : "prod_es_cluster",
  "status" : "green",
  "timed_out" : false,
  "number_of_nodes" : 2,
  "number_of_data_nodes" : 2,
  "active_primary_shards" : 0,
  "active_shards" : 0,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 0,
  "delayed_unassigned_shards" : 0,
  "number_of_pending_tasks" : 0,
  "number_of_in_flight_fetch" : 0,
  "task_max_waiting_in_queue_millis" : 0,
  "active_shards_percent_as_number" : 100.0
}

When I put some test data, e.g.:

curl -XPUT web:9200/23w1/332/33 -d '{"as1213ewq5df":"q3wqqweweeq"}'
curl -XPUT web:9200/rwer/2e2/33 -d '{"as4354dcxfaf":"qwe6rwexzvq"}'
curl -XPUT web:9200/2t44/55d/66 -d '{"asdsaffwverf":"qw234ezvxvq"}'
curl -XPUT web:9200/2ea4/5df/c3 -d '{"2aszx45ddfaf":"qzxsdfw21eq"}'

curl web:9200/_cluster/health?pretty
{
  "cluster_name" : "prod_es_cluster",
  "status" : "yellow",
  "timed_out" : false,
  "number_of_nodes" : 2,
  "number_of_data_nodes" : 2,
  "active_primary_shards" : 20,
  "active_shards" : 20,
  "relocating_shards" : 0,
  "initializing_shards" : 4,
  "unassigned_shards" : 16,
  "delayed_unassigned_shards" : 0,
  "number_of_pending_tasks" : 0,
  "number_of_in_flight_fetch" : 0,
  "task_max_waiting_in_queue_millis" : 0,
  "active_shards_percent_as_number" : 50.0
}

Node1 logs:

[2017-11-23T01:35:16,817][INFO ][o.e.n.Node               ] [prod_es_node1] initializing ...
[2017-11-23T01:35:16,892][INFO ][o.e.e.NodeEnvironment    ] [prod_es_node1] using [1] data paths, mounts [[/ (rootfs)]], net usable_space [14.1gb], net total_space [19.9gb], spins? [unknown], types [rootfs]
[2017-11-23T01:35:16,893][INFO ][o.e.e.NodeEnvironment    ] [prod_es_node1] heap size [1.9gb], compressed ordinary object pointers [true]
[2017-11-23T01:35:16,895][INFO ][o.e.n.Node               ] [prod_es_node1] node name [prod_es_node1], node ID [omRpm_KvSfaqhzNDM_ogwA]
[2017-11-23T01:35:16,895][INFO ][o.e.n.Node               ] [prod_es_node1] version[5.6.4], pid[1], build[8bbedf5/2017-10-31T18:55:38.105Z], OS[Linux/3.10.0-327.el7.x86_64/amd64], JVM[Oracle Corporation/OpenJDK 64-Bit Server VM/1.8.0_151/25.151-b12]
[2017-11-23T01:35:16,895][INFO ][o.e.n.Node               ] [prod_es_node1] JVM arguments [-Xms2g, -Xmx2g, -XX:+UseConcMarkSweepGC, -XX:CMSInitiatingOccupancyFraction=75, -XX:+UseCMSInitiatingOccupancyOnly, -XX:+AlwaysPreTouch, -Xss1m, -Djava.awt.headless=true, -Dfile.encoding=UTF-8, -Djna.nosys=true, -Djdk.io.permissionsUseCanonicalPath=true, -Dio.netty.noUnsafe=true, -Dio.netty.noKeySetOptimization=true, -Dio.netty.recycler.maxCapacityPerThread=0, -Dlog4j.shutdownHookEnabled=false, -Dlog4j2.disable.jmx=true, -Dlog4j.skipJansi=true, -XX:+HeapDumpOnOutOfMemoryError, -Xms2g, -Xmx2g, -Des.path.home=/usr/share/elasticsearch]
[2017-11-23T01:35:17,676][INFO ][o.e.p.PluginsService     ] [prod_es_node1] loaded module [aggs-matrix-stats]
[2017-11-23T01:35:17,676][INFO ][o.e.p.PluginsService     ] [prod_es_node1] loaded module [ingest-common]
[2017-11-23T01:35:17,677][INFO ][o.e.p.PluginsService     ] [prod_es_node1] loaded module [lang-expression]
[2017-11-23T01:35:17,677][INFO ][o.e.p.PluginsService     ] [prod_es_node1] loaded module [lang-groovy]
[2017-11-23T01:35:17,677][INFO ][o.e.p.PluginsService     ] [prod_es_node1] loaded module [lang-mustache]
[2017-11-23T01:35:17,677][INFO ][o.e.p.PluginsService     ] [prod_es_node1] loaded module [lang-painless]
[2017-11-23T01:35:17,677][INFO ][o.e.p.PluginsService     ] [prod_es_node1] loaded module [parent-join]
[2017-11-23T01:35:17,677][INFO ][o.e.p.PluginsService     ] [prod_es_node1] loaded module [percolator]
[2017-11-23T01:35:17,677][INFO ][o.e.p.PluginsService     ] [prod_es_node1] loaded module [reindex]
[2017-11-23T01:35:17,677][INFO ][o.e.p.PluginsService     ] [prod_es_node1] loaded module [transport-netty3]
[2017-11-23T01:35:17,677][INFO ][o.e.p.PluginsService     ] [prod_es_node1] loaded module [transport-netty4]
[2017-11-23T01:35:17,677][INFO ][o.e.p.PluginsService     ] [prod_es_node1] no plugins loaded
[2017-11-23T01:35:18,918][INFO ][o.e.d.DiscoveryModule    ] [prod_es_node1] using discovery type [zen]
[2017-11-23T01:35:19,395][INFO ][o.e.n.Node               ] [prod_es_node1] initialized
[2017-11-23T01:35:19,395][INFO ][o.e.n.Node               ] [prod_es_node1] starting ...
[2017-11-23T01:35:19,595][INFO ][o.e.t.TransportService   ] [prod_es_node1] publish_address {192.168.2.11:9300}, bound_addresses {192.168.2.11:9300}
[2017-11-23T01:35:19,613][INFO ][o.e.b.BootstrapChecks    ] [prod_es_node1] bound or publishing to a non-loopback or non-link-local address, enforcing bootstrap checks
[2017-11-23T01:35:49,637][WARN ][o.e.n.Node               ] [prod_es_node1] timed out while waiting for initial discovery state - timeout: 30s
[2017-11-23T01:35:49,647][INFO ][o.e.h.n.Netty4HttpServerTransport] [prod_es_node1] publish_address {192.168.2.11:9200}, bound_addresses {0.0.0.0:9200}
[2017-11-23T01:35:49,647][INFO ][o.e.n.Node               ] [prod_es_node1] started
[2017-11-23T01:35:52,541][INFO ][o.e.c.s.ClusterService   ] [prod_es_node1] detected_master {prod_es_node2}{RGV3OxddRWSO93QOo9pK9w}{rvLO8MoQR_6XK65hK_mI6Q}{192.168.2.12}{192.168.2.12:9300}, added {{prod_es_node2}{RGV3OxddRWSO93QOo9pK9w}{rvLO8MoQR_6XK65hK_mI6Q}{192.168.2.12}{192.168.2.12:9300},}, reason: zen-disco-receive(from master [master {prod_es_node2}{RGV3OxddRWSO93QOo9pK9w}{rvLO8MoQR_6XK65hK_mI6Q}{192.168.2.12}{192.168.2.12:9300} committed version [1]])

Node2 logs:

[2017-11-23T01:35:19,677][INFO ][o.e.n.Node               ] [prod_es_node2] initializing ...
[2017-11-23T01:35:19,752][INFO ][o.e.e.NodeEnvironment    ] [prod_es_node2] using [1] data paths, mounts [[/ (rootfs)]], net usable_space [36.4gb], net total_space [39.9gb], spins? [unknown], types [rootfs]
[2017-11-23T01:35:19,752][INFO ][o.e.e.NodeEnvironment    ] [prod_es_node2] heap size [1.9gb], compressed ordinary object pointers [true]
[2017-11-23T01:35:19,753][INFO ][o.e.n.Node               ] [prod_es_node2] node name [prod_es_node2], node ID [RGV3OxddRWSO93QOo9pK9w]
[2017-11-23T01:35:19,753][INFO ][o.e.n.Node               ] [prod_es_node2] version[5.6.4], pid[1], build[8bbedf5/2017-10-31T18:55:38.105Z], OS[Linux/3.10.0-327.4.4.el7.x86_64/amd64], JVM[Oracle Corporation/OpenJDK 64-Bit Server VM/1.8.0_151/25.151-b12]
[2017-11-23T01:35:19,753][INFO ][o.e.n.Node               ] [prod_es_node2] JVM arguments [-Xms2g, -Xmx2g, -XX:+UseConcMarkSweepGC, -XX:CMSInitiatingOccupancyFraction=75, -XX:+UseCMSInitiatingOccupancyOnly, -XX:+AlwaysPreTouch, -Xss1m, -Djava.awt.headless=true, -Dfile.encoding=UTF-8, -Djna.nosys=true, -Djdk.io.permissionsUseCanonicalPath=true, -Dio.netty.noUnsafe=true, -Dio.netty.noKeySetOptimization=true, -Dio.netty.recycler.maxCapacityPerThread=0, -Dlog4j.shutdownHookEnabled=false, -Dlog4j2.disable.jmx=true, -Dlog4j.skipJansi=true, -XX:+HeapDumpOnOutOfMemoryError, -Xms2g, -Xmx2g, -Des.path.home=/usr/share/elasticsearch]
[2017-11-23T01:35:20,543][INFO ][o.e.p.PluginsService     ] [prod_es_node2] loaded module [aggs-matrix-stats]
[2017-11-23T01:35:20,543][INFO ][o.e.p.PluginsService     ] [prod_es_node2] loaded module [ingest-common]
[2017-11-23T01:35:20,543][INFO ][o.e.p.PluginsService     ] [prod_es_node2] loaded module [lang-expression]
[2017-11-23T01:35:20,543][INFO ][o.e.p.PluginsService     ] [prod_es_node2] loaded module [lang-groovy]
[2017-11-23T01:35:20,543][INFO ][o.e.p.PluginsService     ] [prod_es_node2] loaded module [lang-mustache]
[2017-11-23T01:35:20,543][INFO ][o.e.p.PluginsService     ] [prod_es_node2] loaded module [lang-painless]
[2017-11-23T01:35:20,543][INFO ][o.e.p.PluginsService     ] [prod_es_node2] loaded module [parent-join]
[2017-11-23T01:35:20,543][INFO ][o.e.p.PluginsService     ] [prod_es_node2] loaded module [percolator]
[2017-11-23T01:35:20,543][INFO ][o.e.p.PluginsService     ] [prod_es_node2] loaded module [reindex]
[2017-11-23T01:35:20,543][INFO ][o.e.p.PluginsService     ] [prod_es_node2] loaded module [transport-netty3]
[2017-11-23T01:35:20,544][INFO ][o.e.p.PluginsService     ] [prod_es_node2] loaded module [transport-netty4]
[2017-11-23T01:35:20,544][INFO ][o.e.p.PluginsService     ] [prod_es_node2] no plugins loaded
[2017-11-23T01:35:21,777][INFO ][o.e.d.DiscoveryModule    ] [prod_es_node2] using discovery type [zen]
[2017-11-23T01:35:22,254][INFO ][o.e.n.Node               ] [prod_es_node2] initialized
[2017-11-23T01:35:22,254][INFO ][o.e.n.Node               ] [prod_es_node2] starting ...
[2017-11-23T01:35:22,447][INFO ][o.e.t.TransportService   ] [prod_es_node2] publish_address {192.168.2.12:9300}, bound_addresses {192.168.2.12:9300}
[2017-11-23T01:35:22,459][INFO ][o.e.b.BootstrapChecks    ] [prod_es_node2] bound or publishing to a non-loopback or non-link-local address, enforcing bootstrap checks
[2017-11-23T01:35:52,487][WARN ][o.e.n.Node               ] [prod_es_node2] timed out while waiting for initial discovery state - timeout: 30s
[2017-11-23T01:35:52,497][INFO ][o.e.h.n.Netty4HttpServerTransport] [prod_es_node2] publish_address {192.168.2.12:9200}, bound_addresses {0.0.0.0:9200}
[2017-11-23T01:35:52,497][INFO ][o.e.n.Node               ] [prod_es_node2] started
[2017-11-23T01:35:52,510][INFO ][o.e.c.s.ClusterService   ] [prod_es_node2] new_master {prod_es_node2}{RGV3OxddRWSO93QOo9pK9w}{rvLO8MoQR_6XK65hK_mI6Q}{192.168.2.12}{192.168.2.12:9300}, added {{prod_es_node1}{omRpm_KvSfaqhzNDM_ogwA}{_pBeDeVWR2iorTsUBZyB-w}{192.168.2.11}{192.168.2.11:9300},}, reason: zen-disco-elected-as-master ([1] nodes joined)[{prod_es_node1}{omRpm_KvSfaqhzNDM_ogwA}{_pBeDeVWR2iorTsUBZyB-w}{192.168.2.11}{192.168.2.11:9300}]
[2017-11-23T01:35:52,556][WARN ][o.e.d.z.ElectMasterService] [prod_es_node2] value for setting "discovery.zen.minimum_master_nodes" is too low. This can result in data loss! Please set it to at least a quorum of master-eligible nodes (current value: [1], total number of master-eligible nodes used for publishing in this round: [2])
[2017-11-23T01:35:52,595][INFO ][o.e.g.GatewayService     ] [prod_es_node2] recovered [0] indices into cluster_state
[2017-11-23T01:37:44,840][WARN ][o.e.d.r.RestController   ] Content type detection for rest requests is deprecated. Specify the content
type using the [Content-Type] header.
[2017-11-23T01:37:44,923][INFO ][o.e.c.m.MetaDataCreateIndexService] [prod_es_node2] [23w1] creating index, cause [auto(bulk api)], templates [], shards [5]/[1], mappings []
[2017-11-23T01:37:45,293][INFO ][o.e.c.m.MetaDataMappingService] [prod_es_node2] [23w1/-ZzS4rqcQGuOBa0MruBX2g] create_mapping [332]
[2017-11-23T01:37:45,451][WARN ][o.e.d.r.RestController   ] Content type detection for rest requests is deprecated. Specify the content
type using the [Content-Type] header.
[2017-11-23T01:37:45,455][INFO ][o.e.c.m.MetaDataCreateIndexService] [prod_es_node2] [rwer] creating index, cause [auto(bulk api)], templates [], shards [5]/[1], mappings []
[2017-11-23T01:37:45,547][INFO ][o.e.c.m.MetaDataMappingService] [prod_es_node2] [rwer/1_IE3HLDSjOJANhwhEWwqA] create_mapping [2e2]
[2017-11-23T01:37:45,624][WARN ][o.e.d.r.RestController   ] Content type detection for rest requests is deprecated. Specify the content
type using the [Content-Type] header.
[2017-11-23T01:37:45,629][INFO ][o.e.c.m.MetaDataCreateIndexService] [prod_es_node2] [2t44] creating index, cause [auto(bulk api)], templates [], shards [5]/[1], mappings []
[2017-11-23T01:37:45,707][INFO ][o.e.c.m.MetaDataMappingService] [prod_es_node2] [2t44/60i2Sf_qQg-oEDcgyZ5v4A] create_mapping [55d]
[2017-11-23T01:37:46,899][WARN ][o.e.d.r.RestController   ] Content type detection for rest requests is deprecated. Specify the content
type using the [Content-Type] header.
[2017-11-23T01:37:46,906][INFO ][o.e.c.m.MetaDataCreateIndexService] [prod_es_node2] [2ea4] creating index, cause [auto(bulk api)], templates [], shards [5]/[1], mappings []
[2017-11-23T01:37:46,996][INFO ][o.e.c.m.MetaDataMappingService] [prod_es_node2] [2ea4/Z8Tt63PVSBGaxCwJfZzw2Q] create_mapping [5df]

and here it got stuck for a while.

About 30 minutes later, I got more logs:

Node1

[2017-11-23T02:01:23,383][INFO ][o.e.d.z.ZenDiscovery     ] [prod_es_node1] master_left [{prod_es_node2}{RGV3OxddRWSO93QOo9pK9w}{rvLO8MoQR_6XK65hK_mI6Q}{192.168.2.12}{192.168.2.12:9300}], reason [failed to ping, tried [3] times, each with  maximum [30s] timeout]
[2017-11-23T02:01:23,385][WARN ][o.e.d.z.ZenDiscovery     ] [prod_es_node1] master left (reason = failed to ping, tried [3] times, each
with  maximum [30s] timeout), current nodes: nodes:
   {prod_es_node1}{omRpm_KvSfaqhzNDM_ogwA}{_pBeDeVWR2iorTsUBZyB-w}{192.168.2.11}{192.168.2.11:9300}, local
   {prod_es_node2}{RGV3OxddRWSO93QOo9pK9w}{rvLO8MoQR_6XK65hK_mI6Q}{192.168.2.12}{192.168.2.12:9300}, master

[2017-11-23T02:02:53,390][INFO ][o.e.d.z.ZenDiscovery     ] [prod_es_node1] failed to send join request to master [{prod_es_node2}{RGV3OxddRWSO93QOo9pK9w}{rvLO8MoQR_6XK65hK_mI6Q}{192.168.2.12}{192.168.2.12:9300}], reason [ElasticsearchTimeoutException[java.util.concurrent.TimeoutException: Timeout waiting for task.]; nested: TimeoutException[Timeout waiting for task.]; ]
[2017-11-23T02:04:23,395][INFO ][o.e.d.z.ZenDiscovery     ] [prod_es_node1] failed to send join request to master [{prod_es_node2}{RGV3OxddRWSO93QOo9pK9w}{rvLO8MoQR_6XK65hK_mI6Q}{192.168.2.12}{192.168.2.12:9300}], reason [ElasticsearchTimeoutException[java.util.concurrent.TimeoutException: Timeout waiting for task.]; nested: TimeoutException[Timeout waiting for task.]; ]
[2017-11-23T02:05:00,898][WARN ][o.e.d.z.UnicastZenPing   ] [prod_es_node1] failed to send ping to [{prod_es_node2}{RGV3OxddRWSO93QOo9pK9w}{rvLO8MoQR_6XK65hK_mI6Q}{192.168.2.12}{192.168.2.12:9300}]
org.elasticsearch.transport.ReceiveTimeoutTransportException: [prod_es_node2][192.168.2.12:9300][internal:discovery/zen/unicast] request_id [1584] timed out after [37500ms]
        at org.elasticsearch.transport.TransportService$TimeoutHandler.run(TransportService.java:961) [elasticsearch-5.6.4.jar:5.6.4]
        at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:569) [elasticsearch-5.6.4.jar:5.6.4]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_151]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_151]
        at java.lang.Thread.run(Thread.java:748) [?:1.8.0_151]
[2017-11-23T02:05:10,898][WARN ][o.e.d.z.UnicastZenPing   ] [prod_es_node1] failed to send ping to [{prod_es_node2}{RGV3OxddRWSO93QOo9pK9w}{rvLO8MoQR_6XK65hK_mI6Q}{192.168.2.12}{192.168.2.12:9300}]
org.elasticsearch.transport.ReceiveTimeoutTransportException: [prod_es_node2][192.168.2.12:9300][internal:discovery/zen/unicast] request_id [1587] timed out after [37501ms]
        at org.elasticsearch.transport.TransportService$TimeoutHandler.run(TransportService.java:961) [elasticsearch-5.6.4.jar:5.6.4]
        at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:569) [elasticsearch-5.6.4.jar:5.6.4]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_151]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_151]
        at java.lang.Thread.run(Thread.java:748) [?:1.8.0_151]
[2017-11-23T02:05:53,398][INFO ][o.e.d.z.ZenDiscovery     ] [prod_es_node1] failed to send join request to master [{prod_es_node2}{RGV3OxddRWSO93QOo9pK9w}{rvLO8MoQR_6XK65hK_mI6Q}{192.168.2.12}{192.168.2.12:9300}], reason [ElasticsearchTimeoutException[java.util.concurrent.TimeoutException: Timeout waiting for task.]; nested: TimeoutException[Timeout waiting for task.]; ]

Node2

[2017-11-23T02:01:22,723][INFO ][o.e.c.r.a.AllocationService] [prod_es_node2] Cluster health status changed from [YELLOW] to [RED] (reason: [{prod_es_node1}{omRpm_KvSfaqhzNDM_ogwA}{_pBeDeVWR2iorTsUBZyB-w}{192.168.2.11}{192.168.2.11:9300} transport disconnected]).
[2017-11-23T02:01:22,723][INFO ][o.e.c.s.ClusterService   ] [prod_es_node2] removed {{prod_es_node1}{omRpm_KvSfaqhzNDM_ogwA}{_pBeDeVWR2iorTsUBZyB-w}{192.168.2.11}{192.168.2.11:9300},}, reason: zen-disco-node-failed({prod_es_node1}{omRpm_KvSfaqhzNDM_ogwA}{_pBeDeVWR2iorTsUBZyB-w}{192.168.2.11}{192.168.2.11:9300}), reason(transport disconnected)[{prod_es_node1}{omRpm_KvSfaqhzNDM_ogwA}{_pBeDeVWR2iorTsUBZyB-w}{192.168.2.11}{192.168.2.11:9300} transport disconnected]
[2017-11-23T02:01:22,734][INFO ][o.e.c.r.DelayedAllocationService] [prod_es_node2] scheduling reroute for delayed shards in [59.9s] (12 delayed shards)
[2017-11-23T02:02:53,403][WARN ][o.e.d.z.ZenDiscovery     ] [prod_es_node2] failed to validate incoming join request from node [{prod_es_node1}{omRpm_KvSfaqhzNDM_ogwA}{_pBeDeVWR2iorTsUBZyB-w}{192.168.2.11}{192.168.2.11:9300}]
org.elasticsearch.ElasticsearchTimeoutException: java.util.concurrent.TimeoutException: Timeout waiting for task.
        at org.elasticsearch.transport.PlainTransportFuture.txGet(PlainTransportFuture.java:63) ~[elasticsearch-5.6.4.jar:5.6.4]
        at org.elasticsearch.transport.PlainTransportFuture.txGet(PlainTransportFuture.java:33) ~[elasticsearch-5.6.4.jar:5.6.4]
        at org.elasticsearch.discovery.zen.MembershipAction.sendValidateJoinRequestBlocking(MembershipAction.java:104) ~[elasticsearch-5.6.4.jar:5.6.4]
        at org.elasticsearch.discovery.zen.ZenDiscovery.handleJoinRequest(ZenDiscovery.java:857) [elasticsearch-5.6.4.jar:5.6.4]
        at org.elasticsearch.discovery.zen.ZenDiscovery$MembershipListener.onJoin(ZenDiscovery.java:1038) [elasticsearch-5.6.4.jar:5.6.4]
        at org.elasticsearch.discovery.zen.MembershipAction$JoinRequestRequestHandler.messageReceived(MembershipAction.java:136) [elasticsearch-5.6.4.jar:5.6.4]
        at org.elasticsearch.discovery.zen.MembershipAction$JoinRequestRequestHandler.messageReceived(MembershipAction.java:132) [elasticsearch-5.6.4.jar:5.6.4]
        at org.elasticsearch.transport.TransportRequestHandler.messageReceived(TransportRequestHandler.java:33) [elasticsearch-5.6.4.jar:5.6.4]
        at org.elasticsearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:69) [elasticsearch-5.6.4.jar:5.6.4]
        at org.elasticsearch.transport.TcpTransport$RequestHandler.doRun(TcpTransport.java:1553) [elasticsearch-5.6.4.jar:5.6.4]
        at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:638) [elasticsearch-5.6.4.jar:5.6.4]
        at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) [elasticsearch-5.6.4.jar:5.6.4]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_151]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_151]
        at java.lang.Thread.run(Thread.java:748) [?:1.8.0_151]
Caused by: java.util.concurrent.TimeoutException: Timeout waiting for task.
        at org.elasticsearch.common.util.concurrent.BaseFuture$Sync.get(BaseFuture.java:232) ~[elasticsearch-5.6.4.jar:5.6.4]
        at org.elasticsearch.common.util.concurrent.BaseFuture.get(BaseFuture.java:67) ~[elasticsearch-5.6.4.jar:5.6.4]
        at org.elasticsearch.transport.PlainTransportFuture.txGet(PlainTransportFuture.java:61) ~[elasticsearch-5.6.4.jar:5.6.4]
        ... 14 more
[2017-11-23T02:05:53,400][WARN ][o.e.d.z.ZenDiscovery     ] [prod_es_node2] failed to validate incoming join request from node [{prod_es_node1}{omRpm_KvSfaqhzNDM_ogwA}{_pBeDeVWR2iorTsUBZyB-w}{192.168.2.11}{192.168.2.11:9300}]
org.elasticsearch.ElasticsearchTimeoutException: java.util.concurrent.TimeoutException: Timeout waiting for task.
        at org.elasticsearch.transport.PlainTransportFuture.txGet(PlainTransportFuture.java:63) ~[elasticsearch-5.6.4.jar:5.6.4]
        at org.elasticsearch.transport.PlainTransportFuture.txGet(PlainTransportFuture.java:33) ~[elasticsearch-5.6.4.jar:5.6.4]
        at org.elasticsearch.discovery.zen.MembershipAction.sendValidateJoinRequestBlocking(MembershipAction.java:104) ~[elasticsearch-5.6.4.jar:5.6.4]
        at org.elasticsearch.discovery.zen.ZenDiscovery.handleJoinRequest(ZenDiscovery.java:857) [elasticsearch-5.6.4.jar:5.6.4]
        at org.elasticsearch.discovery.zen.ZenDiscovery$MembershipListener.onJoin(ZenDiscovery.java:1038) [elasticsearch-5.6.4.jar:5.6.4]
        at org.elasticsearch.discovery.zen.MembershipAction$JoinRequestRequestHandler.messageReceived(MembershipAction.java:136) [elasticsearch-5.6.4.jar:5.6.4]
        at org.elasticsearch.discovery.zen.MembershipAction$JoinRequestRequestHandler.messageReceived(MembershipAction.java:132) [elasticsearch-5.6.4.jar:5.6.4]
        at org.elasticsearch.transport.TransportRequestHandler.messageReceived(TransportRequestHandler.java:33) [elasticsearch-5.6.4.jar:5.6.4]
        at org.elasticsearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:69) [elasticsearch-5.6.4.jar:5.6.4]
        at org.elasticsearch.transport.TcpTransport$RequestHandler.doRun(TcpTransport.java:1553) [elasticsearch-5.6.4.jar:5.6.4]
        at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:638) [elasticsearch-5.6.4.jar:5.6.4]
        at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) [elasticsearch-5.6.4.jar:5.6.4]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_151]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_151]
        at java.lang.Thread.run(Thread.java:748) [?:1.8.0_151]
Caused by: java.util.concurrent.TimeoutException: Timeout waiting for task.
        at org.elasticsearch.common.util.concurrent.BaseFuture$Sync.get(BaseFuture.java:232) ~[elasticsearch-5.6.4.jar:5.6.4]
        at org.elasticsearch.common.util.concurrent.BaseFuture.get(BaseFuture.java:67) ~[elasticsearch-5.6.4.jar:5.6.4]
        at org.elasticsearch.transport.PlainTransportFuture.txGet(PlainTransportFuture.java:61) ~[elasticsearch-5.6.4.jar:5.6.4]
        ... 14 more

I have tried building the two Docker containers on the same physical server, using the same config, and the Elasticsearch cluster works fine.
No timeouts and no disconnections.

OK. So you have something in the middle which creates some kind of disconnection?
Are you on the same VLAN/LAN?

About this:

[2017-11-23T01:35:52,556][WARN ][o.e.d.z.ElectMasterService] [prod_es_node2] value for setting "discovery.zen.minimum_master_nodes" is too low. This can result in data loss! Please set it to at least a quorum of master-eligible nodes (current value: [1], total number of master-eligible nodes used for publishing in this round: [2])

Once everything is working, I'd encourage running 3 master-eligible nodes and setting discovery.zen.minimum_master_nodes to 2.
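For reference, a minimal elasticsearch.yml sketch of that three-master setup. This is an illustration, not your actual config: the third host (192.168.2.13) is hypothetical, and the other values are taken from the logs above.

```yaml
# elasticsearch.yml -- same settings on all three master-eligible nodes
cluster.name: prod_es_cluster
node.master: true

# List every master-eligible node so discovery can find them
discovery.zen.ping.unicast.hosts: ["192.168.2.11", "192.168.2.12", "192.168.2.13"]

# Quorum = (master_eligible_nodes / 2) + 1 = (3 / 2) + 1 = 2.
# With 2 a single node can never elect itself master, which prevents split brain.
discovery.zen.minimum_master_nodes: 2
```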

About

[2017-11-23T01:37:44,840][WARN ][o.e.d.r.RestController   ] Content type detection for rest requests is deprecated. Specify the content type using the [Content-Type] header.

Just change:

curl -XPUT web:9200/23w1/332/33 -d '{"as1213ewq5df":"q3wqqweweeq"}'

with

curl -XPUT web:9200/23w1/332/33 -H "Content-Type: application/json"  -d '{"as1213ewq5df":"q3wqqweweeq"}' 

Yes, the servers are on the same LAN.
The VLAN is created with pipework and the ovs-vsctl command.
There are no firewall rules between the two servers.
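Given the transport disconnections, one quick way to rule the network path in or out is to probe port 9300 from each host toward the other node's container address. A hedged sketch; the IPs are the pipework addresses from the logs above, so adjust them to your setup:

```shell
# Probe a TCP port using bash's /dev/tcp pseudo-device (no nc required).
# Prints "reachable" or "NOT reachable" instead of failing, so it can be
# run in a loop to catch intermittent drops.
check_port() {
  if timeout 3 bash -c "cat < /dev/null > /dev/tcp/$1/$2" 2>/dev/null; then
    echo "$1:$2 reachable"
  else
    echo "$1:$2 NOT reachable"
  fi
}

# Run this on node1's host (and the mirror image on node2's host):
check_port 192.168.2.12 9300   # Elasticsearch transport (node-to-node)
check_port 192.168.2.12 9200   # Elasticsearch HTTP API
```

If the transport port flaps between reachable and not reachable while the containers stay up, the problem is in the OVS/pipework path rather than in Elasticsearch itself.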

Thanks for the information

Does your container still live when you see disconnections? Any memory errors or long GCs happening?

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.