ES upgrade 1.6.2 to 2.1.0 doesn't bind to desired address

geosword · January 22, 2016, 2:27pm

Hi everyone,
We're currently running Elasticsearch 1.7.3 and want to upgrade to 2.1.1.
The problem I'm having is that 2.x seems to have changed the default bind behaviour. As a result, Elasticsearch binds to ONLY one address, not to all of the addresses on a host.

This is our current config (or lack thereof):
# grep "network." /etc/elasticsearch/elasticsearch.yml
#network.bind_host: 192.168.0.1
#network.publish_host: 192.168.0.1
#network.host: 192.168.0.1

In 1.7.3 this gives us the default and desired behaviour:

# netstat -tulpn | grep java
tcp6       0      0 :::9300                 :::*                    LISTEN      13275/java
tcp6       0      0 :::9200                 :::*                    LISTEN      13275/java

We have this network setup on each node, with the nodes using 192.168.0.61, .62, and .63 respectively:

 ~ # ip -4 a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default
    inet 127.0.0.1/8 brd 127.255.255.255 scope host lo
       valid_lft forever preferred_lft forever
    inet 192.168.0.60/32 scope global lo
       valid_lft forever preferred_lft forever
6: bond0: <BROADCAST,MULTICAST,MASTER,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default
    inet 192.168.0.61/24 brd 192.168.0.255 scope global bond0
       valid_lft forever preferred_lft forever

.60 is the service IP (we use keepalived to make sure requests from Kibana go to whichever node is still up, assuming a still-functioning cluster). Keepalived runs on a completely separate host.

Once we upgrade to 2.1.1 using this guide: https://www.elastic.co/guide/en/elasticsearch/reference/current/restart-upgrade.html
it ONLY binds to 192.168.0.60.

As a result, the nodes don't communicate with each other, and we get this:

curl -s http://192.168.0.60:9200/_cluster/health?pretty
{
  "error" : {
    "root_cause" : [ {
      "type" : "master_not_discovered_exception",
      "reason" : "waited for [30s]"
    } ],
    "type" : "master_not_discovered_exception",
    "reason" : "waited for [30s]"
  },
  "status" : 503
}

on all nodes.

What I have tried:

network.bind_host = [ "192.168.0.6x",  "192.168.0.60", "127.0.0.1" ]
network.publish_host = "192.168.0.6x"

Where x is the correct last octet for the node in question.

I also tried combinations of the above; however, we always get the same result (master_not_discovered).

All of the nodes are on the same subnet, and there are no firewalls in the way. I even verified that the nodes can connect to the relevant ports by telnetting to them.

The logs show this:

[2016-01-22 11:20:46,003][DEBUG][discovery.zen            ] [node3] filtered ping responses: (filter_client[true], filter_data[false]) {none}
[2016-01-22 11:20:53,603][DEBUG][discovery.zen            ] [node3] filtered ping responses: (filter_client[true], filter_data[false]) {none}
[2016-01-22 11:21:01,123][DEBUG][discovery.zen            ] [node3] filtered ping responses: (filter_client[true], filter_data[false]) {none}
[2016-01-22 11:21:08,448][WARN ][discovery                ] [node3] waited for 30s and no initial state was set by the discovery
[2016-01-22 11:21:08,453][DEBUG][cluster.service          ] [node3] processing [gateway_initial_state_recovery]: execute
[2016-01-22 11:21:08,453][DEBUG][cluster.service          ] [node3] processing [gateway_initial_state_recovery]: took 0s no change in cluster_state
[2016-01-22 11:21:08,470][DEBUG][http.netty               ] [node3] Bound http to address {192.168.0.60:9200}
[2016-01-22 11:21:08,472][DEBUG][http.netty               ] [node3] Bound http to address {127.0.0.1:9200}
.....
[2016-01-22 11:22:56,566][WARN ][discovery.zen.ping.unicast] [node3] failed to send ping to [{node3}{FhHZuCAkQEOikQO48eIWhg}{192.168.156.63}{192.168.0.63:9300}{master=true}]

Which seems to imply it can't talk to itself...

Ideally, I'd like to recreate the current 1.7.3 behaviour, though listening on 127.0.0.1 is not required. Listening on the node's own IP (on bond0) AND the service IP (on loopback) is a must, however.

Any pointers greatly appreciated.
Thanks in advance.

Dylan

magnusbaeck · January 23, 2016, 5:19pm

Setting network.bind_host to 0.0.0.0 should make it listen on all interfaces.
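
A minimal sketch of that in elasticsearch.yml, assuming the setting is spelled network.bind_host as in the commented-out config in the original post, and assuming each node should advertise its own bond0 address to the rest of the cluster (192.168.0.63 is used here for node3, taken from the log; adjust per node):

```yaml
# /etc/elasticsearch/elasticsearch.yml on node3 (sketch, untested against this cluster)

# Listen on every local address, which includes the service IP on lo.
network.bind_host: 0.0.0.0

# Assumption: other nodes should reach this node on its bond0 address,
# not on the shared service IP 192.168.0.60.
network.publish_host: 192.168.0.63
```

After a restart, the netstat check from the original post should show whether ports 9200 and 9300 are listening on all addresses again.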
