ES 5.5.0 will not start up with gce discovery


(Steph Van Schalkwyk) #1

My elasticsearch.yml looks like this:

cluster.name: gce-cluster
discovery.zen.ping.unicast.hosts:
- 104.198.44.16
- 35.184.239.42
http.port: 9200
network.host: _gce_
node.data: false
node.master: true
transport.tcp.port: 9300
node.name: 104.198.44.16-node1
cloud.gce.project_id: gc001-001
cloud.gce.zone: us-central1-a
discovery.type: gce
#################################### Paths ####################################
# Path to directory containing configuration (this file and logging.yml):
path.conf: /etc/elasticsearch/node1
path.data: /var/lib/elasticsearch/104.198.44.16-node1
path.logs: /var/log/elasticsearch/104.198.44.16-node1

(Mark Walkom) #2

What's in the logs?


(David Pilato) #3

And BTW this

discovery.zen.ping.unicast.hosts

is useless in this context, as the GCE plugin should set it instead.


(Steph Van Schalkwyk) #4

Hi. Doesn't get to the log part. I think I may have the issue, will post later tonight.


(Steph Van Schalkwyk) #5

Thanks, will try that now.


(Steph Van Schalkwyk) #6

Cannot start. My elasticsearch.yml looks like this:

bootstrap.memory_lock: false
cluster.name: gce-cluster
http.port: 9200
network.host: _gce_
node.data: false
node.master: true
transport.tcp.port: 9300
node.name: 104.198.35.192-node1
cloud.gce.project_id: gc001-001
cloud.gce.zone: us-central1-a
discovery.type: gce
path.conf: /etc/elasticsearch/node1
path.data: /var/lib/elasticsearch/104.198.35.192-node1
path.logs: /var/log/elasticsearch/104.198.35.192-node1

I have 1.5g as the es heap size.


(Mark Walkom) #7

What's in the logs?


(Steph Van Schalkwyk) #8

Update: this was without `Compute RW`

Which account is gce-discovery running under?

[2017-07-26T03:52:42,888][INFO ][o.e.c.g.GceInstancesServiceImpl] [104.198.35.192-node1] starting GCE discovery service
[2017-07-26T03:52:43,409][WARN ][o.e.c.g.GceInstancesServiceImpl] [104.198.35.192-node1] Problem fetching instance list for zone us-central1-a
java.security.PrivilegedActionException: null
        at java.security.AccessController.doPrivileged(Native Method) ~[?:1.8.0_131]
        at org.elasticsearch.cloud.gce.GceInstancesServiceImpl.lambda$instances$1(GceInstancesServiceImpl.java:75) ~[?:?]
        at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193) [?:1.8.0_131]
        at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1374) [?:1.8.0_131]
        at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481) [?:1.8.0_131]
        at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471) [?:1.8.0_131]
        at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708) [?:1.8.0_131]
        at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234) [?:1.8.0_131]
        at java.util.stream.ReferencePipeline.reduce(ReferencePipeline.java:474) [?:1.8.0_131]
        at org.elasticsearch.cloud.gce.GceInstancesServiceImpl.instances(GceInstancesServiceImpl.java:91) [discovery-gce-5.5.0.jar:5.5.0]
        at org.elasticsearch.discovery.gce.GceUnicastHostsProvider.buildDynamicNodes(GceUnicastHostsProvider.java:134) [discovery-gce-5.5.0.jar:5.5.0]
        at org.elasticsearch.discovery.zen.UnicastZenPing.ping(UnicastZenPing.java:309) [elasticsearch-5.5.0.jar:5.5.0]
        at org.elasticsearch.discovery.zen.UnicastZenPing.ping(UnicastZenPing.java:286) [elasticsearch-5.5.0.jar:5.5.0]
        at org.elasticsearch.discovery.zen.ZenDiscovery.pingAndWait(ZenDiscovery.java:1010) [elasticsearch-5.5.0.jar:5.5.0]
        at org.elasticsearch.discovery.zen.ZenDiscovery.findMaster(ZenDiscovery.java:869) [elasticsearch-5.5.0.jar:5.5.0]
        at org.elasticsearch.discovery.zen.ZenDiscovery.innerJoinCluster(ZenDiscovery.java:378) [elasticsearch-5.5.0.jar:5.5.0]
        at org.elasticsearch.discovery.zen.ZenDiscovery.access$4100(ZenDiscovery.java:83) [elasticsearch-5.5.0.jar:5.5.0]
        at org.elasticsearch.discovery.zen.ZenDiscovery$JoinThreadControl$1.run(ZenDiscovery.java:1188) [elasticsearch-5.5.0.jar:5.5.0]
        at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:569) [elasticsearch-5.5.0.jar:5.5.0]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_131]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_131]
        at java.lang.Thread.run(Thread.java:748) [?:1.8.0_131]
Caused by: com.google.api.client.googleapis.json.GoogleJsonResponseException: 403 Forbidden
{
  "code" : 403,
  "errors" : [ {
    "domain" : "global",
    "message" : "Insufficient Permission",
    "reason" : "insufficientPermissions"
  } ],
  "message" : "Insufficient Permission"
}
        at com.google.api.client.googleapis.json.GoogleJsonResponseException.from(GoogleJsonResponseException.java:145) ~[?:?]
        at com.google.api.client.googleapis.services.json.AbstractGoogleJsonClientRequest.newExceptionOnError(AbstractGoogleJsonClientRequest.java:113) ~[?:?]
        at com.google.api.client.googleapis.services.json.AbstractGoogleJsonClientRequest.newExceptionOnError(AbstractGoogleJsonClientRequest.java:40) ~[?:?]
        at com.google.api.client.googleapis.services.AbstractGoogleClientRequest$1.interceptResponse(AbstractGoogleClientRequest.java:321) ~[?:?]
        at com.google.api.client.http.HttpRequest.execute(HttpRequest.java:1056) ~[?:?]
        at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:419) ~[?:?]
        at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:352) ~[?:?]
        at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.execute(AbstractGoogleClientRequest.java:469) ~[?:?]
        at org.elasticsearch.cloud.gce.GceInstancesServiceImpl$1.run(GceInstancesServiceImpl.java:79) ~[?:?]
        at org.elasticsearch.cloud.gce.GceInstancesServiceImpl$1.run(GceInstancesServiceImpl.java:75) ~[?:?]
        ... 22 more
[2017-07-26T03:52:43,425][WARN ][o.e.c.g.GceInstancesServiceImpl] [104.198.35.192-node1] disabling GCE discovery. Can not get list of nodes
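
The 403 `insufficientPermissions` response in the trace above is returned by the Compute Engine API call that lists instances, which suggests that the credentials in use lack a Compute scope. One way to check this from the instance itself is to read the access scopes of its default service account from the metadata server. The following is a minimal Python sketch under assumptions: it assumes the plugin authenticates as the instance's default service account, and the function names `fetch_scopes` and `can_list_instances` are hypothetical. The metadata URL and the scope URLs are the standard Google ones.

```python
import urllib.request

# Standard GCE metadata endpoint listing the OAuth scopes of the
# instance's default service account (only reachable from inside the VM).
SCOPES_URL = (
    "http://metadata.google.internal/computeMetadata/v1/"
    "instance/service-accounts/default/scopes"
)

# Scopes that allow listing instances through the Compute Engine API.
COMPUTE_SCOPES = {
    "https://www.googleapis.com/auth/compute",
    "https://www.googleapis.com/auth/compute.readonly",
    "https://www.googleapis.com/auth/cloud-platform",
}


def fetch_scopes(url=SCOPES_URL, timeout=5.0):
    """Return the scopes granted to the default service account."""
    # The metadata server rejects requests without this header.
    request = urllib.request.Request(url, headers={"Metadata-Flavor": "Google"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8").split()


def can_list_instances(scopes):
    """True if at least one scope grants read access to Compute Engine."""
    return any(scope in COMPUTE_SCOPES for scope in scopes)
```

On the instance, `can_list_instances(fetch_scopes())` returning `False` would be consistent with the 403 above. Access scopes can only be changed while the instance is stopped, so an instance created without Compute access has to be stopped and updated, or recreated.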

(Mark Walkom) #9

Did you set it all up as per the chapters of the plugin documentation? https://www.elastic.co/guide/en/elasticsearch/plugins/5.5/discovery-gce.html


(Steph Van Schalkwyk) #10

Yes. Compute Engine Read Write


bootstrap.memory_lock: false
cluster.name: gce-cluster
http.port: 9200
network.host: _gce_
node.data: false
node.master: true
transport.tcp.port: 9300
node.name: 35.184.239.42-node1
cloud.gce.project_id: gc001-001
cloud.gce.zone: us-central1-a
discovery.type: gce
# Path to directory containing configuration (this file and logging.yml):
path.conf: /etc/elasticsearch/node1
path.data: /var/lib/elasticsearch/35.184.239.42-node1
path.logs: /var/log/elasticsearch/35.184.239.42-node1

plugin is installed:

ls /usr/share/elasticsearch/plugins/
discovery-gce

Times out waiting for startup. NO LOGS.


(Mark Walkom) #11

What about https://www.elastic.co/guide/en/elasticsearch/plugins/5.5/discovery-gce-usage-long.html, because it links to https://www.elastic.co/guide/en/elasticsearch/plugins/5.5/discovery-gce-usage-tips.html#discovery-gce-usage-tips-permissions which seems to be what you are experiencing (from what I can see).


(S) #12

Thanks. I've done all of that. ES runs fine without the gce parts. The problem is that no logs are being produced, which is rather peculiar.


(David Pilato) #13

But can you share your logs even if you don't see anything in them?


(S) #14

There are no log files


(David Pilato) #15

That would mean that your elasticsearch.yml file is totally broken.

What are the logs without gce plugin? Can you share them?


(Steph Van Schalkwyk) #16

Seems to be the "network.host" setting. ES cannot bind to any of the _site_, network interface, _gce_, etc. values, so startup fails. network.host: _local_ (the default) works and ES starts up.
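
For context on the special values mentioned above: `_local_` resolves to loopback addresses, `_site_` to site-local (private) addresses, and `_gce_` is provided by the discovery-gce plugin and resolves to the instance's private IP. On GCE the external IPs used in the node names in this thread are typically not assigned to the VM's network interface, so only the internal address can be bound. The Python sketch below is an illustration only, not how Elasticsearch resolves these values; the function name `classify` is hypothetical and `10.128.0.3` is just an example internal address.

```python
import ipaddress

# RFC 1918 ranges; these are what "site-local" means for IPv4.
SITE_LOCAL_NETWORKS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
]


def classify(address):
    """Roughly classify an IP address by scope."""
    ip = ipaddress.ip_address(address)
    if ip.is_loopback:
        return "loopback"      # the kind of address _local_ selects
    if ip.is_link_local:
        return "link-local"
    if any(ip in network for network in SITE_LOCAL_NETWORKS):
        return "site-local"    # the kind of address _site_ selects
    return "global"


for candidate in ("127.0.0.1", "10.128.0.3", "104.198.35.192"):
    print(candidate, "->", classify(candidate))
```

An address classified as `global` here, such as the external IP, will usually not be present on the instance's interface, so it is the site-local address that a working `_gce_` or `_site_` setting should end up binding to.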


(Steph Van Schalkwyk) #17

This is my environment:
Using GCE.
Using Ansible 2.4.0 to provision instances in GCE.

OpenJDK Runtime Environment (build 1.8.0_131-8u131-b11-2ubuntu1.17.04.2-b11)
es_version: "5.5.0"
es_heap_size: "2g"
Plugins: discovery-gce

elasticsearch.yml :

cluster.name: gce-cluster
http.port: 9200
network.host: _gce_
node.data: true
node.master: true
transport.tcp.port: 9300
node.name: 35.188.41.199-node1
cloud.gce.project_id: gc001-001
cloud.gce.zone: us-central1-a
discovery.type: gce
path.conf: /etc/elasticsearch/node1
path.data: /var/lib/elasticsearch/35.188.41.199-node1
path.logs: /var/log/elasticsearch/35.188.41.199-node1

Observed behaviour:
ES will not restart after installation of discovery-gce.

Expected behaviour:
ES starts up.

(David Pilato) #18

Please format your code using the </> icon as explained in this guide. It will make your post more readable.

Or use markdown style like:

```
CODE
```

I'm pretty sure that if it fails when trying to define the network card to listen on, then you have logs.
Please check again your elasticsearch logs and share them.


(Steph Van Schalkwyk) #19

I went back to ES 5.2.2, and ES now starts up with the discovery-gce plugin.
You are correct that it fails to bind to a transport.
I am looking at my FW rules now.
This is from /var/log/elasticsearch/104.155.150.140-node1/gce-cluster.log :
[2017-07-26T17:21:15,397][INFO ][o.e.c.g.GceInstancesServiceImpl] [node1] starting GCE discovery service
[2017-07-26T17:21:19,050][INFO ][o.e.c.g.GceInstancesServiceImpl] [node1] starting GCE discovery service
[2017-07-26T17:21:45,375][WARN ][o.e.n.Node ] [node1] timed out while waiting for initial discovery state - timeout: 30s
[2017-07-26T17:21:45,385][INFO ][o.e.h.HttpServer ] [node1] publish_address {127.0.0.1:9200}, bound_addresses {[::1]:9200}, {127.0.0.1:9200}
[2017-07-26T17:21:45,385][INFO ][o.e.n.Node ] [node1] started
[2017-07-26T17:27:28,976][INFO ][o.e.n.Node ] [node1] stopping ...
[2017-07-26T17:27:29,059][INFO ][o.e.n.Node ] [node1] stopped
[2017-07-26T17:27:29,059][INFO ][o.e.n.Node ] [node1] closing ...
[2017-07-26T17:27:29,117][INFO ][o.e.n.Node ] [node1] closed
[2017-07-26T17:28:08,874][INFO ][o.e.n.Node ] [node1] initializing ...
[2017-07-26T17:28:09,053][INFO ][o.e.e.NodeEnvironment ] [node1] using [1] data paths, mounts [[/ (/dev/sda1)]], net usable_space [8gb], net total_space [9.5gb], spins? [possibly], types [ext4]
[2017-07-26T17:28:09,053][INFO ][o.e.e.NodeEnvironment ] [node1] heap size [1007.3mb], compressed ordinary object pointers [true]
[2017-07-26T17:28:09,055][INFO ][o.e.n.Node ] [node1] node name [node1], node ID [LhgPfp8-QXq2JCPgJBmLtQ]
[2017-07-26T17:28:09,060][INFO ][o.e.n.Node ] [node1] version[5.2.2], pid[1631], build[f9d9b74/2017-02-24T17:26:45.835Z], OS[Linux/4.10.0-28-generic/amd64], JVM[Oracle Corporation/OpenJDK 64-Bit Server VM/1.8.0_131/25.131-b11]
[2017-07-26T17:28:11,458][INFO ][o.e.p.PluginsService ] [node1] loaded module [aggs-matrix-stats]
[2017-07-26T17:28:11,459][INFO ][o.e.p.PluginsService ] [node1] loaded module [ingest-common]
[2017-07-26T17:28:11,459][INFO ][o.e.p.PluginsService ] [node1] loaded module [lang-expression]
[2017-07-26T17:28:11,460][INFO ][o.e.p.PluginsService ] [node1] loaded module [lang-groovy]
[2017-07-26T17:28:11,460][INFO ][o.e.p.PluginsService ] [node1] loaded module [lang-mustache]
[2017-07-26T17:28:11,460][INFO ][o.e.p.PluginsService ] [node1] loaded module [lang-painless]
[2017-07-26T17:28:11,460][INFO ][o.e.p.PluginsService ] [node1] loaded module [percolator]
[2017-07-26T17:28:11,460][INFO ][o.e.p.PluginsService ] [node1] loaded module [reindex]
[2017-07-26T17:28:11,460][INFO ][o.e.p.PluginsService ] [node1] loaded module [transport-netty3]
[2017-07-26T17:28:11,460][INFO ][o.e.p.PluginsService ] [node1] loaded module [transport-netty4]
[2017-07-26T17:28:11,461][INFO ][o.e.p.PluginsService ] [node1] loaded plugin [discovery-gce]
[2017-07-26T17:28:15,556][INFO ][o.e.n.Node ] [node1] initialized
[2017-07-26T17:28:15,556][INFO ][o.e.n.Node ] [node1] starting ...
[2017-07-26T17:28:16,226][WARN ][i.n.u.i.MacAddressUtil ] Failed to find a usable hardware address from the network interfaces; using random bytes: 4e:d4:eb:42:cd:ff:32:72
[2017-07-26T17:28:16,345][INFO ][o.e.t.TransportService ] [node1] publish_address {10.128.0.3:9300}, bound_addresses {10.128.0.3:9300}
[2017-07-26T17:28:16,351][INFO ][o.e.b.BootstrapChecks ] [node1] bound or publishing to a non-loopback or non-link-local address, enforcing bootstrap checks
[2017-07-26T17:28:16,453][INFO ][o.e.c.g.GceInstancesServiceImpl] [node1] starting GCE discovery service
[2017-07-26T17:28:20,330][INFO ][o.e.c.g.GceInstancesServiceImpl] [node1] starting GCE discovery service
[2017-07-26T17:28:46,394][WARN ][o.e.n.Node ] [node1] timed out while waiting for initial discovery state - timeout: 30s
[2017-07-26T17:28:46,453][INFO ][o.e.h.HttpServer ] [node1] publish_address {10.128.0.3:9200}, bound_addresses {10.128.0.3:9200}
[2017-07-26T17:28:46,453][INFO ][o.e.n.Node ] [node1] started


(Steph Van Schalkwyk) #20

I have these ports open in the FW.
NAME                    NETWORK  SRC_RANGES    RULES                         SRC_TAGS  TARGET_TAGS
default-allow-icmp      default  0.0.0.0/0     icmp
default-allow-internal  default  10.128.0.0/9  tcp:0-65535,udp:0-65535,icmp
default-allow-rdp       default  0.0.0.0/0     tcp:3389
default-allow-ssh       default  0.0.0.0/0     tcp:22
rule-9200               default  10.128.0.0/9  tcp:9200
rule-9300               default  10.128.0.0/9  tcp:9300
rule-9300-udp           default  10.128.0.0/9  udp:9300
And I still get this from /var/log/elasticsearch/104.155.150.140-node1/gce-cluster.log :

[2017-07-26T17:49:33,240][WARN ][i.n.u.i.MacAddressUtil ] Failed to find a usable hardware address from the network interfaces; using random bytes: ec:51:31:d0:5a:f9:22:ad
[2017-07-26T17:49:33,346][INFO ][o.e.t.TransportService ] [node1] publish_address {10.128.0.3:9300}, bound_addresses {10.128.0.3:9300}
[2017-07-26T17:49:33,353][INFO ][o.e.b.BootstrapChecks ] [node1] bound or publishing to a non-loopback or non-link-local address, enforcing bootstrap checks
[2017-07-26T17:49:33,447][INFO ][o.e.c.g.GceInstancesServiceImpl] [node1] starting GCE discovery service
[2017-07-26T17:49:37,209][INFO ][o.e.c.g.GceInstancesServiceImpl] [node1] starting GCE discovery service
[2017-07-26T17:50:03,386][WARN ][o.e.n.Node ] [node1] timed out while waiting for initial discovery state - timeout: 30s
[2017-07-26T17:50:03,439][INFO ][o.e.h.HttpServer ] [node1] publish_address {10.128.0.3:9200}, bound_addresses {10.128.0.3:9200}
[2017-07-26T17:50:03,439][INFO ][o.e.n.Node ] [node1] started
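
Since the node binds to 10.128.0.3:9300 but discovery still times out, one thing that can be checked independently of Elasticsearch is whether the transport port on another node is reachable across the firewall rules listed above. The Python sketch below is a plain TCP connect test, not anything the plugin itself does; the function name `port_is_reachable` is hypothetical.

```python
import socket


def port_is_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False
```

For example, running `port_is_reachable("10.128.0.4", 9300)` on this node, where `10.128.0.4` stands in for the internal IP of a second node (an assumed address, not one from this thread), would show whether the `rule-9300` entry is taking effect. As far as I know the transport layer in ES 5.x uses TCP only, so the `udp:9300` rule should not be needed.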