Tried a 7.5.0 rolling upgrade, but getting a Java error in the trace log when bringing up nodes 2 and 3 of my cluster

Ed_Lopez · December 14, 2019, 5:34pm

I tried doing a rolling upgrade from 7.3 to 7.5.0.

The cluster went offline when I took one node down.

I stopped all nodes.

I upgraded node 1 and it started with no problem.

When I try to start nodes 2 and 3, I get a Java error in the trace log.

I have X-Pack security and SSL enabled.

This is on CentOS 7.

```text
[2019-12-14T17:08:10,396][TRACE][o.e.t.n.ESLoggingHandler ] [es2] [id: 0xe681cfcc, L:/172.31.43.80:9300 - R:/172.31.38.238:38628] FLUSH
[2019-12-14T17:08:10,396][TRACE][o.e.t.T.tracer ] [es2] [430][internal:tcp/handshake] sent error response
java.lang.IllegalStateException: transport not ready yet to handle incoming requests
    at org.elasticsearch.transport.TransportService.onRequestReceived(TransportService.java:891) ~[elasticsearch-7.5.0.jar:7.5.0]
```

At the end of my log I get:

```text
[2019-12-14T17:08:10,729][INFO ][o.e.x.m.p.NativeController] [es2] Native controller process has stopped - no new native processes can be started
```

VietCong · December 14, 2019, 7:12pm

How many nodes do you have in total? And were they all on 7.2 before the upgrade? Can you also share your elasticsearch.yml file?

Ed_Lopez · December 14, 2019, 7:40pm

We had 4 nodes: 3 masters and 1 data node.

Here is my elasticsearch.yml:

```yaml
# ======================== Elasticsearch Configuration =========================
# NOTE: Elasticsearch comes with reasonable defaults for most settings.
# Before you set out to tweak and tune the configuration, make sure you
# understand what are you trying to accomplish and the consequences.
# The primary way of configuring a node is via this file. This template lists
# the most important settings you may want to configure for a production cluster.
# Please consult the documentation for further information on configuration options:
# Elasticsearch Guide | Elastic

# ---------------------------------- Cluster -----------------------------------
# Use a descriptive name for your cluster:
cluster.name: production

# ------------------------------------ Node ------------------------------------
# Use a descriptive name for the node:
#node.name: node-1
node.name: es2
# Add custom attributes to the node:
#node.attr.rack: r1

# ----------------------------------- Paths ------------------------------------
# Path to directory where to store the data (separate multiple locations by comma):
#path.data: /path/to/data
path.data: /data/elasticsearch/data,/data2/elasticsearch/data,/data3/elasticsearch/data,/data4/elasticsearch/data,/data5/elasticsearch/data
# Path to log files:
#path.logs: /path/to/logs
path.logs: /data5/elastic-logs

# ----------------------------------- Memory -----------------------------------
# Lock the memory on startup:
#bootstrap.memory_lock: true
# Make sure that the heap size is set to about half the memory available
# on the system and that the owner of the process is allowed to use this
# limit.
# Elasticsearch performs poorly when the system is swapping the memory.

# ---------------------------------- Network -----------------------------------
# Set the bind address to a specific IP (IPv4 or IPv6):
#network.host: 192.168.0.1
#network.host: 0.0.0.0
network.host: 172.31.43.80
#network.host: [172.31.43.80, 172.31.128.81]
# transport
#transport.host: 172.31.43.80
#transport.host: [172.31.43.80, 172.31.128.81]
# Set a custom port for HTTP:
http.port: 9200
# For more information, consult the network module documentation.

# --------------------------------- Discovery ----------------------------------
# Pass an initial list of hosts to perform discovery when new node is started:
# The default list of hosts is ["127.0.0.1", "[::1]"]
#discovery.zen.ping.unicast.hosts: ["host1", "host2"]
#discovery.zen.ping.unicast.hosts: ["es1", "es2", "es3"]
# Added by EdL 7-11-2019 per change in 7.2.0
#discovery.seed_hosts: ["es1", "es2", "es3"]
#discovery.seed_hosts: ["es1", "es3"]
#network.bind_host: 172.31.43.80
#network.bind_host: [172.31.43.80, 172.31.128.81]
#network.publish_host: 172.31.43.80
#network.publish_host: [172.31.43.80, 172.31.128.81]
node.master: true
#node.data: true
#node.ingest: true
# Prevent the "split brain" by configuring the majority of nodes (total number of master-eligible nodes / 2 + 1):
discovery.zen.minimum_master_nodes: 2
discovery.zen.ping.unicast.hosts: ["es1", "es2","es3"]
#discovery.zen.minimum_master_nodes: 1
# For more information, consult the zen discovery module documentation.
cluster.initial_master_nodes:
  - es1
  - es2
  - es3

# ---------------------------------- Gateway -----------------------------------
# Block initial recovery after a full cluster restart until N nodes are started:
gateway.recover_after_nodes: 3
#gateway.recover_after_nodes: 1
# For more information, consult the gateway module documentation.

# ---------------------------------- Various -----------------------------------
# Require explicit names when deleting indices:
#action.destructive_requires_name: true
#xpack.security.enabled: false
#xpack.security.http.ssl.enabled: false
#logger.org.elasticsearch.transport: trace
xpack.security.enabled: true
xpack.security.transport.ssl.enabled: true
xpack.security.transport.ssl.verification_mode: certificate
xpack.security.transport.ssl.keystore.path: /etc/elasticsearch/certs/es2.xxxxxxxxx.com.p12
xpack.security.transport.ssl.truststore.path: /etc/elasticsearch/certs/es2.xxxxxxxx.com.p12
xpack.security.http.ssl.enabled: true
xpack.security.http.ssl.keystore.path: /etc/elasticsearch/certs/es2.xxxxxxxxx.com.p12
xpack.security.http.ssl.truststore.path: /etc/elasticsearch/certs/es2.xxxxxxxx.com.p12
s3.client.default.endpoint: s3.us-west-2.amazonaws.com
#xpack.notification.email.account:
#    gmail_account:
#        profile: gmail
#        smtp:
#            auth: false
#            starttls.enable: true
#            host: smtp.gmail.com
#            port: 587
#            user: noreply@xxxxxx.com
#logger.org.elasticsearch.cluster.coordination: TRACE
#logger.org.elasticsearch.discovery: TRACE
transport.connect_timeout: 120s
logger.org.elasticsearch.transport: trace
```
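The split-brain comment in the Discovery section of this config uses the 6.x rule: quorum = floor(master-eligible nodes / 2) + 1. From 7.0 onward `discovery.zen.minimum_master_nodes` is ignored and Elasticsearch manages its voting configuration itself, but electing a master still needs a majority. A minimal Python sketch of that arithmetic (`quorum` and `tolerable_failures` are hypothetical helper names, not an Elasticsearch API):

```python
def quorum(master_eligible: int) -> int:
    """Majority of master-eligible nodes: floor(n / 2) + 1 (the 6.x minimum_master_nodes rule)."""
    if master_eligible < 1:
        raise ValueError("need at least one master-eligible node")
    return master_eligible // 2 + 1


def tolerable_failures(master_eligible: int) -> int:
    """How many master-eligible nodes can be stopped while a majority remains."""
    return master_eligible - quorum(master_eligible)


if __name__ == "__main__":
    for n in (1, 2, 3, 4, 5):
        print(f"{n} master-eligible: quorum={quorum(n)}, can lose {tolerable_failures(n)}")
```

With the 3 master-eligible nodes described in this thread, the sketch gives a quorum of 2, so in principle stopping one of them should still leave a majority.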

Ed_Lopez · December 14, 2019, 8:01pm

Before the upgrade we were on 7.3.0.

VietCong · December 14, 2019, 8:19pm

There should be no need for 3 master nodes when you have only one data node. You probably won't need a dedicated master node at all. One of your data nodes can act as a combined data+master node and the rest can be dedicated data nodes.

I also see that your elasticsearch.yml still has some settings that were retired in 7.0, such as the minimum number of master-eligible nodes (`discovery.zen.minimum_master_nodes`).

Since your cluster is offline anyway, you could try setting up the proper 7.x config on all nodes and then starting them all back up.
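A quick way to find leftover 6.x settings before restarting is to scan each node's elasticsearch.yml. The sketch below is a hypothetical helper, not an Elastic tool, and its list is deliberately short and not exhaustive: in 7.x `discovery.zen.minimum_master_nodes` is ignored, `discovery.zen.ping.unicast.hosts` is deprecated in favour of `discovery.seed_hosts`, and `cluster.initial_master_nodes` is only meant for bootstrapping a brand-new cluster (Elastic's docs recommend removing it once the cluster has formed). It assumes flat `key: value` lines like the file above, not nested YAML:

```python
from __future__ import annotations

import re
import sys

# Hypothetical, non-exhaustive table of 6.x-era settings and their 7.x status.
RETIRED_IN_7 = {
    "discovery.zen.minimum_master_nodes": "ignored in 7.x; the cluster manages its own voting configuration",
    "discovery.zen.ping.unicast.hosts": "deprecated in 7.x; use discovery.seed_hosts",
    "discovery.zen.hosts_provider": "deprecated in 7.x; use discovery.seed_providers",
}

# Settings that only matter the first time a new cluster is bootstrapped.
BOOTSTRAP_ONLY = {"cluster.initial_master_nodes"}

# An active top-level "key:" line; commented lines start with '#' and never match.
SETTING_LINE = re.compile(r"^([A-Za-z0-9_.]+)\s*:")


def find_legacy_settings(yml_text: str) -> list[tuple[int, str, str]]:
    """Return (line_number, setting, note) for each active legacy setting."""
    findings = []
    for lineno, line in enumerate(yml_text.splitlines(), start=1):
        match = SETTING_LINE.match(line)
        if not match:
            continue
        key = match.group(1)
        if key in RETIRED_IN_7:
            findings.append((lineno, key, RETIRED_IN_7[key]))
        elif key in BOOTSTRAP_ONLY:
            findings.append((lineno, key, "bootstrap-only; remove once the cluster has formed"))
    return findings


if __name__ == "__main__" and len(sys.argv) > 1:
    # e.g. python find_legacy_settings.py /etc/elasticsearch/elasticsearch.yml
    with open(sys.argv[1], encoding="utf-8") as fh:
        for lineno, key, note in find_legacy_settings(fh.read()):
            print(f"line {lineno}: {key}: {note}")
```

A line-based scan keeps the sketch dependency-free; a real YAML parser would also catch nested forms such as `discovery: zen: ...`.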

Ed_Lopez · December 14, 2019, 9:09pm

We have data on the 3 master nodes plus 1 data node.

It was all running fine on 7.3.0, then broke after the upgrade.

I have been trying various configs to see if I could get it going again.

If a master node also holds data, do I have to explicitly call that out? Before, just saying master node was enough.

If the machines have two NICs, is there anything special to configure?

One of our master nodes came up fine, but the others did not and show that Java error.

Please advise; this is a production cluster.

Our dev cluster came up fine.

If I cannot get the prod cluster back online, how do I move my data (with its shards) to my dev cluster to get things going again?

My dev cluster is all good and running 7.5.0 with no problems.
Ed
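One common way to move indices between clusters is snapshot and restore. The prod config sets `s3.client.default.endpoint`, so an S3 repository may already be in use; if snapshots from before the upgrade exist there, the dev cluster could restore them (a new snapshot cannot be taken while the prod cluster is down). The sketch below only builds the two REST requests, `PUT _snapshot/<repo>` and `POST _snapshot/<repo>/<snapshot>/_restore`, using Python's standard library. The cluster URL, repository, bucket, snapshot and index names are hypothetical, and it assumes the `repository-s3` plugin and S3 credentials are already set up on the dev cluster:

```python
import json
import urllib.request
from typing import Iterable

# Hypothetical values for illustration only; replace with your own.
DEV_CLUSTER = "https://dev-es.example.com:9200"
REPO = "prod_backups"
BUCKET = "my-prod-snapshots"


def _json_request(url: str, method: str, body: dict) -> urllib.request.Request:
    """Build (but do not send) a JSON request for the Elasticsearch REST API."""
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method=method,
    )


def register_s3_repository(base_url: str, repo: str, bucket: str) -> urllib.request.Request:
    """PUT _snapshot/<repo>: register the bucket as a read-only repository on the
    dev cluster so it cannot overwrite snapshots written by the prod cluster."""
    body = {"type": "s3", "settings": {"bucket": bucket, "readonly": True}}
    return _json_request(f"{base_url}/_snapshot/{repo}", "PUT", body)


def restore_snapshot(base_url: str, repo: str, snapshot: str,
                     indices: Iterable[str]) -> urllib.request.Request:
    """POST _snapshot/<repo>/<snapshot>/_restore for the given indices only."""
    body = {"indices": ",".join(indices), "include_global_state": False}
    return _json_request(f"{base_url}/_snapshot/{repo}/{snapshot}/_restore", "POST", body)


if __name__ == "__main__":
    # Print the requests instead of sending them; actually sending them would also
    # need authentication and CA trust for a TLS-enabled cluster.
    for req in (
        register_s3_repository(DEV_CLUSTER, REPO, BUCKET),
        restore_snapshot(DEV_CLUSTER, REPO, "snapshot-name", ["index-name"]),
    ):
        print(req.get_method(), req.full_url, req.data.decode("utf-8"))
```

Registering the repository as read-only on the second cluster avoids two clusters writing to the same bucket; `include_global_state: false` restores only the index data, not templates or cluster settings.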

DavidTurner · December 15, 2019, 8:40am

This TRACE-level message is normal and (like pretty much all TRACE-level logging) can be ignored.

Your post is basically unreadable due to its lack of formatting. You will likely get more help if you fix that. Use the </> button for fixed width text like logs and config files.

VietCong · December 15, 2019, 8:26pm

As David mentioned, your log does not show what the problem is. I suggest looking for log lines with [ERROR] and sharing them here. Your elasticsearch.yml file still has retired settings from 6.x. I would suggest checking the breaking changes pages for 7.3 and 7.5.
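With `logger.org.elasticsearch.transport: trace` enabled, the interesting lines are easy to miss. A minimal sketch of a filter that groups log lines into entries (keeping stack-trace lines with their entry) and keeps only WARN and above. It assumes the default 7.x log line layout seen in this thread (`[timestamp][LEVEL][logger] ...`); the script and function names are hypothetical:

```python
import re
import sys
from typing import Iterable, Iterator, List

LEVELS = {"TRACE": 0, "DEBUG": 1, "INFO": 2, "WARN": 3, "ERROR": 4, "FATAL": 5}

# Start of a log entry, e.g. "[2019-12-14T17:08:10,729][INFO ][o.e.x.m.p.NativeController] ..."
ENTRY_START = re.compile(r"^\[[^\]]+\]\[(?P<level>[A-Z]+)\s*\]")


def entries_at_or_above(lines: Iterable[str], min_level: str = "WARN") -> Iterator[List[str]]:
    """Yield log entries (lists of lines) whose level is at least min_level."""
    threshold = LEVELS[min_level]
    current: List[str] = []
    current_level = -1
    for line in lines:
        match = ENTRY_START.match(line)
        if match:
            # A new entry begins: emit the previous one if it passes the filter.
            if current and current_level >= threshold:
                yield current
            current = [line.rstrip("\n")]
            current_level = LEVELS.get(match.group("level"), -1)
        elif current:
            # Continuation line, e.g. "java.lang.IllegalStateException" or "at org...."
            current.append(line.rstrip("\n"))
    if current and current_level >= threshold:
        yield current


if __name__ == "__main__" and len(sys.argv) > 1:
    # e.g. python es_log_filter.py /data5/elastic-logs/production.log
    with open(sys.argv[1], encoding="utf-8") as log_file:
        for entry in entries_at_or_above(log_file, "WARN"):
            print("\n".join(entry))
```

Run against the lines quoted at the top of the thread, this filter keeps nothing: the handshake error is logged at TRACE and the native controller message at INFO.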

To specify whether a node is master-eligible or a data node, you would use these lines. Having 3 master nodes and only one data node looks like a red flag to me.

```yaml
node.master: true   # makes the node master-eligible
node.data: true     # makes the node a data node
```
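On the earlier question of whether a master node that also holds data must say so explicitly: in Elasticsearch 7.5, `node.master`, `node.data` and `node.ingest` each default to `true`, so a node that only sets `node.master: true` is also a data and ingest node. A minimal sketch of that defaulting logic; `effective_roles` is a hypothetical helper, and it ignores other roles such as machine learning:

```python
from typing import Dict, Set

# Defaults of the role settings used in this thread (Elasticsearch 7.5).
ROLE_DEFAULTS = {"node.master": True, "node.data": True, "node.ingest": True}


def _as_bool(value: object) -> bool:
    """Accept a YAML boolean or the strings "true"/"false"; anything else counts as false."""
    if isinstance(value, bool):
        return value
    return str(value).strip().lower() == "true"


def effective_roles(settings: Dict[str, object]) -> Set[str]:
    """Roles a node ends up with: explicit settings win, otherwise the default applies."""
    roles = set()
    for key, default in ROLE_DEFAULTS.items():
        enabled = _as_bool(settings[key]) if key in settings else default
        if enabled:
            roles.add(key.split(".", 1)[1])  # "node.master" -> "master"
    return roles
```

For the posted config (`node.master: true`, `node.data` commented out), `effective_roles({"node.master": True})` includes `data`, so those master nodes also hold shards. A dedicated master would need `node.data: false` set explicitly.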

system · January 12, 2020, 8:26pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.