Rolling upgrade problem from 6.8 to 7.1.1

Hello,

I have four ES nodes (let's call them A, B, C, and D).

A, B, C, and D are all master-eligible and data nodes.

Last week I did a rolling upgrade of the whole cluster from 6.3 to 6.8. It went perfectly!

This week I tried a rolling upgrade from 6.8 to 7.1.1.

A, B, and C went fine. There was no problem at all.

But there was a problem when I tried to upgrade D. D was the elected master.

I set the config file like this. I don't think the problem is here:

node.name: node-D
.
.
discovery.seed_hosts: ["172.31.xx.A", "172.31.xx.B", "172.31.xx.C", "172.31.xx.D"]
cluster.initial_master_nodes: ["node-C", "node-D"]

I stopped the Elasticsearch process on D for the rolling upgrade while watching Kibana monitoring continuously. D was still shown on the screen with the elected-master mark. I waited for some minutes and D was still shown as alive, so I had no choice but to restart D with version 7.1.1.
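
(For reference, the standard rolling-upgrade preparation before stopping each node looks roughly like this. This is a sketch of the documented procedure, not necessarily exactly what was run on these nodes.)

# Before stopping the node: stop shard reallocation
PUT _cluster/settings
{
  "persistent": {
    "cluster.routing.allocation.enable": "primaries"
  }
}

# Optional: synced flush to speed up shard recovery after the restart
POST _flush/synced

# After upgrading and restarting the node: re-enable allocation
PUT _cluster/settings
{
  "persistent": {
    "cluster.routing.allocation.enable": null
  }
}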

After restarting D, this error was displayed on the screen:

[WARN ][o.e.c.c.ClusterFormationFailureHelper] [node-D] master not discovered or elected yet, an election requires at least 2 nodes with ids from ...

D was still shown on the screen with the elected-master mark.

As far as I know, there is no way to force a change of the cluster master, so I restarted A, B, and C one by one. (I know that was not smart.)

But every node displayed the same error on the screen.

How can I get this back on track?

Please share all the logs from all 4 of your nodes from before when you started the upgrade. If you do not want to do this publicly then please send them to me at david.turner@elastic.co.

I sent an email because the error and warning messages exceed the posting limit.

Anyway, thanks for your reply. :smiley:

I really need to see the full logs from each of these nodes starting from before the upgrade. You've only sent me a few selected messages and that's not enough to determine the problem.

Could I share the log files through my Google Drive with your email address "david.turner@elastic.co"? The files are too big.

Yes, that's fine too.

I shared them. FYI, in my example, A is 638_logs.tar and B is 637_logs.tar.

I also tried the upgrade from 6.8 to 7.1.1 again yesterday, and it failed around 5 pm (GMT+9); those logs are in 639_logs.tar.
Thanks :slight_smile:

Thanks. The problem seems to be here:

[2019-06-20T15:04:15,842][INFO ][o.e.c.c.JoinHelper       ] [REDACTED] failed to join REDACTED...
org.elasticsearch.transport.RemoteTransportException: [REDACTED][REDACTED:9300][internal:cluster/coordination/join]
Caused by: org.elasticsearch.cluster.coordination.FailedToCommitClusterStateException: publication failed
        at org.elasticsearch.cluster.coordination.Coordinator$CoordinatorPublication$3.onFailure(Coordinator.java:1354) ~[elasticsearch-7.1.1.jar:7.1.1]
        at org.elasticsearch.common.util.concurrent.ListenableFuture$1.run(ListenableFuture.java:101) ~[elasticsearch-7.1.1.jar:7.1.1]
        at org.elasticsearch.common.util.concurrent.EsExecutors$DirectExecutorService.execute(EsExecutors.java:192) ~[elasticsearch-7.1.1.jar:7.1.1]
        at org.elasticsearch.common.util.concurrent.ListenableFuture.notifyListener(ListenableFuture.java:92) ~[elasticsearch-7.1.1.jar:7.1.1]
        at org.elasticsearch.common.util.concurrent.ListenableFuture.addListener(ListenableFuture.java:54) ~[elasticsearch-7.1.1.jar:7.1.1]
        at org.elasticsearch.cluster.coordination.Coordinator$CoordinatorPublication.onCompletion(Coordinator.java:1294) ~[elasticsearch-7.1.1.jar:7.1.1]
        at org.elasticsearch.cluster.coordination.Publication.onPossibleCompletion(Publication.java:124) ~[elasticsearch-7.1.1.jar:7.1.1]
        at org.elasticsearch.cluster.coordination.Publication.cancel(Publication.java:88) ~[elasticsearch-7.1.1.jar:7.1.1]
        at org.elasticsearch.cluster.coordination.Coordinator$CoordinatorPublication$2.run(Coordinator.java:1261) ~[elasticsearch-7.1.1.jar:7.1.1]
        at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:681) ~[elasticsearch-7.1.1.jar:7.1.1]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) ~[?:?]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) ~[?:?]
        at java.lang.Thread.run(Thread.java:835) [?:?]
Caused by: org.elasticsearch.ElasticsearchException: publication cancelled before committing: timed out after 30s
        at org.elasticsearch.cluster.coordination.Publication.cancel(Publication.java:85) ~[elasticsearch-7.1.1.jar:7.1.1]
        at org.elasticsearch.cluster.coordination.Coordinator$CoordinatorPublication$2.run(Coordinator.java:1261) ~[elasticsearch-7.1.1.jar:7.1.1]
        at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:681) ~[elasticsearch-7.1.1.jar:7.1.1]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) ~[?:?]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) ~[?:?]
        at java.lang.Thread.run(Thread.java:835) ~[?:?]

Publishing a cluster state shouldn't be taking over 30 seconds, but it looks like it is here, and this is hitting a timeout. Could you add the following settings to one of your nodes' elasticsearch.yml config files and then restart it:

logger.org.elasticsearch.cluster.service: TRACE
logger.org.elasticsearch.gateway.MetaStateService: TRACE

This will log more information about what's taking so long. It'd be good if you could let it run for a few minutes and then share the logs it's produced.
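
As an aside, once a cluster has formed and is answering REST requests, logger levels can also be changed at runtime through the cluster settings API instead of editing elasticsearch.yml; a rough sketch:

# Dynamic alternative to the elasticsearch.yml logger settings above
PUT _cluster/settings
{
  "transient": {
    "logger.org.elasticsearch.cluster.service": "TRACE",
    "logger.org.elasticsearch.gateway.MetaStateService": "TRACE"
  }
}

The file-plus-restart approach is more reliable in a situation like this one, though, where the cluster may not be responding to REST requests at all.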

Also are you using SSDs or spinning disks in this cluster?


638 and 639 have SSDs, and 636 and 637 have SCSI HDDs. I have a plan to switch from HDD to SSD.

And I shared the new log files on Google Drive.

637_newLogs.tar and 638_newLogs.tar. Thanks :slight_smile:

Thanks. You have just over 1000 indices in this cluster, and the master must write a small metadata file for each index during an election. Looking at the log file, this is taking an average of around 60ms per file:

     4  [2019-06-20T17:20:50,030][TRACE][o.e.g.MetaStateService   ] [REDACTED-NODE-NAME] [[REDACTED-INDEX-NAME/SHRkSR17SkKb9YAd6q-fEA]] state written
...
  1008  [2019-06-20T17:21:51,765][TRACE][o.e.g.MetaStateService   ] [REDACTED-NODE-NAME] [[REDACTED-INDEX-NAME/jkTAbDtIShmorVWmk-nxUQ]] state written

That's slower than we expect, plus 1000 indices is quite a lot in a cluster of this size, so it adds up to just over 60 seconds and this is causing the master election to time out before it succeeds.

You should work on reducing the number of indices in the cluster, but in the meantime I think you can try temporarily setting cluster.publish.timeout: 90s on each node which will allow enough time for this data to be written.
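
To spell out the arithmetic: the two log lines above are about 61 seconds apart for roughly 1004 metadata writes, i.e. around 60 ms per index, so writing the metadata for ~1000 indices takes about a minute, which is well over the default 30 s publish timeout. The temporary override would go into elasticsearch.yml on each node, something like:

# elasticsearch.yml (temporary workaround)
# ~1000 index metadata files x ~60 ms/file ≈ 60 s > the 30 s default publish timeout
cluster.publish.timeout: 90s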

The logs above were from one of the spinning-disk nodes. I looked at one of the SSD-backed ones too:

     4  [2019-06-20T17:19:55,676][TRACE][o.e.g.MetaStateService   ] [REDACTED-NODE-NAME] [[REDACTED-INDEX-NAME/SHRkSR17SkKb9YAd6q-fEA]] state written
...
  1008  [2019-06-20T17:19:59,504][TRACE][o.e.g.MetaStateService   ] [REDACTED-NODE-NAME] [[REDACTED-INDEX-NAME/jkTAbDtIShmorVWmk-nxUQ]] state written

This is much better, only 4 seconds to write all the metadata. However, this doesn't help, because when you shut down node D there were only 3 nodes remaining, and 2 of them had spinning disks, but the first election needed 2 of these 3 nodes to be fast enough to succeed. Bringing node D back online isn't enough, because even though it's fast enough it wasn't one of the 3 nodes you needed to be fast.

Thanks.

The situation has changed a lot, but I still get errors. I shared them on Google Drive.

The file name is 638_3rd.tar.

And I added the option "cluster.publish.timeout: 90s" to elasticsearch.yml.

The errors I'm seeing now are to do with monitoring not being set up or upgraded correctly.

Is your cluster now reporting that it's formed correctly? E.g. does GET _cat/nodes show all 4 nodes, with one of them being the elected master? If so, I suggest opening another thread about the monitoring issue because I'm not the best person to ask about this and the monitoring experts here might not be reading this thread.
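
For reference, that check looks something like this (assuming Elasticsearch is reachable on localhost:9200); in the output, the master column shows a * next to the elected master:

# From Kibana Dev Tools:
GET _cat/nodes?v&h=name,node.role,master,version

# Or from a shell on one of the nodes:
curl -s 'http://localhost:9200/_cat/nodes?v&h=name,node.role,master,version'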

I don't think my cluster is working yet. It doesn't respond to REST API calls such as GET _cat/nodes.

The other nodes can't connect to the elected master. They just keep printing messages like this:

[2019-06-20T18:18:34,434][INFO ][o.e.x.m.e.l.LocalExporter] [node-mdata-637] waiting for elected master node [{node-mdata-638}{UtHL00IMRzihkL_erqCHPA}{wUWHEoNeRkK66PRZNoaGNg}{172.31.6.38}{172.31.6.38:9300}{ml.machine_memory=33449705472, ml.max_open_jobs=20, xpack.installed=true}] to setup local exporter [default_local] (does it have x-pack installed?)

And I found an error message:

[2019-06-20T18:39:05,786][WARN ][o.e.n.Node ] [node-mdata-637] timed out while waiting for initial discovery state - timeout: 30s

Why was this message printed even though I set "cluster.publish.timeout: 90s" in elasticsearch.yml?

[2019-06-20T18:18:34,434][INFO ][o.e.x.m.e.l.LocalExporter] [node-mdata-637] waiting for elected master node ... to setup local exporter [default_local] (does it have x-pack installed?)

This message is to do with monitoring, and it's a bug that it's emitted at all. It does, however, indicate that the node is connected to, and communicating with, the elected master.

Why was this message printed even though I set "cluster.publish.timeout: 90s" in elasticsearch.yml?

That message is due to a different timeout that isn't particularly important here.

I suggest disabling monitoring for now since it's making so much noise I can't see any other problems. Could you shut everything down and then set this on all your nodes:

xpack.monitoring.enabled: false

Then restart all the nodes and then, finally, could you share the logs from all of them again?
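
For reference, the relevant part of elasticsearch.yml on each node would then look roughly like this (keeping the publish timeout from earlier):

# elasticsearch.yml on every node
xpack.monitoring.enabled: false
cluster.publish.timeout: 90s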

No problem.

I'll set that option.

Should I delete these options or keep them?

logger.org.elasticsearch.cluster.service: TRACE
logger.org.elasticsearch.gateway.MetaStateService: TRACE

Yes, remove the logging options; I don't think they'll tell us anything new, and they also make a lot of noise.

Strange things happen. Now all the ES nodes are working well. It's like one of those joke images on Facebook: "https://images.app.goo.gl/k9FchCufRGo5Q6Cz6". :face_with_raised_eyebrow:

Anyway, I shared the logs on my Google Drive. Maybe there's a scary message hiding in there.

Thank you so much, David.

I should go to the Kibana forum. It is showing me a scary error message about the maximum shard count, related to the large number of indices you already pointed out:

Status changed from yellow to red - Validation Failed: 1: this action would add [2] total shards, but this cluster currently has [9816]/[4000] maximum shards open;: [remote_transport_exception] [node-mdata-638][172.31.6.38:9300][indices:admin/create]

Anyway, you saved my data. Thanks a lot.
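
For anyone who hits the same shard-limit error: the [4000] in the message comes from cluster.max_shards_per_node (default 1000, times 4 data nodes here). The real fix is reducing the number of indices and shards, as discussed above, but as a stopgap the limit can be raised, for example:

# Stopgap only - the better fix is fewer indices/shards
PUT _cluster/settings
{
  "persistent": {
    "cluster.max_shards_per_node": 3000
  }
}

With 4 data nodes that would allow up to 12000 open shards, enough to cover the 9816 currently open.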


This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.