Migrating from Node Client -> Transport Client -> REST Client

In 2016 our cluster was set up using the Node Client so that some nodes could function as pure coordinating nodes with additional custom REST API implementations. This was done to save the extra network call for the custom REST APIs. The setup works fine, but it restricts us to the OSS distribution of Elasticsearch, which means that X-Pack features are not available.

We are now looking to finally migrate to the official REST Client. As a first step we have already moved away from the unsupported Node Client to the deprecated Transport Client, while still using the official OSS Docker distribution from Elastic.
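
To make the target concrete, the direction we are migrating towards is roughly the following. This is only a minimal sketch assuming the 7.x high-level REST client; the host names and the index name are placeholders, not our actual setup:

```java
// Minimal sketch only: "es-coordinator-0/1" and "my-index" are placeholders.
import org.apache.http.HttpHost;
import org.elasticsearch.action.search.SearchRequest;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.search.builder.SearchSourceBuilder;

public class RestClientSketch {
    public static void main(String[] args) throws Exception {
        // The REST client talks HTTP on port 9200 to any (coordinating) node,
        // instead of joining the transport layer on 9300 like the Transport Client.
        try (RestHighLevelClient client = new RestHighLevelClient(
                RestClient.builder(
                        new HttpHost("es-coordinator-0", 9200, "http"),
                        new HttpHost("es-coordinator-1", 9200, "http")))) {

            SearchRequest request = new SearchRequest("my-index")
                    .source(new SearchSourceBuilder().query(QueryBuilders.matchAllQuery()));
            SearchResponse response = client.search(request, RequestOptions.DEFAULT);
            System.out.println("hits: " + response.getHits().getTotalHits().value);
        }
    }
}
```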

As the next step we have tried to switch to the official default Elasticsearch Docker distribution while keeping our Transport Client setup, but failed to keep the cluster operational. Our attempts so far have been the following (a sketch of how the cluster can be watched while rolling follows the list):

  1. Rolling out master nodes before data nodes: This was unsuccessful; the data nodes did not survive because the new masters were either not recognized or were rejected. The logs only report that a master node was not found.

  2. Rolling out data nodes first, then master nodes: This was also unsuccessful because the new master nodes were rejected. Again, the underlying reason is not clear from the logs.
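
As mentioned above, while rolling the nodes we can only observe the cluster from the outside. A rough sketch of the kind of health and elected-master checks we mean (again assuming the 7.x high-level REST client; the host name is a placeholder, not our real setup):

```java
// Sketch for watching the cluster while nodes are rolled; the host name is a placeholder.
import org.apache.http.HttpHost;
import org.apache.http.util.EntityUtils;
import org.elasticsearch.action.admin.cluster.health.ClusterHealthRequest;
import org.elasticsearch.action.admin.cluster.health.ClusterHealthResponse;
import org.elasticsearch.client.Request;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.Response;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestHighLevelClient;

public class RollingWatchSketch {
    public static void main(String[] args) throws Exception {
        try (RestHighLevelClient client = new RestHighLevelClient(
                RestClient.builder(new HttpHost("es-data-0", 9200, "http")))) {
            // Overall health and node count as nodes leave and rejoin.
            ClusterHealthResponse health =
                    client.cluster().health(new ClusterHealthRequest(), RequestOptions.DEFAULT);
            System.out.println("status=" + health.getStatus() + " nodes=" + health.getNumberOfNodes());

            // Which node is currently the elected master (cat API via the low-level client).
            Response master = client.getLowLevelClient()
                    .performRequest(new Request("GET", "/_cat/master?v"));
            System.out.println(EntityUtils.toString(master.getEntity()));
        }
    }
}
```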

Does anyone have any recommendations or previous experiences?

@girishc13 so the problem is the migration from the OSS to the default images (independent of Node / Transport / REST client)?

For starters I think we'll need:

  • Elasticsearch version — you are moving to the same version and not doing an upgrade at the same time, right?
  • Some logs of the error. Especially if you're already on version 7.x, those contain quite a bit of useful information.
  • Some relevant configs (mostly around the master and discovery configuration)

Even if you aren't doing a version upgrade but only an image change, I think the recommendations for a rolling upgrade should still apply, so data nodes first.
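
Concretely, the per-node mechanics stay the same as for a version upgrade. A minimal sketch of the usual allocation toggling around each node restart, assuming the 7.x high-level REST client (the host name is a placeholder):

```java
// Sketch of the allocation toggling used in the rolling-upgrade procedure;
// the host name is a placeholder.
import org.apache.http.HttpHost;
import org.elasticsearch.action.admin.cluster.settings.ClusterUpdateSettingsRequest;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.common.settings.Settings;

public class AllocationToggleSketch {
    public static void main(String[] args) throws Exception {
        try (RestHighLevelClient client = new RestHighLevelClient(
                RestClient.builder(new HttpHost("localhost", 9200, "http")))) {

            // Before stopping a node: restrict allocation to primaries so shards
            // are not shuffled around while the node is being replaced.
            ClusterUpdateSettingsRequest disable = new ClusterUpdateSettingsRequest();
            disable.persistentSettings(Settings.builder()
                    .put("cluster.routing.allocation.enable", "primaries"));
            client.cluster().putSettings(disable, RequestOptions.DEFAULT);

            // ... stop the node, swap the image, start it, wait for it to rejoin ...

            // After the node has rejoined: remove the override to re-enable full allocation.
            ClusterUpdateSettingsRequest enable = new ClusterUpdateSettingsRequest();
            enable.persistentSettings(Settings.builder()
                    .putNull("cluster.routing.allocation.enable"));
            client.cluster().putSettings(enable, RequestOptions.DEFAULT);
        }
    }
}
```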

But let's figure out the answers to the questions above first and then we can dive deeper with some logs.

Hi, thanks for your questions. In an effort to reproduce the issue and gather more logs, I gave the rolling upgrade procedure another try by manually creating a fresh cluster with 3 master nodes and 3 data nodes. In short, the rolling upgrade approach of updating the data nodes first and then rolling the master nodes worked.

I was able to trace the issue we faced before to an auto-deploy CI step that updated only one master node to the default distribution. We ran into the issue reported above with the following sequence (a sketch for spotting this mixed state follows the list):

  1. Only one master node was upgraded to the default distribution while the other two master nodes remained on OSS. The cluster stayed green and stable. This step was triggered automatically by a new Docker image build with the default distribution.
  2. The data nodes were updated to the default distribution. The upgrade completed and the cluster remained green.
  3. A rolling update of the master nodes was triggered. This step caused havoc and the cluster was lost.
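
As mentioned above, one cheap guard against this mixed state before rolling the masters is to check the node attributes: nodes on the default distribution advertise xpack.installed=true (it also shows up in the logs below), while OSS nodes do not. A rough sketch with the low-level REST client (the host name is a placeholder):

```java
// Sketch for spotting a cluster running mixed distributions; the host name is a placeholder.
import org.apache.http.HttpHost;
import org.apache.http.util.EntityUtils;
import org.elasticsearch.client.Request;
import org.elasticsearch.client.Response;
import org.elasticsearch.client.RestClient;

public class DistributionCheckSketch {
    public static void main(String[] args) throws Exception {
        try (RestClient client = RestClient.builder(
                new HttpHost("localhost", 9200, "http")).build()) {
            // Default-distribution nodes list the xpack.installed=true attribute;
            // OSS nodes do not have it, so a mixed cluster is easy to spot here.
            Response response = client.performRequest(
                    new Request("GET", "/_cat/nodeattrs?v&h=node,attr,value"));
            System.out.println(EntityUtils.toString(response.getEntity()));
        }
    }
}
```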

Some logs from each master node on step 3:
es-master-pure-2 (the very first node already on default distribution):

master node changed {previous [{es-master-pure-1}{FkKNvBXATUy7rs5wH64yGQ}{bLAEwN00R46grZzEIG3pxQ}{10.2.12.88}{10.2.12.88:9300}{m}{zone=eu-central-1b,eu-central-1b-fake, group=master}], current []}, term: 1, version: 16, reason: becoming candidate: onLeaderFailure
waiting for elected master node [null] to setup local exporter [default_local] (does it have x-pack installed?)

elected-as-master ([2] nodes joined)[{es-master-pure-0}{dSFqwog6T1SaBOacZU7e-A}{CSogFY3qQ7Wh80gcX9afzQ}{10.2.29.7}{10.2.29.7:9300}{m}{zone=eu-central-1a,eu-central-1a-fake, group=master} elect leader, {es-master-pure-2}{PPR8fyBVSGmvjy_d9h-5Ew}{deicZNgqR0GNvyBGKLUoLA}{10.2.31.16}{10.2.31.16:9300}{lm}{ml.machine_memory=2097152000, xpack.installed=true, zone=eu-central-1c,eu-central-1c-fake, ml.max_open_jobs=20, group=master} elect leader, _BECOME_MASTER_TASK_, _FINISH_ELECTION_], term: 2, version: 17, reason: master node changed {previous [], current [{es-master-pure-2}{PPR8fyBVSGmvjy_d9h-5Ew}{deicZNgqR0GNvyBGKLUoLA}{10.2.31.16}{10.2.31.16:9300}{lm}{ml.machine_memory=2097152000, xpack.installed=true, zone=eu-central-1c,eu-central-1c-fake, ml.max_open_jobs=20, group=master}]}

master node changed {previous [], current [{es-master-pure-2}{PPR8fyBVSGmvjy_d9h-5Ew}{deicZNgqR0GNvyBGKLUoLA}{10.2.31.16}{10.2.31.16:9300}{lm}{ml.machine_memory=2097152000, xpack.installed=true, zone=eu-central-1c,eu-central-1c-fake, ml.max_open_jobs=20, group=master}]}, term: 2, version: 17, reason: Publication{term=2, version=17}

Starting template upgrade to version 7.4.1, 8 templates will be updated and 0 will be removed
	node-left[{es-master-pure-1}{FkKNvBXATUy7rs5wH64yGQ}{bLAEwN00R46grZzEIG3pxQ}{10.2.12.88}{10.2.12.88:9300}{m}{zone=eu-central-1b,eu-central-1b-fake, group=master} disconnected], term: 2, version: 18, reason: removed {{es-master-pure-1}{FkKNvBXATUy7rs5wH64yGQ}{bLAEwN00R46grZzEIG3pxQ}{10.2.12.88}{10.2.12.88:9300}{m}{zone=eu-central-1b,eu-central-1b-fake, group=master},}

Templates were upgraded successfully to version 7.4.1
failing [put-lifecycle-watch-history-ilm-policy]: failed to commit cluster state version [36], "cluster.uuid": "J5PcdLRGTI2X1DRjIsdKkA", "node.id": "PPR8fyBVSGmvjy_d9h-5Ew"

es-master-pure-1 (the node currently being restarted for the upgrade):

failed to join {es-master-pure-2}{PPR8fyBVSGmvjy_d9h-5Ew}{deicZNgqR0GNvyBGKLUoLA}{10.2.31.16}{10.2.31.16:9300}{lm}{ml.machine_memory=2097152000, ml.max_open_jobs=20, xpack.installed=true, zone=eu-central-1c,eu-central-1c-fake, group=master} with JoinRequest{sourceNode={es-master-pure-1}{9ItooNM0T1WN0C-iDQ6wXA}{ZEGDcPHvQw-REy-uvI812g}{10.2.3.16}{10.2.3.16:9300}{lm}{ml.machine_memory=2097152000, xpack.installed=true, zone=eu-central-1b,eu-central-1b-fake, ml.max_open_jobs=20, group=master}, optionalJoin=Optional[Join{term=281, lastAcceptedTerm=0, lastAcceptedVersion=0, sourceNode={es-master-pure-1}{9ItooNM0T1WN0C-iDQ6wXA}{ZEGDcPHvQw-REy-uvI812g}{10.2.3.16}{10.2.3.16:9300}{lm}{ml.machine_memory=2097152000, xpack.installed=true, zone=eu-central-1b,eu-central-1b-fake, ml.max_open_jobs=20, group=master}, targetNode={es-master-pure-2}{PPR8fyBVSGmvjy_d9h-5Ew}{deicZNgqR0GNvyBGKLUoLA}{10.2.31.16}{10.2.31.16:9300}{lm}{ml.machine_memory=2097152000, ml.max_open_jobs=20, xpack.installed=true, zone=eu-central-1c,eu-central-1c-fake, group=master}}]}

es-master-pure-0 (the node that's still on OSS, pending upgrade):

master node changed {previous [{es-master-pure-1}{FkKNvBXATUy7rs5wH64yGQ}{bLAEwN00R46grZzEIG3pxQ}{10.2.12.88}{10.2.12.88:9300}{m}{zone=eu-central-1b,eu-central-1b-fake, group=master}], current []}, term: 1, version: 16, reason: becoming candidate: onLeaderFailure

master node changed {previous [], current [{es-master-pure-2}{PPR8fyBVSGmvjy_d9h-5Ew}{deicZNgqR0GNvyBGKLUoLA}{10.2.31.16}{10.2.31.16:9300}{lm}{ml.machine_memory=2097152000, ml.max_open_jobs=20, xpack.installed=true, zone=eu-central-1c,eu-central-1c-fake, group=master}]}, term: 2, version: 17, reason: ApplyCommitRequest{term=2, version=17, sourceNode={es-master-pure-2}{PPR8fyBVSGmvjy_d9h-5Ew}{deicZNgqR0GNvyBGKLUoLA}{10.2.31.16}{10.2.31.16:9300}{lm}{ml.machine_memory=2097152000, ml.max_open_jobs=20, xpack.installed=true, zone=eu-central-1c,eu-central-1c-fake, group=master}}

removed {{es-master-pure-1}{FkKNvBXATUy7rs5wH64yGQ}{bLAEwN00R46grZzEIG3pxQ}{10.2.12.88}{10.2.12.88:9300}{m}{zone=eu-central-1b,eu-central-1b-fake, group=master},}, term: 2, version: 18, reason: ApplyCommitRequest{term=2, version=18, sourceNode={es-master-pure-2}{PPR8fyBVSGmvjy_d9h-5Ew}{deicZNgqR0GNvyBGKLUoLA}{10.2.31.16}{10.2.31.16:9300}{lm}{ml.machine_memory=2097152000, ml.max_open_jobs=20, xpack.installed=true, zone=eu-central-1c,eu-central-1c-fake, group=master}}

unexpected error while deserializing an incoming cluster state, "cluster.uuid": "J5PcdLRGTI2X1DRjIsdKkA", "node.id": "dSFqwog6T1SaBOacZU7e-A"
"stacktrace": ["java.lang.IllegalArgumentException: Unknown NamedWriteable [org.elasticsearch.cluster.metadata.MetaData$Custom][index_lifecycle]"

failed to join {es-master-pure-2}{PPR8fyBVSGmvjy_d9h-5Ew}{deicZNgqR0GNvyBGKLUoLA}{10.2.31.16}{10.2.31.16:9300}{lm}{ml.machine_memory=2097152000, ml.max_open_jobs=20, xpack.installed=true, zone=eu-central-1c,eu-central-1c-fake, group=master} with JoinRequest{sourceNode={es-master-pure-0}{dSFqwog6T1SaBOacZU7e-A}{CSogFY3qQ7Wh80gcX9afzQ}{10.2.29.7}{10.2.29.7:9300}{m}{zone=eu-central-1a,eu-central-1a-fake, group=master}, optionalJoin=Optional[Join{term=3, lastAcceptedTerm=2, lastAcceptedVersion=35, sourceNode={es-master-pure-0}{dSFqwog6T1SaBOacZU7e-A}{CSogFY3qQ7Wh80gcX9afzQ}{10.2.29.7}{10.2.29.7:9300}{m}{zone=eu-central-1a,eu-central-1a-fake, group=master}, targetNode={es-master-pure-2}{PPR8fyBVSGmvjy_d9h-5Ew}{deicZNgqR0GNvyBGKLUoLA}{10.2.31.16}{10.2.31.16:9300}{lm}{ml.machine_memory=2097152000, ml.max_open_jobs=20, xpack.installed=true, zone=eu-central-1c,eu-central-1c-fake, group=master}}]}

I excluded stack traces for brevity. Let me know if more information is needed.