TransportClient gets 502 after adding nodes to the cluster

We run an Elasticsearch cluster on version 2.4.6 and recently hit a strange problem. After adding nodes to the cluster, a large number of client requests time out and end with status 499 or 502, even though the cluster has finished rebalancing. Once we remove the new nodes, the client recovers in less than two minutes.

The client uses the same version as the cluster:

<dependency>
    <groupId>org.elasticsearch</groupId>
    <artifactId>elasticsearch</artifactId>
    <version>2.4.6</version>
</dependency>

The old machines run CentOS 6.10 and the new machines run CentOS 7.1. There are no other differences between the old and new nodes.

There are no error logs in the cluster.

There are no hot nodes while the client requests are timing out. I have checked the shard distribution, and both the shard balance and the index balance look fine.

Has anyone seen a case like this? Can somebody help me, or give me some ideas on how to troubleshoot this problem?

Thank you.

Welcome!

I'm afraid this 2.x version is way too old to be supported. Even 5.x is not supported.
We are now at 7.10...

Don't use the TransportClient as it is deprecated and will be removed in the next major version. Use the REST Client instead.

Thanks.
But upgrading is not simple in a production environment, especially from 2.x. If I upgrade the cluster, the applications also have to upgrade their client version, and we have too many applications using the client. So we have to stay on this version for at least another half a year.

Do you have any advice for resolving my issue, or for finding the cause?

Does ES 2.4.6 allow nodes running different CentOS versions (6.10 and 7.1) in the same cluster?

I have no idea. For the server side, I'd use the same OS, same JVM, etc for all the nodes.

On the client side, make sure that you are using the exact same version. A few things may have changed. I remember that in the past we were using Java to serialize some objects, and having multiple versions on the Transport Layer (which your TransportClient is using) was causing issues.

Elasticsearch fixed that a long time ago. Can't really remember when. Well. It was 3.5 years ago :wink:

In short:

  • Upgrade
  • If you can't:
    • Check that you have the same up-to-date JVM on all instances and on the client side.
    • If this still does not work, have a look at this unofficial Java REST Client. It might help you in the short term.
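
The "same JVM on all instances" check can be sketched roughly as follows. This is only an illustration, not something from the thread: it assumes the cluster's HTTP port is reachable and that `_cat/nodes` accepts the `version`, `jdk` and `name` columns on this Elasticsearch version. The class name and the `http://localhost:9200` address are placeholders. It covers the server nodes only, so the client JVM still has to be compared by hand.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: groups nodes by "<ES version> / <JDK version>".
class NodeVersionCheck {

    // Parses the plain-text output of _cat/nodes?h=version,jdk,name.
    // The node name is requested last because it may contain spaces.
    static Map<String, List<String>> groupByVersions(String catOutput) {
        Map<String, List<String>> groups = new TreeMap<>();
        for (String line : catOutput.split("\n")) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue;
            }
            String[] cols = trimmed.split("\\s+", 3);
            if (cols.length < 3) {
                continue; // skip lines that do not have all three columns
            }
            String key = cols[0] + " / " + cols[1];
            groups.computeIfAbsent(key, k -> new ArrayList<>()).add(cols[2]);
        }
        return groups;
    }

    // Fetches the _cat/nodes listing over HTTP (assumed endpoint and columns).
    static String fetch(String baseUrl) throws IOException {
        URL url = new URL(baseUrl + "/_cat/nodes?h=version,jdk,name");
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setConnectTimeout(5000);
        conn.setReadTimeout(5000);
        StringBuilder body = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(conn.getInputStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                body.append(line).append('\n');
            }
        } finally {
            conn.disconnect();
        }
        return body.toString();
    }

    public static void main(String[] args) throws IOException {
        // Placeholder address; pass the real one as the first argument.
        String baseUrl = args.length > 0 ? args[0] : "http://localhost:9200";
        Map<String, List<String>> groups = groupByVersions(fetch(baseUrl));
        groups.forEach((versions, nodes) -> System.out.println(versions + " -> " + nodes));
        if (groups.size() > 1) {
            System.out.println("Nodes do not all report the same ES/JDK version.");
        }
    }
}
```

If more than one group is printed, the old and new nodes differ in Elasticsearch or JDK version, which is the first thing to rule out here.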

HTH

thank you. I'll try it.

BTW, I found that on the client side our system creates more than one Java TransportClient instance.

In another discussion (TransportClient thread safe), you mentioned that the whole JVM needs only one client instance. Why would multiple instances of the Java client not be thread safe, and what happens if we create multiple clients? I haven't found any explanation in the official documentation. Could you give the answer, some error cases caused by multiple client instances, or a link to the documentation?
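
The thread does not answer this question. For the "only one client instance per JVM" advice itself, one possible way to enforce it is a small lazy holder that all application code goes through. The sketch below is an assumption-laden illustration: `SharedClientHolder` is a hypothetical name, and the factory is where the real client would be built (in the 2.x Java API that would be something along the lines of `TransportClient.builder().settings(...).build()`). It is kept generic here so that it compiles without the Elasticsearch dependency.

```java
import java.util.function.Supplier;

// Hypothetical holder: creates the client once and hands out the same instance.
final class SharedClientHolder<T extends AutoCloseable> implements AutoCloseable {

    private final Supplier<T> factory;
    private T instance;
    private boolean closed;

    // The factory is only invoked on the first call to get().
    SharedClientHolder(Supplier<T> factory) {
        this.factory = factory;
    }

    // Returns the single shared instance, creating it lazily.
    synchronized T get() {
        if (closed) {
            throw new IllegalStateException("holder already closed");
        }
        if (instance == null) {
            instance = factory.get();
        }
        return instance;
    }

    // Closes the underlying client once, e.g. on application shutdown.
    @Override
    public synchronized void close() throws Exception {
        closed = true;
        if (instance != null) {
            instance.close();
            instance = null;
        }
    }
}
```

An application would keep one such holder (for example in a static field or as a singleton in its DI container), call `get()` wherever a client is needed, and call `close()` on shutdown.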
