Intermittent NoNodeAvailableException - 2.4.1 Docker ES

ArunANayagam · April 30, 2019, 11:58am

Hi,
For some weird reason I have to build a cluster on ES 2.4.1.
I have created a Docker Elasticsearch 2.4.1 cluster on AWS.
I use cloud-aws for discovery and it seems to discover the nodes fine.
It is a 3-node cluster:

{
    "cluster_name": "events0",
    "nodes": {
        "VXAO_DNOTu2FOJSX-lt59g": {
            "name": "Washout",
            "transport_address": "10.11.23.184:9300",
            "host": "10.11.23.184",
            "ip": "10.11.23.184",
            "version": "2.4.1",
            "build": "c67dc32",
            "http_address": "10.11.23.184:9200",
            "attributes": {
                "aws_availability_zone": "eu-west-2b"
            },
            "http": {
                "bound_address": [
                    "[::]:9200"
                ],
                "publish_address": "10.11.23.184:9200",
                "max_content_length_in_bytes": 104857600
            }
        },
        "46-eohV-QtKcQZ_ILebBNg": {
            "name": "La Nuit",
            "transport_address": "10.11.24.106:9300",
            "host": "10.11.24.106",
            "ip": "10.11.24.106",
            "version": "2.4.1",
            "build": "c67dc32",
            "http_address": "10.11.24.106:9200",
            "attributes": {
                "aws_availability_zone": "eu-west-2a"
            },
            "http": {
                "bound_address": [
                    "[::]:9200"
                ],
                "publish_address": "10.11.24.106:9200",
                "max_content_length_in_bytes": 104857600
            }
        },
        "n-aG0kOoSxmU3zKg2srX2A": {
            "name": "Shola",
            "transport_address": "10.11.24.10:9300",
            "host": "10.11.24.10",
            "ip": "10.11.24.10",
            "version": "2.4.1",
            "build": "c67dc32",
            "http_address": "10.11.24.10:9200",
            "attributes": {
                "aws_availability_zone": "eu-west-2a"
            },
            "http": {
                "bound_address": [
                    "[::]:9200"
                ],
                "publish_address": "10.11.24.10:9200",
                "max_content_length_in_bytes": 104857600
            }
        }
    }
}

Now, we are using a Spring Boot TransportClient that talks to the ES cluster through an internal load balancer.

It works fine at 40 requests per second, but under load (100 requests per second) it starts to throw intermittent NoNodeAvailableException errors (roughly every 5th call):

None of the configured nodes are available: [{#transport#-1}{pgi-niaml-e0-events-es-lb.dev.digital.local}{10.11.23.55:9300}]

The error lists an IP that does not belong to any of the nodes. It seems to be a ghost IP, as I cannot find any instance with that address.
It's not the Docker IP either, as that's allocated from a completely different prefix (172.0.1.0).
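
For reference, the entry in the exception reads as {id}{host}{ip:port}: the host part is the load balancer name the client was configured with, so the address is presumably whatever that name resolved to, not a node address. A minimal sketch in plain Java (no Elasticsearch dependency; the class and method names are made up for illustration) that pulls the entry apart and checks the address against the transport addresses in the node listing above:

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of any Elasticsearch API.
class ConfiguredNodeEntry {
    // One "{id}{host}{ip:port}" entry as it appears in the exception text.
    private static final Pattern ENTRY =
            Pattern.compile("\\{([^}]*)\\}\\{([^}]*)\\}\\{([^}:]+):(\\d+)\\}");

    final String id;
    final String host;
    final String ip;
    final int port;

    ConfiguredNodeEntry(String id, String host, String ip, int port) {
        this.id = id;
        this.host = host;
        this.ip = ip;
        this.port = port;
    }

    // Extracts the first configured-node entry from the exception message.
    static ConfiguredNodeEntry parse(String message) {
        Matcher m = ENTRY.matcher(message);
        if (!m.find()) {
            throw new IllegalArgumentException("no {id}{host}{ip:port} entry found");
        }
        return new ConfiguredNodeEntry(m.group(1), m.group(2), m.group(3),
                Integer.parseInt(m.group(4)));
    }

    // True if the entry's address is one of the cluster's transport addresses.
    boolean isClusterNode(List<String> transportAddresses) {
        return transportAddresses.contains(ip + ":" + port);
    }

    public static void main(String[] args) {
        String message = "None of the configured nodes are available: "
                + "[{#transport#-1}{pgi-niaml-e0-events-es-lb.dev.digital.local}{10.11.23.55:9300}]";
        // Transport addresses copied from the node listing in this post.
        List<String> nodes = Arrays.asList(
                "10.11.23.184:9300", "10.11.24.106:9300", "10.11.24.10:9300");

        ConfiguredNodeEntry entry = parse(message);
        System.out.println("host: " + entry.host);
        System.out.println("address: " + entry.ip + ":" + entry.port);
        System.out.println("is a cluster node: " + entry.isClusterNode(nodes));
    }
}
```

If the address turns out to belong to the load balancer itself, that would be consistent with it not matching any instance, though that is an assumption the thread does not confirm.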

Can you please help?

Thanks,
Arun

dadoonet · April 30, 2019, 1:08pm

Maybe you have some information in the Elasticsearch logs, such as memory pressure or GC messages?

Anyway some recommendations:

upgrade
use the REST client

ArunANayagam · April 30, 2019, 1:13pm

Thanks for getting back to me.
I wish upgrading or switching to the REST client were an option, but it isn't, at least not in the short term.

I do see these in the logs:

[2019-04-30 07:37:19,794][INFO ][monitor.jvm ] [Washout] [gc][old][253708][342] duration [5.5s], collections [1]/[6.2s], total [5.5s]/[2.7m], memory [14.9gb]->[3.9gb]/[15.8gb], all_pools {[young] [566.4mb]->[156.4mb]/[865.3mb]}{[survivor] [108.1mb]->[0b]/[108.1mb]}{[old] [14.2gb]->[3.7gb]/[14.9gb]}

Not sure if that's an indication that it's struggling. It still doesn't explain why it's looking for a completely wrong IP.
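
To put numbers on that log line: heap usage went from 14.9gb to 3.9gb out of 15.8gb, i.e. roughly 94% used before the old-generation collection and roughly 25% after, with 5.5s spent collecting. A small sketch that does this arithmetic (plain Java; the class name is made up, and it only handles the "gb" and "s" units that appear in this particular line):

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; only understands "gb" sizes and "s" durations.
class GcLogLine {
    private static final Pattern DURATION =
            Pattern.compile("duration \\[([\\d.]+)s\\]");
    private static final Pattern MEMORY =
            Pattern.compile("memory \\[([\\d.]+)gb\\]->\\[([\\d.]+)gb\\]/\\[([\\d.]+)gb\\]");

    final double durationSeconds;
    final double heapBeforeGb;
    final double heapAfterGb;
    final double heapMaxGb;

    GcLogLine(double durationSeconds, double heapBeforeGb, double heapAfterGb, double heapMaxGb) {
        this.durationSeconds = durationSeconds;
        this.heapBeforeGb = heapBeforeGb;
        this.heapAfterGb = heapAfterGb;
        this.heapMaxGb = heapMaxGb;
    }

    // Reads the duration and the overall "memory [before]->[after]/[max]" figures.
    static GcLogLine parse(String line) {
        Matcher d = DURATION.matcher(line);
        Matcher m = MEMORY.matcher(line);
        if (!d.find() || !m.find()) {
            throw new IllegalArgumentException("unrecognised monitor.jvm line");
        }
        return new GcLogLine(
                Double.parseDouble(d.group(1)),
                Double.parseDouble(m.group(1)),
                Double.parseDouble(m.group(2)),
                Double.parseDouble(m.group(3)));
    }

    double percentUsedBefore() {
        return 100.0 * heapBeforeGb / heapMaxGb;
    }

    double percentUsedAfter() {
        return 100.0 * heapAfterGb / heapMaxGb;
    }

    public static void main(String[] args) {
        String line = "[2019-04-30 07:37:19,794][INFO ][monitor.jvm ] [Washout] [gc][old][253708][342] "
                + "duration [5.5s], collections [1]/[6.2s], total [5.5s]/[2.7m], "
                + "memory [14.9gb]->[3.9gb]/[15.8gb]";
        GcLogLine gc = parse(line);
        System.out.println(String.format(Locale.ROOT,
                "collection took %.1fs, heap %.1f%% -> %.1f%% used",
                gc.durationSeconds, gc.percentUsedBefore(), gc.percentUsedAfter()));
    }
}
```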

Oh, I also found out that the Spring Boot client is using the 2.4.5 package; perhaps that's the issue. I will test with the 2.4.1 client.

Thanks,
Arun

ArunANayagam · April 30, 2019, 4:01pm

Hi,

Just an update: the Spring Boot client uses a Node client to create connections. Not sure if that can lead to such behaviour.

NodeBuilder.nodeBuilder().local(true).clusterName(clusterName).node().client()

Thanks,
Arun

dadoonet · April 30, 2019, 4:32pm

You probably have some memory pressure issues you need to fix.
Most likely too many shards per node (that's a common source of trouble).

ArunANayagam · May 1, 2019, 10:05am

Hi David,

I don't see any sign of memory pressure.
In fact, I was able to reproduce this in other environments that don't have a lot of traffic, so it's definitely not load related.
It just starts happening after a period of time, and restarting the microservices gets rid of the issue temporarily.

Just so I understand: how can a memory issue cause a client to be intermittently routed to an IP that does not exist?

Also, there are 6 indexes with default settings, so about 3 or 4 shards per node per index, totalling around 20 shards per node. Is that a lot?
Around 3 million documents in total.
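
The figure of around 20 is consistent with the 2.x index defaults of 5 primary shards and 1 replica per index. A quick sketch of the arithmetic (plain Java; the class name is made up for illustration, and an even spread of shards across nodes is assumed):

```java
// Hypothetical helper for illustration; assumes shards are spread evenly across nodes.
class ShardMath {
    // Every primary shard has `replicasPerPrimary` extra copies.
    static int totalShards(int indexes, int primariesPerIndex, int replicasPerPrimary) {
        return indexes * primariesPerIndex * (1 + replicasPerPrimary);
    }

    static double shardsPerNode(int indexes, int primariesPerIndex,
                                int replicasPerPrimary, int nodes) {
        return (double) totalShards(indexes, primariesPerIndex, replicasPerPrimary) / nodes;
    }

    public static void main(String[] args) {
        // 6 indexes, 2.x defaults (5 primaries, 1 replica), 3 nodes.
        System.out.println("total shards: " + totalShards(6, 5, 1));
        System.out.println("shards per node: " + shardsPerNode(6, 5, 1, 3));
    }
}
```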

Thanks,
Arun

ArunANayagam · May 1, 2019, 2:35pm

Hi @dadoonet,
Could this explain the issue?

Node Client Downsides
Embedding a node client into your application is the easiest way to connect to an Elasticsearch cluster, but it carries some downsides.
Frequently starting and stopping one or more node clients creates unnecessary noise across the cluster.
Embedded node client will respond to outside requests, just like any other client.
You almost always want to disable HTTP for an embedded node client.

It's an extract from this link,

Could it be that the embedded node client is responding to requests? And is it that IP that's getting listed?

But why would it respond to itself? Sorry, I might be getting confused.

Thanks,
Arun

dadoonet · May 3, 2019, 1:09pm

No, not really. At least not with the most recent versions like 6.x or, even better, 7.x.
But 3m documents does not look like a big number. Well, it depends on the size of a document (and the size per shard). Maybe you should decrease that number.

Best thing to do IMO is to:

upgrade
use the REST client

But you already know that.

system · May 31, 2019, 1:09pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
