Intermittent NoNodeAvailableException - 2.4.1 Docker ES

Hi,
For some reason I have to build the cluster on ES 2.4.1.
I have created a 3-node Docker Elasticsearch 2.4.1 cluster on AWS.
I use the cloud-aws plugin for discovery, and discovery seems to work fine:

{
    "cluster_name": "events0",
    "nodes": {
        "VXAO_DNOTu2FOJSX-lt59g": {
            "name": "Washout",
            "transport_address": "10.11.23.184:9300",
            "host": "10.11.23.184",
            "ip": "10.11.23.184",
            "version": "2.4.1",
            "build": "c67dc32",
            "http_address": "10.11.23.184:9200",
            "attributes": {
                "aws_availability_zone": "eu-west-2b"
            },
            "http": {
                "bound_address": [
                    "[::]:9200"
                ],
                "publish_address": "10.11.23.184:9200",
                "max_content_length_in_bytes": 104857600
            }
        },
        "46-eohV-QtKcQZ_ILebBNg": {
            "name": "La Nuit",
            "transport_address": "10.11.24.106:9300",
            "host": "10.11.24.106",
            "ip": "10.11.24.106",
            "version": "2.4.1",
            "build": "c67dc32",
            "http_address": "10.11.24.106:9200",
            "attributes": {
                "aws_availability_zone": "eu-west-2a"
            },
            "http": {
                "bound_address": [
                    "[::]:9200"
                ],
                "publish_address": "10.11.24.106:9200",
                "max_content_length_in_bytes": 104857600
            }
        },
        "n-aG0kOoSxmU3zKg2srX2A": {
            "name": "Shola",
            "transport_address": "10.11.24.10:9300",
            "host": "10.11.24.10",
            "ip": "10.11.24.10",
            "version": "2.4.1",
            "build": "c67dc32",
            "http_address": "10.11.24.10:9200",
            "attributes": {
                "aws_availability_zone": "eu-west-2a"
            },
            "http": {
                "bound_address": [
                    "[::]:9200"
                ],
                "publish_address": "10.11.24.10:9200",
                "max_content_length_in_bytes": 104857600
            }
        }
    }
}

Now, we have a Spring Boot application that uses a TransportClient to talk to the ES cluster through an internal load balancer.

It works fine at 40 requests per second, but under load (100 requests per second) it starts throwing intermittent NoNodeAvailableException errors (roughly every 5th call):

None of the configured nodes are available: [{#transport#-1}{pgi-niaml-e0-events-es-lb.dev.digital.local}{10.11.23.55:9300}]

The error lists an IP that does not belong to any of the nodes. It looks like a ghost IP: I cannot find any instance with that address.
It's not the Docker IP either, since Docker addresses are allocated from a completely different prefix (172.0.1.0).
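The error pairs 10.11.23.55 with pgi-niaml-e0-events-es-lb.dev.digital.local, the load balancer name the client was configured with. As far as I know, a 2.x TransportClient turns that name into a fixed IP when the transport address is created, so one thing worth checking is what the name resolves to now, compared with the IP in the error. A minimal stdlib-only sketch, assuming it runs on the same host (or container) as the Spring Boot service; `TransportProbe`, `resolve` and `canConnect` are hypothetical names, not Elasticsearch APIs:

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.net.UnknownHostException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical diagnostic (not part of any Elasticsearch API): resolve a host
// name and check which of the resulting addresses accept TCP connections.
public class TransportProbe {

    // Every address the name currently resolves to, as seen by this JVM
    // (subject to the JVM's own DNS cache). Empty if the name does not resolve.
    static List<String> resolve(String host) {
        List<String> ips = new ArrayList<>();
        try {
            for (InetAddress addr : InetAddress.getAllByName(host)) {
                ips.add(addr.getHostAddress());
            }
        } catch (UnknownHostException e) {
            // Leave the list empty: the name does not resolve from this host.
        }
        return ips;
    }

    // True if a plain TCP connection to ip:port succeeds within timeoutMs.
    // This only shows the port is open; it does not speak the ES transport protocol.
    static boolean canConnect(String ip, int port, int timeoutMs) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(ip, port), timeoutMs);
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Defaults to the load balancer name from the error; pass another name as args[0].
        String host = args.length > 0 ? args[0] : "pgi-niaml-e0-events-es-lb.dev.digital.local";
        int port = 9300; // Elasticsearch transport port
        List<String> ips = resolve(host);
        if (ips.isEmpty()) {
            System.out.println(host + " does not resolve from this host");
        }
        for (String ip : ips) {
            System.out.println(host + " -> " + ip + ":" + port
                + " reachable=" + canConnect(ip, port, 2000));
        }
    }
}
```

If 10.11.23.55 is no longer among the results, the running client may be holding an address the load balancer has since dropped; treat that as a hypothesis to verify rather than a confirmed cause.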

Can you please help?

Thanks,
Arun

Maybe there is some information in the Elasticsearch logs, such as memory pressure or GC messages?

Anyway, some recommendations:

  • upgrade
  • use the REST client

Thanks for getting back to me.
I wish upgrading or switching to the REST client were an option, but neither is, at least not in the short term.

I do see entries like this in the logs:

[2019-04-30 07:37:19,794][INFO ][monitor.jvm ] [Washout] [gc][old][253708][342] duration [5.5s], collections [1]/[6.2s], total [5.5s]/[2.7m], memory [14.9gb]->[3.9gb]/[15.8gb], all_pools {[young] [566.4mb]->[156.4mb]/[865.3mb]}{[survivor] [108.1mb]->[0b]/[108.1mb]}{[old] [14.2gb]->[3.7gb]/[14.9gb]}

Not sure if that's an indication that it's struggling. It still doesn't explain why the client is looking for a completely wrong IP.
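Reading the GC line: the old generation went from 14.2gb to 3.7gb out of 14.9gb, so it was about 95% full when the collection started, and the collection took 5.5s. A small sketch (the `GcLine` class is hypothetical) that extracts those numbers and compares the duration with 5s, which I believe is the default `client.transport.ping_timeout` of the 2.x TransportClient. If that collection actually paused the node, a ping from the client could have timed out in that window:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: extract the collection duration and old-generation
// occupancy from an ES 2.x monitor.jvm GC log line.
// Assumes the old pool is reported in gb, as in the line from this thread.
public class GcLine {
    static final String SAMPLE = "[2019-04-30 07:37:19,794][INFO ][monitor.jvm ] [Washout] [gc][old][253708][342] "
        + "duration [5.5s], collections [1]/[6.2s], total [5.5s]/[2.7m], memory [14.9gb]->[3.9gb]/[15.8gb], "
        + "all_pools {[young] [566.4mb]->[156.4mb]/[865.3mb]}{[survivor] [108.1mb]->[0b]/[108.1mb]}"
        + "{[old] [14.2gb]->[3.7gb]/[14.9gb]}";

    private static final Pattern DURATION = Pattern.compile("duration \\[([0-9.]+)s\\]");
    private static final Pattern OLD_POOL =
        Pattern.compile("\\{\\[old\\] \\[([0-9.]+)gb\\]->\\[([0-9.]+)gb\\]/\\[([0-9.]+)gb\\]\\}");

    // How long the collection took, in seconds.
    static double durationSeconds(String line) {
        Matcher m = DURATION.matcher(line);
        if (!m.find()) {
            throw new IllegalArgumentException("no duration in: " + line);
        }
        return Double.parseDouble(m.group(1));
    }

    // Old-generation usage before the collection, as a fraction of its maximum.
    static double oldGenUsageBefore(String line) {
        Matcher m = OLD_POOL.matcher(line);
        if (!m.find()) {
            throw new IllegalArgumentException("no old pool (in gb) in: " + line);
        }
        return Double.parseDouble(m.group(1)) / Double.parseDouble(m.group(3));
    }

    public static void main(String[] args) {
        double seconds = durationSeconds(SAMPLE);
        double usage = oldGenUsageBefore(SAMPLE);
        double pingTimeoutSeconds = 5.0; // assumption: 2.x default client.transport.ping_timeout
        System.out.printf("old gen %.0f%% full before GC; collection took %.1fs (%s the %.0fs ping timeout)%n",
            usage * 100, seconds, seconds > pingTimeoutSeconds ? "longer than" : "within", pingTimeoutSeconds);
    }
}
```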

Oh, I also found out that the Spring Boot application is using the 2.4.5 client package. Perhaps that's the issue; I will test with the 2.4.1 client.

Thanks,
Arun

Hi,

Just an update: the Spring Boot application uses a Node client to create connections. Not sure whether that can lead to this kind of behaviour.

NodeBuilder.nodeBuilder().local(true).clusterName(clusterName).node().client()

Thanks,
Arun

You probably have some memory pressure issues you need to fix.
Most likely too many shards per node (a common source of trouble).

Hi David,

I don't see any sign of memory pressure.
In fact, I was able to reproduce this in other environments that don't get much traffic, so it's definitely not load related.
It just starts happening after a while, and restarting the microservices gets rid of the issue temporarily.

Just so I understand: how can a memory issue cause a client to be intermittently routed to an IP that does not exist?

Also, there are 6 indexes with default settings, so about 3 or 4 shards per node per index, totalling around 20 shards per node. Is that a lot?
There are around 3 million documents in total.
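In 2.x, "default settings" means 5 primary shards and 1 replica per index, so the shard count above checks out. A worked version of the arithmetic (`ShardMath` is a hypothetical name; the per-shard document count assumes documents are spread evenly across indices):

```java
// Worked shard arithmetic under the ES 2.x index defaults
// (5 primary shards, 1 replica per index).
public class ShardMath {

    // Total shard copies (primaries plus replicas) across the cluster.
    static int shardCopies(int indices, int primariesPerIndex, int replicas) {
        return indices * primariesPerIndex * (1 + replicas);
    }

    public static void main(String[] args) {
        int indices = 6, primaries = 5, replicas = 1, nodes = 3;
        long documents = 3_000_000L;
        int copies = shardCopies(indices, primaries, replicas);
        System.out.println("shard copies in cluster: " + copies);          // 60
        System.out.println("shard copies per node:  " + copies / nodes);   // 20
        System.out.println("docs per primary shard: "
            + documents / (indices * primaries));                          // 100000
    }
}
```

So each node holds 20 shard copies (about 3.3 per index), and each primary holds roughly 100,000 documents; whether that is too many shards depends on the per-shard size, which these numbers alone do not show.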

Thanks,
Arun

Hi @dadoonet,
Could this explain the issue?

Node Client Downsides
Embedding a node client into your application is the easiest way to connect to an Elasticsearch cluster, but it carries some downsides.
Frequently starting and stopping one or more node clients creates unnecessary noise across the cluster.
Embedded node client will respond to outside requests, just like any other client.
You almost always want to disable HTTP for an embedded node client.

It's an extract from this page:
https://www.elastic.co/guide/en/elasticsearch/client/java-api/2.0/node-client.html

Could it be that the embedded node client is responding to requests, and that it's its IP that's getting listed?

But why would it respond to itself? Sorry, I might be getting confused :frowning:

Thanks,
Arun

No, not really. At least not with recent versions like 6.x or, even better, 7.x.
But 3M documents is not a big number. Well, it depends on the size of a document (and the size per shard). Maybe you should decrease the number of shards.

The best thing to do, IMO, is to:

  • upgrade
  • use the REST client

But you already know that. :slight_smile:
