Error: io.netty.handler.codec.DecoderException: java.lang.IllegalArgumentException: newPosition > limit

ES Version: 6.8.6
Lucene Version: 7.7.2

We have a 3-node cluster where all 3 nodes carry the roles "mdi" (master, data, ingest).
When we poll /_cat/nodes in a loop, we see one of the nodes randomly drop out of the cluster, leaving the whole cluster in an unstable state. Which node drops out appears to be random. By "drops out of the cluster" I mean the node no longer appears in the /_cat/nodes output.

In the logs, we see the errors below being logged continuously.
Please note that this state persists even under zero load, with no new search or ingest requests arriving.
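For context, a minimal sketch of the kind of polling loop we use to watch node membership looks like this. The host, port, and `admin:admin` credentials are placeholders; `-k` skips TLS verification, which is needed here because OpenDistro ships self-signed certificates by default:

```shell
# Poll /_cat/nodes every 5 seconds and log whenever fewer than the
# expected 3 nodes are visible in the cluster.
EXPECTED=3
while true; do
  COUNT=$(curl -sk -u admin:admin "https://localhost:9200/_cat/nodes?h=name" | wc -l)
  if [ "$COUNT" -lt "$EXPECTED" ]; then
    echo "$(date -u +%FT%TZ) only $COUNT of $EXPECTED nodes visible"
  fi
  sleep 5
done
```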

[2020-11-09T06:40:31,260][WARN ][o.e.t.TcpTransport       ] [psrnativecec030920-esdata0] exception caught on transport layer [Netty4TcpChannel{localAddress=/100.73.50.18:9300, remoteAddress=/100.73.50.120:20266}], closing connection
    io.netty.handler.codec.DecoderException: java.lang.IllegalArgumentException: newPosition > limit: (5898240 > 5455480)
            at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:275) ~[netty-codec-4.1.44.Final.jar:4.1.44.Final]
            at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:377) [netty-transport-4.1.44.Final.jar:4.1.44.Final]
            at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:363) [netty-transport-4.1.44.Final.jar:4.1.44.Final]
            at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:355) [netty-transport-4.1.44.Final.jar:4.1.44.Final]
            at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410) [netty-transport-4.1.44.Final.jar:4.1.44.Final]
            at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:377) [netty-transport-4.1.44.Final.jar:4.1.44.Final]
            at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:363) [netty-transport-4.1.44.Final.jar:4.1.44.Final]
            at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919) [netty-transport-4.1.44.Final.jar:4.1.44.Final]
            at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:163) [netty-transport-4.1.44.Final.jar:4.1.44.Final]
            at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:714) [netty-transport-4.1.44.Final.jar:4.1.44.Final]
            at io.netty.channel.nio.NioEventLoop.processSelectedKeysPlain(NioEventLoop.java:615) [netty-transport-4.1.44.Final.jar:4.1.44.Final]
            at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:578) [netty-transport-4.1.44.Final.jar:4.1.44.Final]
            at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:493) [netty-transport-4.1.44.Final.jar:4.1.44.Final]
            at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989) [netty-common-4.1.44.Final.jar:4.1.44.Final]
            at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) [netty-common-4.1.44.Final.jar:4.1.44.Final]
            at java.lang.Thread.run(Thread.java:834) [?:?]
    Caused by: java.lang.IllegalArgumentException: newPosition > limit: (5898240 > 5455480)
            at java.nio.Buffer.createPositionException(Buffer.java:318) ~[?:?]
            at java.nio.Buffer.position(Buffer.java:293) ~[?:?]
            at java.nio.ByteBuffer.position(ByteBuffer.java:1086) ~[?:?]
            at java.nio.MappedByteBuffer.position(MappedByteBuffer.java:226) ~[?:?]
            at java.nio.MappedByteBuffer.position(MappedByteBuffer.java:67) ~[?:?]
            at io.netty.buffer.PoolArena$DirectArena.memoryCopy(PoolArena.java:794) ~[netty-buffer-4.1.44.Final.jar:4.1.44.Final]
            at io.netty.buffer.PoolArena$DirectArena.memoryCopy(PoolArena.java:704) ~[netty-buffer-4.1.44.Final.jar:4.1.44.Final]
            at io.netty.buffer.PoolArena.reallocate(PoolArena.java:405) ~[netty-buffer-4.1.44.Final.jar:4.1.44.Final]
            at io.netty.buffer.PooledByteBuf.capacity(PooledByteBuf.java:118) ~[netty-buffer-4.1.44.Final.jar:4.1.44.Final]
            at io.netty.buffer.AbstractByteBuf.ensureWritable0(AbstractByteBuf.java:306) ~[netty-buffer-4.1.44.Final.jar:4.1.44.Final]
            at io.netty.buffer.AbstractByteBuf.ensureWritable(AbstractByteBuf.java:282) ~[netty-buffer-4.1.44.Final.jar:4.1.44.Final]
            at io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:1104) ~[netty-buffer-4.1.44.Final.jar:4.1.44.Final]
            at io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:1097) ~[netty-buffer-4.1.44.Final.jar:4.1.44.Final]
            at io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:1088) ~[netty-buffer-4.1.44.Final.jar:4.1.44.Final]
            at io.netty.handler.codec.ByteToMessageDecoder$1.cumulate(ByteToMessageDecoder.java:94) ~[netty-codec-4.1.44.Final.jar:4.1.44.Final]
            at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:269) ~[netty-codec-4.1.44.Final.jar:4.1.44.Final]
            ... 15 more

Have you got discovery.zen.minimum_master_nodes set to 2 according to these guidelines?

What is the full output of the cluster stats API?

Yes, minimum_master_nodes is set to 2 in the elasticsearch.yml file, as shown below:

discovery.zen.ping.unicast.hosts: ["Test-esdata0.svcsubnetiadad1.p2oscsnat01iad.oraclevcn.com","Test-esdata1.svcsubnetiadad2.p2oscsnat01iad.oraclevcn.com","Test-esdata2.svcsubnetiadad3.p2oscsnat01iad.oraclevcn.com"]
discovery.zen.minimum_master_nodes: 2
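The effective value can also be confirmed on the running cluster via the cluster settings API, independent of what is in elasticsearch.yml. The endpoint and `admin:admin` credentials are placeholders for this setup:

```shell
# Ask the cluster for its effective settings, including defaults, and
# trim the response down to just minimum_master_nodes with filter_path.
curl -sk -u admin:admin \
  "https://localhost:9200/_cluster/settings?include_defaults=true&filter_path=**.minimum_master_nodes"
```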

The cluster stats API output is as follows:

{
  "_nodes": {
    "total": 3,
    "successful": 3,
    "failed": 0
  },
  "cluster_name": "Test1cluster",
  "cluster_uuid": "9xBNqdhwTbaiqG8GxRS-Ow",
  "timestamp": 1604905671273,
  "status": "yellow",
  "indices": {
    "count": 27,
    "shards": {
      "total": 72,
      "primaries": 36,
      "replication": 1,
      "index": {
        "shards": {
          "min": 2,
          "max": 6,
          "avg": 2.6666666666666665
        },
        "primaries": {
          "min": 1,
          "max": 3,
          "avg": 1.3333333333333333
        },
        "replication": {
          "min": 0.6666666666666666,
          "max": 2,
          "avg": 1.0246913580246912
        }
      }
    },
    "docs": {
      "count": 3941,
      "deleted": 3
    },
    "store": {
      "size": "5.6mb",
      "size_in_bytes": 5916050
    },
    "fielddata": {
      "memory_size": "0b",
      "memory_size_in_bytes": 0,
      "evictions": 0
    },
    "query_cache": {
      "memory_size": "0b",
      "memory_size_in_bytes": 0,
      "total_count": 0,
      "hit_count": 0,
      "miss_count": 0,
      "cache_size": 0,
      "cache_count": 0,
      "evictions": 0
    },
    "completion": {
      "size": "0b",
      "size_in_bytes": 0
    },
    "segments": {
      "count": 64,
      "memory": "822.2kb",
      "memory_in_bytes": 842031,
      "terms_memory": "610.5kb",
      "terms_memory_in_bytes": 625240,
      "stored_fields_memory": "19.9kb",
      "stored_fields_memory_in_bytes": 20448,
      "term_vectors_memory": "0b",
      "term_vectors_memory_in_bytes": 0,
      "norms_memory": "86kb",
      "norms_memory_in_bytes": 88064,
      "points_memory": "1.6kb",
      "points_memory_in_bytes": 1655,
      "doc_values_memory": "104.1kb",
      "doc_values_memory_in_bytes": 106624,
      "index_writer_memory": "0b",
      "index_writer_memory_in_bytes": 0,
      "version_map_memory": "0b",
      "version_map_memory_in_bytes": 0,
      "fixed_bit_set": "2.1kb",
      "fixed_bit_set_memory_in_bytes": 2224,
      "max_unsafe_auto_id_timestamp": 1604905456917,
      "file_sizes": {}
    }
  },
  "nodes": {
    "count": {
      "total": 3,
      "data": 3,
      "coordinating_only": 0,
      "master": 3,
      "ingest": 3
    },
    "versions": [
      "6.8.6"
    ],
    "os": {
      "available_processors": 12,
      "allocated_processors": 12,
      "names": [
        {
          "name": "Linux",
          "count": 3
        }
      ],
      "pretty_names": [
        {
          "pretty_name": "Oracle Linux Server 7.7",
          "count": 3
        }
      ],
      "mem": {
        "total": "87.5gb",
        "total_in_bytes": 93995417600,
        "free": "4.8gb",
        "free_in_bytes": 5204045824,
        "used": "82.6gb",
        "used_in_bytes": 88791371776,
        "free_percent": 6,
        "used_percent": 94
      }
    },
    "process": {
      "cpu": {
        "percent": 2
      },
      "open_file_descriptors": {
        "min": 473,
        "max": 8073,
        "avg": 3414
      }
    },
    "jvm": {
      "max_uptime": "3d",
      "max_uptime_in_millis": 262894226,
      "versions": [
        {
          "version": "11.0.4",
          "vm_name": "Java HotSpot(TM) 64-Bit Server VM",
          "vm_version": "11.0.4+10-LTS",
          "vm_vendor": "Oracle Corporation",
          "count": 3
        }
      ],
      "mem": {
        "heap_used": "17.9gb",
        "heap_used_in_bytes": 19305353184,
        "heap_max": "33gb",
        "heap_max_in_bytes": 35433480192
      },
      "threads": 275
    },
    "fs": {
      "total": "449.7gb",
      "total_in_bytes": 482935320576,
      "free": "438.8gb",
      "free_in_bytes": 471221567488,
      "available": "438.8gb",
      "available_in_bytes": 471221567488
    },
    "plugins": [
      {
        "name": "analysis-icu",
        "version": "6.8.6",
        "elasticsearch_version": "6.8.6",
        "java_version": "1.8",
        "description": "The ICU Analysis plugin integrates the Lucene ICU module into Elasticsearch, adding ICU-related analysis components.",
        "classname": "org.elasticsearch.plugin.analysis.icu.AnalysisICUPlugin",
        "extended_plugins": [],
        "has_native_controller": false
      },
      {
        "name": "opendistro_security",
        "version": "0.10.1.1",
        "elasticsearch_version": "6.8.6",
        "java_version": "1.8",
        "description": "Provide access control related features for Elasticsearch 6",
        "classname": "com.amazon.opendistroforelasticsearch.security.OpenDistroSecurityPlugin",
        "extended_plugins": [],
        "has_native_controller": false
      },
      {
        "name": "repository-s3",
        "version": "6.8.6",
        "elasticsearch_version": "6.8.6",
        "java_version": "1.8",
        "description": "The S3 repository plugin adds S3 repositories",
        "classname": "org.elasticsearch.repositories.s3.S3RepositoryPlugin",
        "extended_plugins": [],
        "has_native_controller": false
      },
      {
        "name": "ingest-attachment",
        "version": "6.8.6",
        "elasticsearch_version": "6.8.6",
        "java_version": "1.8",
        "description": "Ingest processor that uses Apache Tika to extract contents",
        "classname": "org.elasticsearch.ingest.attachment.IngestAttachmentPlugin",
        "extended_plugins": [],
        "has_native_controller": false
      }
    ],
    "network_types": {
      "transport_types": {
        "com.amazon.opendistroforelasticsearch.security.ssl.http.netty.OpenDistroSecuritySSLNettyTransport": 3
      },
      "http_types": {
        "com.amazon.opendistroforelasticsearch.security.http.OpenDistroSecurityHttpServerTransport": 3
      }
    }
  }
}

That all looks fine, and I do not see any obvious reason for instability. It might be hardware- or network-related. I also see that you are using OpenDistro Security. I have no experience with it, but since it interposes itself in inter-node communication, it might be worth checking with the OpenDistro community, or even removing the plugin to see whether the issue persists.
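For reference, removing the plugin to test this would look roughly like the following, run on each node in turn. The paths assume a default package install and may differ on your hosts; note that because the plugin provides the transport and HTTP TLS layers, the `opendistro_security.*` settings must also be removed from elasticsearch.yml or the node will refuse to start on unknown settings:

```shell
# Stop the node, remove the security plugin, then restart.
sudo systemctl stop elasticsearch
sudo /usr/share/elasticsearch/bin/elasticsearch-plugin remove opendistro_security
# Edit elasticsearch.yml to drop all opendistro_security.* settings first.
sudo systemctl start elasticsearch
```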

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.