Memory consumption in io.netty.buffer.PoolThreadCache

Hi guys,
I'm seeing very high memory consumption in io.netty.buffer.PoolThreadCache. This portion of memory can reach up to 5 GB, which I think is abnormal.

/usr/share/elasticsearch/jdk/bin/java -Xshare:auto -Des.networkaddress.cache.ttl=60 -Des.networkaddress.cache.negative.ttl=10 -XX:+AlwaysPreTouch -Xss1m -Djava.awt.headless=true -Dfile.encoding=UTF-8 -Djna.nosys=true -XX:-OmitStackTraceInFastThrow -XX:+ShowCodeDetailsInExceptionMessages -Dio.netty.noUnsafe=true -Dio.netty.noKeySetOptimization=true -Dio.netty.recycler.maxCapacityPerThread=0 -Dio.netty.allocator.numDirectArenas=0 -Dlog4j.shutdownHookEnabled=false -Dlog4j2.disable.jmx=true -Djava.locale.providers=SPI,COMPAT -Xms31g -Xmx31g -XX:+UseG1GC -XX:G1ReservePercent=25 -XX:InitiatingHeapOccupancyPercent=30 -Des.networkaddress.cache.ttl=60 -Des.networkaddress.cache.negative.ttl=10 -XX:+AlwaysPreTouch -Xss1m -Djava.awt.headless=true -Dfile.encoding=UTF-8 -Djna.nosys=true -XX:-OmitStackTraceInFastThrow -Dio.netty.noUnsafe=true -Dio.netty.noKeySetOptimization=true -Dio.netty.recycler.maxCapacityPerThread=0 -Dlog4j.shutdownHookEnabled=false -Dlog4j2.disable.jmx=true -Dlog4j2.formatMsgNoLookups=true -Djava.io.tmpdir=/tmp/elasticsearch-12411683855568685293 -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/var/lib/elasticsearch -XX:ErrorFile=/var/log/elasticsearch/hs_err_pid%p.log -Xlog:gc*,gc+age=trace,safepoint:file=/var/log/elasticsearch/gc.log:utctime,pid,tags:filecount=32,filesize=64m -Djava.locale.providers=COMPAT -XX:UseAVX=2 -XX:MaxDirectMemorySize=16642998272 -Des.path.home=/usr/share/elasticsearch -Des.path.conf=/etc/elasticsearch -Des.distribution.flavor=default -Des.distribution.type=rpm -Des.bundled_jdk=true -cp /usr/share/elasticsearch/lib/* org.elasticsearch.bootstrap.Elasticsearch -p /var/run/elasticsearch/elasticsearch.pid --quiet
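
To see what Netty's pooled allocator is actually holding, its own metrics can be read directly. Below is a minimal standalone sketch (illustrative only, assuming Netty 4.1's PooledByteBufAllocator, which is the allocator the Elasticsearch transport uses when buffer pooling is enabled; the class name NettyPoolInspect is made up):

import io.netty.buffer.PooledByteBufAllocator;
import io.netty.buffer.PooledByteBufAllocatorMetric;

public class NettyPoolInspect {
    public static void main(String[] args) {
        // Shared pooled allocator. With -Dio.netty.allocator.numDirectArenas=0 (as in the
        // startup command above) the arenas live on the heap, so pooled buffers and the
        // per-thread PoolThreadCache entries that reference them count against the 31g heap.
        PooledByteBufAllocator alloc = PooledByteBufAllocator.DEFAULT;
        PooledByteBufAllocatorMetric m = alloc.metric();

        System.out.println("heap arenas:         " + m.numHeapArenas());
        System.out.println("direct arenas:       " + m.numDirectArenas());
        System.out.println("thread-local caches: " + m.numThreadLocalCaches());
        System.out.println("chunk size (bytes):  " + m.chunkSize());
        System.out.println("small cache size:    " + m.smallCacheSize());
        System.out.println("normal cache size:   " + m.normalCacheSize());
        System.out.println("used heap memory:    " + m.usedHeapMemory());
        System.out.println("used direct memory:  " + m.usedDirectMemory());
    }
}

Every thread that allocates through the pool gets its own PoolThreadCache, so memory profilers often attribute the pooled buffers reachable from those caches to that class; the metrics above show how much memory the pool itself reports holding.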

UPDATE:
ES version: 7.8.0

Thanks in advance!

Welcome!

Why do you think it's abnormal?

This is not good at all. Please upgrade at least to 7.17.12. Many bugs have been fixed, and many enhancements and security patches have been added since 7.8.0.

This memory usage is too large. I don't think it should need so much memory, and I hope it can be reduced.

What is the full output of the cluster stats API?

What is the use case?

What kind of load is the cluster under?

What type of hardware is powering the cluster?

cluster stats

{
  "_nodes" : {
    "total" : 27,
    "successful" : 27,
    "failed" : 0
  },
  "cluster_name" : "picapica_es",
  "cluster_uuid" : "p60dCsvQRSmfyfC_n_kjGQ",
  "timestamp" : 1690460363351,
  "status" : "green",
  "indices" : {
    "count" : 1577,
    "shards" : {
      "total" : 21591,
      "primaries" : 10798,
      "replication" : 0.9995369512872754,
      "index" : {
        "shards" : {
          "min" : 2,
          "max" : 116,
          "avg" : 13.691185795814839
        },
        "primaries" : {
          "min" : 1,
          "max" : 58,
          "avg" : 6.847178186429931
        },
        "replication" : {
          "min" : 0.0,
          "max" : 1.0,
          "avg" : 0.9993658845909955
        }
      }
    },
    "docs" : {
      "count" : 423209498690,
      "deleted" : 68259178
    },
    "store" : {
      "size_in_bytes" : 351661022794170
    },
    "fielddata" : {
      "memory_size_in_bytes" : 89659375352,
      "evictions" : 0
    },
    "query_cache" : {
      "memory_size_in_bytes" : 59041270725,
      "total_count" : 4903505833,
      "hit_count" : 1294945834,
      "miss_count" : 3608559999,
      "cache_size" : 1251722,
      "cache_count" : 5688570,
      "evictions" : 4436848
    },
    "completion" : {
      "size_in_bytes" : 0
    },
    "segments" : {
      "count" : 179880,
      "memory_in_bytes" : 2638702566,
      "terms_memory_in_bytes" : 1296360432,
      "stored_fields_memory_in_bytes" : 998467056,
      "term_vectors_memory_in_bytes" : 0,
      "norms_memory_in_bytes" : 6517440,
      "points_memory_in_bytes" : 0,
      "doc_values_memory_in_bytes" : 337357638,
      "index_writer_memory_in_bytes" : 6922539500,
      "version_map_memory_in_bytes" : 161991433,
      "fixed_bit_set_memory_in_bytes" : 6618439504,
      "max_unsafe_auto_id_timestamp" : 1690446549560,
      "file_sizes" : { }
    },
    "mappings" : {
      "field_types" : [
        {
          "name" : "alias",
          "count" : 93,
          "index_count" : 24
        },
        {
          "name" : "binary",
          "count" : 13,
          "index_count" : 4
        },
        {
          "name" : "boolean",
          "count" : 1736,
          "index_count" : 505
        },
        {
          "name" : "date",
          "count" : 4511,
          "index_count" : 1568
        },
        {
          "name" : "double",
          "count" : 63,
          "index_count" : 42
        },
        {
          "name" : "flattened",
          "count" : 3,
          "index_count" : 3
        },
        {
          "name" : "float",
          "count" : 3473,
          "index_count" : 937
        },
        {
          "name" : "geo_point",
          "count" : 519,
          "index_count" : 443
        },
        {
          "name" : "geo_shape",
          "count" : 3,
          "index_count" : 3
        },
        {
          "name" : "half_float",
          "count" : 68,
          "index_count" : 15
        },
        {
          "name" : "integer",
          "count" : 4917,
          "index_count" : 1019
        },
        {
          "name" : "keyword",
          "count" : 31887,
          "index_count" : 1575
        },
        {
          "name" : "long",
          "count" : 6469,
          "index_count" : 871
        },
        {
          "name" : "nested",
          "count" : 368,
          "index_count" : 336
        },
        {
          "name" : "object",
          "count" : 1717,
          "index_count" : 151
        },
        {
          "name" : "short",
          "count" : 1603,
          "index_count" : 503
        },
        {
          "name" : "text",
          "count" : 3498,
          "index_count" : 695
        }
      ]
    },
    "analysis" : {
      "char_filter_types" : [ ],
      "tokenizer_types" : [ ],
      "filter_types" : [
        {
          "name" : "pattern_capture",
          "count" : 1,
          "index_count" : 1
        }
      ],
      "analyzer_types" : [
        {
          "name" : "custom",
          "count" : 1,
          "index_count" : 1
        }
      ],
      "built_in_char_filters" : [ ],
      "built_in_tokenizers" : [
        {
          "name" : "uax_url_email",
          "count" : 1,
          "index_count" : 1
        }
      ],
      "built_in_filters" : [
        {
          "name" : "lowercase",
          "count" : 1,
          "index_count" : 1
        },
        {
          "name" : "unique",
          "count" : 1,
          "index_count" : 1
        }
      ],
      "built_in_analyzers" : [
        {
          "name" : "english",
          "count" : 1,
          "index_count" : 1
        }
      ]
    }
  },
  "nodes" : {
    "count" : {
      "total" : 27,
      "coordinating_only" : 0,
      "data" : 24,
      "ingest" : 27,
      "master" : 3,
      "ml" : 27,
      "remote_cluster_client" : 27,
      "transform" : 24,
      "voting_only" : 0
    },
    "versions" : [
      "7.8.0"
    ],
    "os" : {
      "available_processors" : 1728,
      "allocated_processors" : 1728,
      "names" : [
        {
          "name" : "Linux",
          "count" : 27
        }
      ],
      "pretty_names" : [
        {
          "pretty_name" : "CentOS Linux 7 (Core)",
          "count" : 27
        }
      ],
      "mem" : {
        "total_in_bytes" : 7301586313216,
        "free_in_bytes" : 117647179776,
        "used_in_bytes" : 7183939133440,
        "free_percent" : 2,
        "used_percent" : 98
      }
    },
    "process" : {
      "cpu" : {
        "percent" : 151
      },
      "open_file_descriptors" : {
        "min" : 1680,
        "max" : 6309,
        "avg" : 5435
      }
    },
    "jvm" : {
      "max_uptime_in_millis" : 29823254267,
      "versions" : [
        {
          "version" : "14.0.1",
          "vm_name" : "OpenJDK 64-Bit Server VM",
          "vm_version" : "14.0.1+7",
          "vm_vendor" : "AdoptOpenJDK",
          "bundled_jdk" : true,
          "using_bundled_jdk" : true,
          "count" : 27
        }
      ],
      "mem" : {
        "heap_used_in_bytes" : 626585311520,
        "heap_max_in_bytes" : 898721906688
      },
      "threads" : 14424
    },
    "fs" : {
      "total_in_bytes" : 863817462448128,
      "free_in_bytes" : 687632075218944,
      "available_in_bytes" : 687632075218944
    },
    "plugins" : [
      {
        "name" : "analysis-smartcn",
        "version" : "7.8.0",
        "elasticsearch_version" : "7.8.0",
        "java_version" : "1.8",
        "description" : "Smart Chinese Analysis plugin integrates Lucene Smart Chinese analysis module into elasticsearch.",
        "classname" : "org.elasticsearch.plugin.analysis.smartcn.AnalysisSmartChinesePlugin",
        "extended_plugins" : [ ],
        "has_native_controller" : false
      },
      {
        "name" : "repository-s3",
        "version" : "7.8.0",
        "elasticsearch_version" : "7.8.0",
        "java_version" : "1.8",
        "description" : "The S3 repository plugin adds S3 repositories",
        "classname" : "org.elasticsearch.repositories.s3.S3RepositoryPlugin",
        "extended_plugins" : [ ],
        "has_native_controller" : false
      }
    ],
    "network_types" : {
      "transport_types" : {
        "security4" : 27
      },
      "http_types" : {
        "security4" : 27
      }
    },
    "discovery_types" : {
      "zen" : 27
    },
    "packaging_types" : [
      {
        "flavor" : "default",
        "type" : "rpm",
        "count" : 27
      }
    ],
    "ingest" : {
      "number_of_pipelines" : 8,
      "processor_stats" : {
        "date" : {
          "count" : 0,
          "failed" : 0,
          "current" : 0,
          "time_in_millis" : 0
        },
        "date_index_name" : {
          "count" : 7985,
          "failed" : 0,
          "current" : 0,
          "time_in_millis" : 61
        },
        "geoip" : {
          "count" : 0,
          "failed" : 0,
          "current" : 0,
          "time_in_millis" : 0
        },
        "grok" : {
          "count" : 7985,
          "failed" : 0,
          "current" : 0,
          "time_in_millis" : 840
        },
        "gsub" : {
          "count" : 0,
          "failed" : 0,
          "current" : 0,
          "time_in_millis" : 0
        },
        "json" : {
          "count" : 7985,
          "failed" : 0,
          "current" : 0,
          "time_in_millis" : 98
        },
        "remove" : {
          "count" : 0,
          "failed" : 0,
          "current" : 0,
          "time_in_millis" : 0
        },
        "script" : {
          "count" : 31940,
          "failed" : 17,
          "current" : 0,
          "time_in_millis" : 986
        }
      }
    }
  }
}

There is no special use case; the memory usage has always been this high.

Write QPS is about 120,000 (12w), each node is configured with a 32 GB heap, and there are 27 nodes in total.

"Too large" for what exactly? This isn't at all unusual in a large and busy cluster.


io.netty.buffer.PoolThreadCache is the single biggest memory consumer in the entire cluster, and it takes 5 GB of memory just to do what it needs to do.
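
(As an aside: if the per-thread caches really must be reduced, Netty exposes allocator system properties that could be appended to jvm.options. The values below are purely illustrative and untested; Elasticsearch already pins several io.netty.* properties in the startup command above, so any change would need careful validation against the defaults of the bundled Netty version.)

# Illustrative only: shrink Netty's per-thread buffer caches (standard Netty 4.1
# allocator properties; verify the defaults for your Netty/Elasticsearch version first).
-Dio.netty.allocator.useCacheForAllThreads=false
-Dio.netty.allocator.smallCacheSize=128
-Dio.netty.allocator.normalCacheSize=32
-Dio.netty.allocator.maxCachedBufferCapacity=16384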

The master node has high network overhead and seems to be constantly sending requests to the data nodes.

Are you sending indexing and query requests directly to the data nodes? Dedicated master nodes should ideally not be involved in request processing.

The master does not participate in data requests, but the logs keep reporting 'Authentication of [elastic] was terminated by realm [reserved] - failed to authenticate user'. Will the master keep sending permission-related requests?

Yes, it's how Elasticsearch stores a lot of the data it needs to do the work you send it. But that is expected; it doesn't mean it's "too large".

That is a different issue, not (directly) related to the Netty memory usage.


This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.