Only one data node has significantly higher CPU usage than the other data nodes

ES version:

elasticsearch-5.6.3-1.noarch

OS version:

CentOS Linux release 7.6.1810 (Core)
Linux version 3.10.0-1160.31.1.el7.x86_64 (mockbuild@kbuilder.bsys.centos.org) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-44) (GCC) ) #1 SMP Thu Jun 10 13:32:12 UTC 2021

JDK version:

java version "1.8.0_66"
Java(TM) SE Runtime Environment (build 1.8.0_66-b17)
Java HotSpot(TM) 64-Bit Server VM (build 25.66-b17, mixed mode)

elasticsearch.yml of a data node:

cluster.name: xxx
node.name: xxx

thread_pool.bulk.queue_size: 10000
thread_pool.search.queue_size: 10000
gateway.recover_after_nodes: 1
gateway.expected_nodes: 1
discovery.zen.ping_timeout: 30s
discovery.zen.minimum_master_nodes: 2
http.max_initial_line_length: 1mb
index.store.type: niofs

discovery.zen.ping.unicast.hosts: [ "1.1.1.1","2.2.2.2","3.3.3.3"  ]
discovery.zen.fd.ping_interval: 5s
discovery.zen.fd.connect_on_network_disconnect: true
network.tcp.keep_alive: false
transport.ping_schedule: 10s
bootstrap.memory_lock: false
bootstrap.system_call_filter: false
cluster.routing.allocation.same_shard.host: true

path.data: /data/elasticsearch,
path.logs: /data/logs/elasticsearch

network.host: 0.0.0.0
http.port: 9200

node.master: false
node.data: true
node.ingest: false

xpack.graph.enabled: true
xpack.ml.enabled: true
xpack.monitoring.enabled: true
xpack.watcher.enabled: false
xpack.security.enabled: true

Number of nodes for each role:

master nodes: 3
data nodes: 3
coordinating nodes: 3
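With 3 dedicated master nodes, the `discovery.zen.minimum_master_nodes: 2` value in the config above matches the documented Zen discovery quorum rule for 5.x, (master_eligible_nodes / 2) + 1 with integer division. A small sketch of that arithmetic; the function name is my own, not an Elasticsearch API:

```python
def zen_minimum_master_nodes(master_eligible: int) -> int:
    """Quorum for Zen discovery (ES 5.x): a strict majority of master-eligible nodes."""
    if master_eligible < 1:
        raise ValueError("need at least one master-eligible node")
    return master_eligible // 2 + 1


# This cluster has 3 master-eligible nodes (data nodes set node.master: false).
print(zen_minimum_master_nodes(3))  # 2
```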

Index settings:

"number_of_shards": "6",
"number_of_replicas": "1"
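With 6 primary shards and 1 replica, the index has 6 × (1 + 1) = 12 shard copies, so an even allocation over the 3 data nodes puts 4 copies on each node. A sketch of that arithmetic, assuming an even spread (the real placement can be checked with `GET _cat/shards`); the helper name is hypothetical:

```python
def shard_copies_per_node(primaries: int, replicas: int, data_nodes: int) -> float:
    """Average shard copies per data node, assuming the allocator spreads them evenly."""
    total_copies = primaries * (replicas + 1)  # every primary plus each of its replicas
    return total_copies / data_nodes


print(shard_copies_per_node(6, 1, 3))  # 4.0
```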

Documents are written with a custom routing key, which is the value of the field "cityCode".
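Because every document is routed by cityCode, all documents for one city land on the same shard, so a city with much more data or traffic than the others can make one shard, and the nodes holding its copies, hotter than the rest. As far as I understand Elasticsearch 5.x, the shard is chosen as `floorMod(murmur3_x86_32(routing), number_of_shards)`, where the routing string is hashed as UTF-16 little-endian bytes with seed 0. Treat that as an assumption (it also assumes the index was not shrunk and does not use `index.routing_partition_size`) and verify it against your version's source. The function names and the city/document counts below are hypothetical, for illustration only:

```python
from collections import Counter

MASK32 = 0xFFFFFFFF


def murmur3_x86_32(data: bytes, seed: int = 0) -> int:
    """MurmurHash3 (x86, 32-bit), returned as a signed 32-bit int like Java's."""
    c1, c2 = 0xCC9E2D51, 0x1B873593

    def mix_k(k: int) -> int:
        k = (k * c1) & MASK32
        k = ((k << 15) | (k >> 17)) & MASK32
        return (k * c2) & MASK32

    h = seed & MASK32
    nblocks = len(data) // 4
    # Body: process 4-byte little-endian blocks.
    for i in range(nblocks):
        h ^= mix_k(int.from_bytes(data[4 * i:4 * i + 4], "little"))
        h = ((h << 13) | (h >> 19)) & MASK32
        h = (h * 5 + 0xE6546B64) & MASK32
    # Tail: the remaining 0-3 bytes.
    tail = data[4 * nblocks:]
    if tail:
        h ^= mix_k(int.from_bytes(tail, "little"))
    # Finalization: fold in the length and avalanche the bits.
    h ^= len(data)
    h ^= h >> 16
    h = (h * 0x85EBCA6B) & MASK32
    h ^= h >> 13
    h = (h * 0xC2B2AE35) & MASK32
    h ^= h >> 16
    return h - (1 << 32) if h & 0x80000000 else h


def shard_for_routing(routing: str, number_of_shards: int) -> int:
    """Assumed ES 5.x rule: Murmur3 over the UTF-16LE bytes of the routing value, then floor-mod."""
    h = murmur3_x86_32(routing.encode("utf-16-le"))
    # Python's % with a positive divisor behaves like Java's Math.floorMod.
    return h % number_of_shards


# Hypothetical city codes and document counts, for illustration only.
docs_per_city = {"110000": 5_000_000, "310000": 3_000_000, "440100": 200_000, "440300": 150_000}
docs_per_shard = Counter()
for city, count in docs_per_city.items():
    docs_per_shard[shard_for_routing(city, 6)] += count
print(sorted(docs_per_shard.items()))
```

Feeding in real per-city document or query counts would show whether a few large cities concentrate load on one shard.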

Issue description
My ES cluster has been running stably for 6 months.
In the past week, without any changes, one of the data nodes suddenly started showing significantly higher CPU usage than the other data nodes.
At the time marked by the red arrow in the figure below, the search thread pool was even exhausted and a large number of search rejections occurred.
Is this an ES bug? If so, under what conditions is it triggered? I suspect the following setting may be causing the problem:
index.store.type: niofs

CPU monitoring of all data nodes:

The QPS of each data node is almost the same:

Thread pool monitoring of all data nodes (the number of processors is 16, so the maximum search thread pool size is 25):
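The maximum of 25 follows from the default size of the fixed `search` thread pool in Elasticsearch 5.x, which the reference documentation gives as `int((# of available_processors * 3) / 2) + 1`. Requests beyond the active threads wait in the queue (`thread_pool.search.queue_size`, set to 10000 here) and are rejected only once that queue is full. A sketch of the formula; the function name is my own:

```python
def search_pool_size(available_processors: int) -> int:
    """Default ES 5.x search thread pool size: int((processors * 3) / 2) + 1."""
    return (available_processors * 3) // 2 + 1


print(search_pool_size(16))  # 25
```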


Hot threads output from the abnormal data node:

:: {xxx}{vXoFs5peQVuVCrg0mv3Pqw}{On6lmxggQtiNpZmLhOrZqg}{x.x.x.x}{x.x.x.x:9300}{ml.max_open_jobs=10, ml.enabled=true}
   Hot threads at 2023-02-25T13:20:17.991Z, interval=500ms, busiestThreads=3, ignoreIdleThreads=true:

   12.5% (62.4ms out of 500ms) cpu usage by thread 'elasticsearch[xxx][search][T#2]'
     unique snapshot
       sun.nio.ch.NativeThread.current(Native Method)
       sun.nio.ch.NativeThreadSet.add(NativeThreadSet.java:46)
       sun.nio.ch.FileChannelImpl.readInternal(FileChannelImpl.java:736)
       sun.nio.ch.FileChannelImpl.read(FileChannelImpl.java:726)
       org.apache.lucene.store.NIOFSDirectory$NIOFSIndexInput.readInternal(NIOFSDirectory.java:179)
       org.apache.lucene.store.BufferedIndexInput.refill(BufferedIndexInput.java:342)
       org.apache.lucene.store.BufferedIndexInput.readByte(BufferedIndexInput.java:54)
       org.apache.lucene.codecs.lucene50.ForUtil.readBlock(ForUtil.java:187)
       org.apache.lucene.codecs.lucene50.Lucene50PostingsReader$BlockDocsEnum.refillDocs(Lucene50PostingsReader.java:357)
       org.apache.lucene.codecs.lucene50.Lucene50PostingsReader$BlockDocsEnum.advance(Lucene50PostingsReader.java:441)
       org.apache.lucene.search.DisjunctionDISIApproximation.advance(DisjunctionDISIApproximation.java:66)
       org.apache.lucene.search.ConjunctionDISI.doNext(ConjunctionDISI.java:213)
       org.apache.lucene.search.ConjunctionDISI.nextDoc(ConjunctionDISI.java:240)
       org.apache.lucene.search.ConjunctionDISI$BitSetConjunctionDISI.doNext(ConjunctionDISI.java:288)
       org.apache.lucene.search.ConjunctionDISI$BitSetConjunctionDISI.nextDoc(ConjunctionDISI.java:279)
       org.apache.lucene.search.Weight$DefaultBulkScorer.scoreAll(Weight.java:252)
       org.apache.lucene.search.Weight$DefaultBulkScorer.score(Weight.java:197)
       org.apache.lucene.search.BulkScorer.score(BulkScorer.java:39)
       org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:668)
       org.elasticsearch.search.internal.ContextIndexSearcher.search(ContextIndexSearcher.java:196)
       org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:472)
       org.elasticsearch.search.query.QueryPhase.execute(QueryPhase.java:421)
       org.elasticsearch.search.query.QueryPhase.execute(QueryPhase.java:114)
       org.elasticsearch.indices.IndicesService.lambda$loadIntoContext$16(IndicesService.java:1129)
       org.elasticsearch.indices.IndicesService$$Lambda$2305/398832609.accept(Unknown Source)
       org.elasticsearch.indices.IndicesService.lambda$cacheShardLevelResult$18(IndicesService.java:1210)
       org.elasticsearch.indices.IndicesService$$Lambda$2306/1176807687.get(Unknown Source)
       org.elasticsearch.indices.IndicesRequestCache$Loader.load(IndicesRequestCache.java:160)
       org.elasticsearch.indices.IndicesRequestCache$Loader.load(IndicesRequestCache.java:143)
       org.elasticsearch.common.cache.Cache.computeIfAbsent(Cache.java:412)
       org.elasticsearch.indices.IndicesRequestCache.getOrCompute(IndicesRequestCache.java:116)
       org.elasticsearch.indices.IndicesService.cacheShardLevelResult(IndicesService.java:1216)
       org.elasticsearch.indices.IndicesService.loadIntoContext(IndicesService.java:1128)
       org.elasticsearch.search.SearchService.loadOrExecuteQueryPhase(SearchService.java:250)
       org.elasticsearch.search.SearchService.executeQueryPhase(SearchService.java:267)
       org.elasticsearch.action.search.SearchTransportService$6.messageReceived(SearchTransportService.java:343)
       org.elasticsearch.action.search.SearchTransportService$6.messageReceived(SearchTransportService.java:340)
       org.elasticsearch.xpack.security.transport.SecurityServerTransportInterceptor$ProfileSecuredRequestHandler$1.doRun(SecurityServerTransportInterceptor.java:258)
       org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)
       org.elasticsearch.common.util.concurrent.EsExecutors$1.execute(EsExecutors.java:110)
       org.elasticsearch.xpack.security.transport.SecurityServerTransportInterceptor$ProfileSecuredRequestHandler.lambda$messageReceived$0(SecurityServerTransportInterceptor.java:307)
       org.elasticsearch.xpack.security.transport.SecurityServerTransportInterceptor$ProfileSecuredRequestHandler$$Lambda$1950/118852557.accept(Unknown Source)
       org.elasticsearch.action.ActionListener$1.onResponse(ActionListener.java:59)
       org.elasticsearch.xpack.security.transport.ServerTransportFilter$NodeProfile.lambda$authorizeAsync$5(ServerTransportFilter.java:208)
       org.elasticsearch.xpack.security.transport.ServerTransportFilter$NodeProfile$$Lambda$1954/1415147666.accept(Unknown Source)
       org.elasticsearch.xpack.security.authz.AuthorizationUtils$AsyncAuthorizer.maybeRun(AuthorizationUtils.java:127)
       org.elasticsearch.xpack.security.authz.AuthorizationUtils$AsyncAuthorizer.setRunAsRoles(AuthorizationUtils.java:121)
       org.elasticsearch.xpack.security.authz.AuthorizationUtils$AsyncAuthorizer.authorize(AuthorizationUtils.java:109)
       org.elasticsearch.xpack.security.transport.ServerTransportFilter$NodeProfile.authorizeAsync(ServerTransportFilter.java:210)
       org.elasticsearch.xpack.security.transport.ServerTransportFilter$NodeProfile.lambda$inbound$2(ServerTransportFilter.java:168)
       org.elasticsearch.xpack.security.transport.ServerTransportFilter$NodeProfile$$Lambda$1952/675011904.accept(Unknown Source)
       org.elasticsearch.action.ActionListener$1.onResponse(ActionListener.java:59)
       org.elasticsearch.xpack.security.authc.AuthenticationService$Authenticator.lambda$authenticateAsync$2(AuthenticationService.java:212)
       org.elasticsearch.xpack.security.authc.AuthenticationService$Authenticator$$Lambda$1942/1384365306.accept(Unknown Source)
       org.elasticsearch.xpack.security.authc.AuthenticationService$Authenticator.lambda$lookForExistingAuthentication$4(AuthenticationService.java:246)
       org.elasticsearch.xpack.security.authc.AuthenticationService$Authenticator$$Lambda$1943/1492423233.run(Unknown Source)
       org.elasticsearch.xpack.security.authc.AuthenticationService$Authenticator.lookForExistingAuthentication(AuthenticationService.java:257)
       org.elasticsearch.xpack.security.authc.AuthenticationService$Authenticator.authenticateAsync(AuthenticationService.java:210)
       org.elasticsearch.xpack.security.authc.AuthenticationService$Authenticator.access$000(AuthenticationService.java:159)
       org.elasticsearch.xpack.security.authc.AuthenticationService.authenticate(AuthenticationService.java:122)
       org.elasticsearch.xpack.security.transport.ServerTransportFilter$NodeProfile.inbound(ServerTransportFilter.java:146)
       org.elasticsearch.xpack.security.transport.SecurityServerTransportInterceptor$ProfileSecuredRequestHandler.messageReceived(SecurityServerTransportInterceptor.java:314)
       org.elasticsearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:69)
       org.elasticsearch.transport.TcpTransport$RequestHandler.doRun(TcpTransport.java:1539)
       org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:638)
       org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)
       java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
       java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
       java.lang.Thread.run(Thread.java:745)
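To compare the abnormal node with the others, the per-thread summary lines of the `_nodes/hot_threads` output (such as the `12.5% (62.4ms out of 500ms) cpu usage by thread 'elasticsearch[xxx][search][T#2]'` line above) can be tallied by thread pool. A minimal sketch, assuming the line format shown in this 5.6.3 output; the regexes and the function name are my own, not part of Elasticsearch:

```python
import re
from collections import defaultdict

# Matches summary lines such as:
#   12.5% (62.4ms out of 500ms) cpu usage by thread 'elasticsearch[xxx][search][T#2]'
HOT_LINE = re.compile(
    r"^\s*(?P<pct>[\d.]+)% \((?P<ms>[\d.]+)ms out of (?P<interval>[\d.]+)ms\) "
    r"(?P<kind>\w+) usage by thread '(?P<thread>[^']+)'"
)
# Pulls the pool name, e.g. 'search' out of 'elasticsearch[node][search][T#2]'.
POOL = re.compile(r"^elasticsearch\[[^\]]*\]\[(?P<pool>[^\]]+)\]")


def cpu_ms_by_pool(hot_threads_text: str) -> dict:
    """Sum the sampled CPU milliseconds per thread pool; stack frames are skipped."""
    totals = defaultdict(float)
    for line in hot_threads_text.splitlines():
        m = HOT_LINE.match(line)
        if not m or m.group("kind") != "cpu":
            continue
        pool_match = POOL.match(m.group("thread"))
        pool = pool_match.group("pool") if pool_match else m.group("thread")
        totals[pool] += float(m.group("ms"))
    return dict(totals)


sample = (
    "   12.5% (62.4ms out of 500ms) cpu usage by thread 'elasticsearch[xxx][search][T#2]'\n"
    "     unique snapshot\n"
)
print(cpu_ms_by_pool(sample))
```

Running it over several hot-threads samples per node gives a quick per-pool comparison between the busy node and its peers.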
