ES version:
elasticsearch-5.6.3-1.noarch
OS version:
CentOS Linux release 7.6.1810 (Core)
Linux version 3.10.0-1160.31.1.el7.x86_64 (mockbuild@kbuilder.bsys.centos.org) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-44) (GCC) ) #1 SMP Thu Jun 10 13:32:12 UTC 2021
JDK version:
java version "1.8.0_66"
Java(TM) SE Runtime Environment (build 1.8.0_66-b17)
Java HotSpot(TM) 64-Bit Server VM (build 25.66-b17, mixed mode)
elasticsearch.yml of a data node:
cluster.name: xxx
node.name: xxx
thread_pool.bulk.queue_size: 10000
thread_pool.search.queue_size: 10000
gateway.recover_after_nodes: 1
gateway.expected_nodes: 1
discovery.zen.ping_timeout: 30s
discovery.zen.minimum_master_nodes: 2
http.max_initial_line_length: 1mb
index.store.type: niofs
discovery.zen.ping.unicast.hosts: [ "1.1.1.1","2.2.2.2","3.3.3.3" ]
discovery.zen.fd.ping_interval: 5s
discovery.zen.fd.connect_on_network_disconnect: true
network.tcp.keep_alive: false
transport.ping_schedule: 10s
bootstrap.memory_lock: false
bootstrap.system_call_filter: false
cluster.routing.allocation.same_shard.host: true
path.data: /data/elasticsearch,
path.logs: /data/logs/elasticsearch
network.host: 0.0.0.0
http.port: 9200
node.master: false
node.data: true
node.ingest: false
xpack.graph.enabled: true
xpack.ml.enabled: true
xpack.monitoring.enabled: true
xpack.watcher.enabled: false
xpack.security.enabled: true
Number of nodes per role:
master nodes: 3
data nodes: 3
coordinating nodes: 3
index settings:
"number_of_shards": "6",
"number_of_replicas": "1"
Documents are indexed with a custom routing key, which is the value of the field "cityCode".
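For reference, here is a rough sketch of how a custom routing value picks a shard. It assumes ES 5.x behaviour for an index that was never shrunk and has no `routing_partition_size`: the routing string is hashed with Murmur3 (x86, 32-bit, seed 0, each UTF-16 char fed as two little-endian bytes) and the shard is `floorMod(hash, number_of_shards)`. This is my reading of `OperationRouting` and `Murmur3HashFunction`, not Elasticsearch's code; the class, the method names and the cityCode values are hypothetical.

```java
public class RoutingSketch {

    // Murmur3 x86_32, the same variant Lucene exposes as StringHelper.murmurhash3_x86_32.
    static int murmur3x86_32(byte[] data, int seed) {
        final int c1 = 0xcc9e2d51;
        final int c2 = 0x1b873593;
        int h1 = seed;
        int len = data.length;
        int roundedEnd = len & 0xfffffffc; // largest multiple of 4 <= len
        for (int i = 0; i < roundedEnd; i += 4) {
            // little-endian load of one 4-byte block
            int k1 = (data[i] & 0xff) | ((data[i + 1] & 0xff) << 8)
                    | ((data[i + 2] & 0xff) << 16) | (data[i + 3] << 24);
            k1 *= c1;
            k1 = Integer.rotateLeft(k1, 15);
            k1 *= c2;
            h1 ^= k1;
            h1 = Integer.rotateLeft(h1, 13);
            h1 = h1 * 5 + 0xe6546b64;
        }
        // remaining 1 to 3 bytes
        int k1 = 0;
        switch (len & 0x03) {
            case 3:
                k1 = (data[roundedEnd + 2] & 0xff) << 16;
                // fall through
            case 2:
                k1 |= (data[roundedEnd + 1] & 0xff) << 8;
                // fall through
            case 1:
                k1 |= (data[roundedEnd] & 0xff);
                k1 *= c1;
                k1 = Integer.rotateLeft(k1, 15);
                k1 *= c2;
                h1 ^= k1;
        }
        // finalization mix
        h1 ^= len;
        h1 ^= h1 >>> 16;
        h1 *= 0x85ebca6b;
        h1 ^= h1 >>> 13;
        h1 *= 0xc2b2ae35;
        h1 ^= h1 >>> 16;
        return h1;
    }

    // Assumption: mirrors Murmur3HashFunction.hash(String), which turns every char
    // into two bytes (low byte first) before hashing with seed 0.
    static int routingHash(String routing) {
        byte[] bytes = new byte[routing.length() * 2];
        for (int i = 0; i < routing.length(); i++) {
            char c = routing.charAt(i);
            bytes[2 * i] = (byte) c;
            bytes[2 * i + 1] = (byte) (c >>> 8);
        }
        return murmur3x86_32(bytes, 0);
    }

    // Only valid for an index that was never shrunk and has no routing_partition_size.
    static int shardFor(String routing, int numberOfShards) {
        return Math.floorMod(routingHash(routing), numberOfShards);
    }

    public static void main(String[] args) {
        // hypothetical cityCode values
        String[] cityCodes = {"110000", "310000", "440100"};
        for (String code : cityCodes) {
            System.out.println(code + " -> shard " + shardFor(code, 6));
        }
    }
}
```

With `number_of_shards: 6`, every document for one cityCode, and every search that passes the same routing value, goes to the same shard. A city with much more data or traffic than the others would therefore tend to load the nodes holding that shard's primary and replica more than the rest.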
Issue description
My ES has been running stably for 6 months.
In the past week, without any changes on our side, one of the data nodes suddenly showed significantly higher CPU usage than the other data nodes.
At the time marked by the red arrow in the figure below, its search thread pool was even exhausted and a large number of search rejections occurred.
Is this a bug in ES? If so, under what conditions is it triggered? I suspect the following setting may be causing the problem:
index.store.type: niofs
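With `index.store.type: niofs`, Lucene's `NIOFSDirectory` serves each buffer refill with a positional `FileChannel.read`, which is the `NIOFSIndexInput.readInternal` → `FileChannelImpl.read` path visible in the hot threads output further down. The sketch below imitates that access pattern on a temporary file and counts how many `read` calls (each one a system call) it takes to scan the file. It is not Lucene's code; the class name, method name and buffer size are arbitrary. With `mmapfs` the same bytes would be read from a memory-mapped region, without a `read` system call per refill.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class PositionalReadSketch {

    // Scans the whole file through one small reusable buffer, issuing one
    // positional read per refill, and returns the number of read() calls.
    static int readAllWithRefills(Path file, int bufferSize) throws IOException {
        int reads = 0;
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            ByteBuffer buffer = ByteBuffer.allocate(bufferSize);
            long position = 0;
            long size = channel.size();
            while (position < size) {
                buffer.clear();
                // positional read: does not move the channel's own position
                int n = channel.read(buffer, position);
                reads++;
                if (n < 0) {
                    break; // end of file (only if the file shrank meanwhile)
                }
                position += n;
            }
        }
        return reads;
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("niofs-sketch", ".bin");
        try {
            Files.write(tmp, new byte[10_000]);
            System.out.println("read() calls: " + readAllWithRefills(tmp, 1024));
        } finally {
            Files.delete(tmp);
        }
    }
}
```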
CPU usage of all data nodes
QPS of each data node (almost the same across nodes)
Thread pool monitoring of all data nodes (each node has 16 processors, so the search thread pool has at most 25 threads)
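The maximum of 25 follows from the default sizing of the search thread pool in ES 5.x, `((available processors * 3) / 2) + 1` with integer division. A tiny Java check (the class and method names are mine, not Elasticsearch's):

```java
public class SearchPoolSize {

    // Default size of the "search" thread pool in ES 5.x:
    // ((available processors * 3) / 2) + 1, using integer division.
    static int searchThreadPoolSize(int availableProcessors) {
        return ((availableProcessors * 3) / 2) + 1;
    }

    public static void main(String[] args) {
        System.out.println(searchThreadPoolSize(16)); // prints 25
    }
}
```

So on these 16-processor nodes at most 25 searches run concurrently; anything beyond that waits in the queue (`thread_pool.search.queue_size: 10000` here) and is rejected once the queue is full.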
Hot threads output from the abnormal data node:
:: {xxx}{vXoFs5peQVuVCrg0mv3Pqw}{On6lmxggQtiNpZmLhOrZqg}{x.x.x.x}{x.x.x.x:9300}{ml.max_open_jobs=10, ml.enabled=true}
Hot threads at 2023-02-25T13:20:17.991Z, interval=500ms, busiestThreads=3, ignoreIdleThreads=true:
12.5% (62.4ms out of 500ms) cpu usage by thread 'elasticsearch[xxx][search][T#2]'
unique snapshot
sun.nio.ch.NativeThread.current(Native Method)
sun.nio.ch.NativeThreadSet.add(NativeThreadSet.java:46)
sun.nio.ch.FileChannelImpl.readInternal(FileChannelImpl.java:736)
sun.nio.ch.FileChannelImpl.read(FileChannelImpl.java:726)
org.apache.lucene.store.NIOFSDirectory$NIOFSIndexInput.readInternal(NIOFSDirectory.java:179)
org.apache.lucene.store.BufferedIndexInput.refill(BufferedIndexInput.java:342)
org.apache.lucene.store.BufferedIndexInput.readByte(BufferedIndexInput.java:54)
org.apache.lucene.codecs.lucene50.ForUtil.readBlock(ForUtil.java:187)
org.apache.lucene.codecs.lucene50.Lucene50PostingsReader$BlockDocsEnum.refillDocs(Lucene50PostingsReader.java:357)
org.apache.lucene.codecs.lucene50.Lucene50PostingsReader$BlockDocsEnum.advance(Lucene50PostingsReader.java:441)
org.apache.lucene.search.DisjunctionDISIApproximation.advance(DisjunctionDISIApproximation.java:66)
org.apache.lucene.search.ConjunctionDISI.doNext(ConjunctionDISI.java:213)
org.apache.lucene.search.ConjunctionDISI.nextDoc(ConjunctionDISI.java:240)
org.apache.lucene.search.ConjunctionDISI$BitSetConjunctionDISI.doNext(ConjunctionDISI.java:288)
org.apache.lucene.search.ConjunctionDISI$BitSetConjunctionDISI.nextDoc(ConjunctionDISI.java:279)
org.apache.lucene.search.Weight$DefaultBulkScorer.scoreAll(Weight.java:252)
org.apache.lucene.search.Weight$DefaultBulkScorer.score(Weight.java:197)
org.apache.lucene.search.BulkScorer.score(BulkScorer.java:39)
org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:668)
org.elasticsearch.search.internal.ContextIndexSearcher.search(ContextIndexSearcher.java:196)
org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:472)
org.elasticsearch.search.query.QueryPhase.execute(QueryPhase.java:421)
org.elasticsearch.search.query.QueryPhase.execute(QueryPhase.java:114)
org.elasticsearch.indices.IndicesService.lambda$loadIntoContext$16(IndicesService.java:1129)
org.elasticsearch.indices.IndicesService$$Lambda$2305/398832609.accept(Unknown Source)
org.elasticsearch.indices.IndicesService.lambda$cacheShardLevelResult$18(IndicesService.java:1210)
org.elasticsearch.indices.IndicesService$$Lambda$2306/1176807687.get(Unknown Source)
org.elasticsearch.indices.IndicesRequestCache$Loader.load(IndicesRequestCache.java:160)
org.elasticsearch.indices.IndicesRequestCache$Loader.load(IndicesRequestCache.java:143)
org.elasticsearch.common.cache.Cache.computeIfAbsent(Cache.java:412)
org.elasticsearch.indices.IndicesRequestCache.getOrCompute(IndicesRequestCache.java:116)
org.elasticsearch.indices.IndicesService.cacheShardLevelResult(IndicesService.java:1216)
org.elasticsearch.indices.IndicesService.loadIntoContext(IndicesService.java:1128)
org.elasticsearch.search.SearchService.loadOrExecuteQueryPhase(SearchService.java:250)
org.elasticsearch.search.SearchService.executeQueryPhase(SearchService.java:267)
org.elasticsearch.action.search.SearchTransportService$6.messageReceived(SearchTransportService.java:343)
org.elasticsearch.action.search.SearchTransportService$6.messageReceived(SearchTransportService.java:340)
org.elasticsearch.xpack.security.transport.SecurityServerTransportInterceptor$ProfileSecuredRequestHandler$1.doRun(SecurityServerTransportInterceptor.java:258)
org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)
org.elasticsearch.common.util.concurrent.EsExecutors$1.execute(EsExecutors.java:110)
org.elasticsearch.xpack.security.transport.SecurityServerTransportInterceptor$ProfileSecuredRequestHandler.lambda$messageReceived$0(SecurityServerTransportInterceptor.java:307)
org.elasticsearch.xpack.security.transport.SecurityServerTransportInterceptor$ProfileSecuredRequestHandler$$Lambda$1950/118852557.accept(Unknown Source)
org.elasticsearch.action.ActionListener$1.onResponse(ActionListener.java:59)
org.elasticsearch.xpack.security.transport.ServerTransportFilter$NodeProfile.lambda$authorizeAsync$5(ServerTransportFilter.java:208)
org.elasticsearch.xpack.security.transport.ServerTransportFilter$NodeProfile$$Lambda$1954/1415147666.accept(Unknown Source)
org.elasticsearch.xpack.security.authz.AuthorizationUtils$AsyncAuthorizer.maybeRun(AuthorizationUtils.java:127)
org.elasticsearch.xpack.security.authz.AuthorizationUtils$AsyncAuthorizer.setRunAsRoles(AuthorizationUtils.java:121)
org.elasticsearch.xpack.security.authz.AuthorizationUtils$AsyncAuthorizer.authorize(AuthorizationUtils.java:109)
org.elasticsearch.xpack.security.transport.ServerTransportFilter$NodeProfile.authorizeAsync(ServerTransportFilter.java:210)
org.elasticsearch.xpack.security.transport.ServerTransportFilter$NodeProfile.lambda$inbound$2(ServerTransportFilter.java:168)
org.elasticsearch.xpack.security.transport.ServerTransportFilter$NodeProfile$$Lambda$1952/675011904.accept(Unknown Source)
org.elasticsearch.action.ActionListener$1.onResponse(ActionListener.java:59)
org.elasticsearch.xpack.security.authc.AuthenticationService$Authenticator.lambda$authenticateAsync$2(AuthenticationService.java:212)
org.elasticsearch.xpack.security.authc.AuthenticationService$Authenticator$$Lambda$1942/1384365306.accept(Unknown Source)
org.elasticsearch.xpack.security.authc.AuthenticationService$Authenticator.lambda$lookForExistingAuthentication$4(AuthenticationService.java:246)
org.elasticsearch.xpack.security.authc.AuthenticationService$Authenticator$$Lambda$1943/1492423233.run(Unknown Source)
org.elasticsearch.xpack.security.authc.AuthenticationService$Authenticator.lookForExistingAuthentication(AuthenticationService.java:257)
org.elasticsearch.xpack.security.authc.AuthenticationService$Authenticator.authenticateAsync(AuthenticationService.java:210)
org.elasticsearch.xpack.security.authc.AuthenticationService$Authenticator.access$000(AuthenticationService.java:159)
org.elasticsearch.xpack.security.authc.AuthenticationService.authenticate(AuthenticationService.java:122)
org.elasticsearch.xpack.security.transport.ServerTransportFilter$NodeProfile.inbound(ServerTransportFilter.java:146)
org.elasticsearch.xpack.security.transport.SecurityServerTransportInterceptor$ProfileSecuredRequestHandler.messageReceived(SecurityServerTransportInterceptor.java:314)
org.elasticsearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:69)
org.elasticsearch.transport.TcpTransport$RequestHandler.doRun(TcpTransport.java:1539)
org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:638)
org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
java.lang.Thread.run(Thread.java:745)