ES uses 100% CPU due to some special queries

Our ES has just one node, and it sometimes becomes unresponsive. When we check http://localhost:9200/_tasks?pretty&detailed there are a lot of tasks, and this continues for a few hours. During that time we cannot connect to ES and CPU usage stays above 100%. We think some special queries cause this, for example: +<a+href="/taodns/domain/"> . Do you have any solution for this problem?
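For reference, this is roughly how we check the running tasks (a minimal sketch; it assumes the node is reachable on localhost:9200 and that the task management API is available, which it should be from ES 2.3 onwards):

# List all currently running tasks with per-task detail
curl -s 'http://localhost:9200/_tasks?detailed=true&pretty'

# Restrict the listing to search tasks only (the actions filter is an assumption here)
curl -s 'http://localhost:9200/_tasks?actions=*search*&detailed=true&pretty'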

Elasticsearch version (bin/elasticsearch --version): 2.4.0

Plugins installed: [head]

JVM version (java -version): 1.8.0

OS version (uname -a if on a Unix-like system): centos 7.2

Description of the problem including expected versus actual behavior:

Steps to reproduce:

Please include a minimal but complete recreation of the problem, including
(e.g.) index creation, mappings, settings, query etc. The easier you make for
us to reproduce it, the more likely that somebody will take the time to look at it.

Server hardware: CPU E5-2630 v3 @ 2.40GHz, 128 GB memory
32 GB for ES
ES settings:
threadpool.bulk.type: fixed
threadpool.bulk.size: 16
threadpool.bulk.queue_size: 1000

threadpool.index.type: fixed
threadpool.index.size: 32
threadpool.index.queue_size: 1000

threadpool.search.type: fixed
threadpool.search.size: 49
threadpool.search.queue_size: 3000

CPU info: (screenshot of CPU usage omitted)
Provide logs (if relevant):
[2018-04-03 20:51:19,742][DEBUG][action.search ] [es-node-100] [75232] Failed to execute fetch phase
RemoteTransportException[[es-node-100][10.10.10.10:9300][indices:data/read/search[phase/fetch/id]]]; nested: EsRejectedExecutionException[rejected execution of org.elasticsearch.transport.TransportService$4@40b5cfcb on EsThreadPoolExecutor[search, queue capacity = 3000, org.elasticsearch.common.util.concurrent.EsThreadPoolExecutor@1732bdf2[Running, pool size = 49, active threads = 49, queued tasks = 3000, completed tasks = 113941]]];
Caused by: EsRejectedExecutionException[rejected execution of org.elasticsearch.transport.TransportService$4@40b5cfcb on EsThreadPoolExecutor[search, queue capacity = 3000, org.elasticsearch.common.util.concurrent.EsThreadPoolExecutor@1732bdf2[Running, pool size = 49, active threads = 49, queued tasks = 3000, completed tasks = 113941]]]
at org.elasticsearch.common.util.concurrent.EsAbortPolicy.rejectedExecution(EsAbortPolicy.java:50)
at java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:823)
at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1369)
at org.elasticsearch.common.util.concurrent.EsThreadPoolExecutor.execute(EsThreadPoolExecutor.java:85)
at org.elasticsearch.transport.TransportService.sendLocalRequest(TransportService.java:372)
at org.elasticsearch.transport.TransportService.sendRequest(TransportService.java:327)
at org.elasticsearch.transport.TransportService.sendRequest(TransportService.java:299)
at org.elasticsearch.search.action.SearchServiceTransportAction.sendExecuteFetch(SearchServiceTransportAction.java:204)
at org.elasticsearch.search.action.SearchServiceTransportAction.sendExecuteFetch(SearchServiceTransportAction.java:196)
at org.elasticsearch.action.search.SearchQueryThenFetchAsyncAction.executeFetch(SearchQueryThenFetchAsyncAction.java:94)
at org.elasticsearch.action.search.SearchQueryThenFetchAsyncAction.moveToSecondPhase(SearchQueryThenFetchAsyncAction.java:88)
at org.elasticsearch.action.search.AbstractSearchAsyncAction.innerMoveToSecondPhase(AbstractSearchAsyncAction.java:374)
at org.elasticsearch.action.search.AbstractSearchAsyncAction.onFirstPhaseResult(AbstractSearchAsyncAction.java:171)
at org.elasticsearch.action.search.AbstractSearchAsyncAction$1.onResponse(AbstractSearchAsyncAction.java:147)
at org.elasticsearch.action.search.AbstractSearchAsyncAction$1.onResponse(AbstractSearchAsyncAction.java:144)
at org.elasticsearch.action.ActionListenerResponseHandler.handleResponse(ActionListenerResponseHandler.java:41)
at org.elasticsearch.transport.TransportService$DirectResponseChannel.processResponse(TransportService.java:836)
at org.elasticsearch.transport.TransportService$DirectResponseChannel.sendResponse(TransportService.java:820)
at org.elasticsearch.transport.TransportService$DirectResponseChannel.sendResponse(TransportService.java:810)
at org.elasticsearch.transport.DelegatingTransportChannel.sendResponse(DelegatingTransportChannel.java:58)
at org.elasticsearch.transport.RequestHandlerRegistry$TransportChannelWrapper.sendResponse(RequestHandlerRegistry.java:140)
at org.elasticsearch.search.action.SearchServiceTransportAction$SearchQueryTransportHandler.messageReceived(SearchServiceTransportAction.java:369)
at org.elasticsearch.search.action.SearchServiceTransportAction$SearchQueryTransportHandler.messageReceived(SearchServiceTransportAction.java:365)
at org.elasticsearch.transport.TransportRequestHandler.messageReceived(TransportRequestHandler.java:33)
at org.elasticsearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:77)
at org.elasticsearch.transport.TransportService$4.doRun(TransportService.java:376)
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)

Can you please provide the output of the cluster health API?
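Something like the following (a minimal sketch, assuming the default HTTP port on the node) will return it:

# Cluster health: status, number of nodes, shard counts and allocation state
curl -s 'http://localhost:9200/_cluster/health?pretty'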

What is the rationale behind increasing the queue sizes so much?

Sorry, what do you mean by the cluster health API? We have about 10 TB of data in ES. We think some full-text retrieval makes this happen: when the query content cannot be found, ES seems to go through and list every record in the index.

How many indices and shards do you have in the cluster? How much data?
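The cat APIs will show this, for example (a sketch, host and port assumed):

# One line per index: document count and store size
curl -s 'http://localhost:9200/_cat/indices?v'

# One line per shard: size and document count for each shard
curl -s 'http://localhost:9200/_cat/shards?v'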

One index, 5 shards, about 1,000 million records. There is no problem with normal queries.

So you have just 5 shards on the node, each about 2TB in size with roughly 200 million documents, is that correct?

Each query is executed single-threaded across each shard, although shards and separate queries can naturally be processed in parallel. A large and expensive query can however tie up processing for a long time, and since you have increased the queue sizes quite dramatically a lot of data can get queued up. As you are filling up your search queue, it seems like you either have some queries causing delays or are simply throwing more work at the node than it can handle.
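While the node is struggling, the cat thread pool API gives a quick view of how full the search queue is; a minimal sketch (host and port assumed, poll it every few seconds):

# Active, queued and rejected tasks per thread pool (bulk, index, search)
curl -s 'http://localhost:9200/_cat/thread_pool?v'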

I would recommend running the hot threads API while the node is busy to see what it is spending the CPU cycles on. This may give an indication of which queries are causing problems.
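For example, something along these lines (a sketch; adjust host and port, and capture it a few times while the CPU is pegged):

# Dump the hottest threads on the node, sampled over a short interval
curl -s 'http://localhost:9200/_nodes/hot_threads?threads=5&interval=500ms'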

Yes, that is correct.
We will check and see, thanks for your suggestion.

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.