ES uses 100% CPU due to some special queries

Our ES has just one node, and it sometimes becomes unresponsive. When we check http://localhost:9200/_tasks?pretty&detailed there are a lot of tasks, and this continues for a few hours. During that time we cannot connect to ES and CPU usage stays above 100%. We think some special queries cause this, for example: +<a+href="/taodns/domain/"> . Do you have any solution for this problem?
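For reference, this is roughly how we check the running tasks (a minimal sketch; it assumes the node is reachable on localhost:9200 and that the task management API is available, which it should be from ES 2.3 onwards):

# List all currently running tasks with per-task detail
curl -s 'http://localhost:9200/_tasks?detailed=true&pretty'

# Restrict the listing to search tasks only (the actions filter is an assumption here)
curl -s 'http://localhost:9200/_tasks?actions=*search*&detailed=true&pretty'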

Elasticsearch version (bin/elasticsearch --version): 2.4.0

Plugins installed: [head]

JVM version (java -version): 1.8.0

OS version (uname -a if on a Unix-like system): centos 7.2

Description of the problem including expected versus actual behavior:

Steps to reproduce:

Please include a minimal but complete recreation of the problem, including
(e.g.) index creation, mappings, settings, query etc. The easier you make for
us to reproduce it, the more likely that somebody will take the time to look at it.

Server hardware: CPU E5-2630 v3 @ 2.40GHz, 128 GB memory
32 GB for ES
ES settings:
threadpool.bulk.type: fixed
threadpool.bulk.size: 16
threadpool.bulk.queue_size: 1000

threadpool.index.type: fixed
threadpool.index.size: 32
threadpool.index.queue_size: 1000

threadpool.search.type: fixed
threadpool.search.size: 49
threadpool.search.queue_size: 3000

CPU info: (screenshot of CPU usage omitted)
Provide logs (if relevant):
[2018-04-03 20:51:19,742][DEBUG][action.search ] [es-node-100] [75232] Failed to execute fetch phase
RemoteTransportException[[es-node-100][10.10.10.10:9300][indices:data/read/search[phase/fetch/id]]]; nested: EsRejectedExecutionException[rejected execution of org.elasticsearch.transport.TransportService$4@40b5cfcb on EsThreadPoolExecutor[search, queue capacity = 3000, org.elasticsearch.common.util.concurrent.EsThreadPoolExecutor@1732bdf2[Running, pool size = 49, active threads = 49, queued tasks = 3000, completed tasks = 113941]]];
Caused by: EsRejectedExecutionException[rejected execution of org.elasticsearch.transport.TransportService$4@40b5cfcb on EsThreadPoolExecutor[search, queue capacity = 3000, org.elasticsearch.common.util.concurrent.EsThreadPoolExecutor@1732bdf2[Running, pool size = 49, active threads = 49, queued tasks = 3000, completed tasks = 113941]]]
at org.elasticsearch.common.util.concurrent.EsAbortPolicy.rejectedExecution(EsAbortPolicy.java:50)
at java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:823)
at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1369)
at org.elasticsearch.common.util.concurrent.EsThreadPoolExecutor.execute(EsThreadPoolExecutor.java:85)
at org.elasticsearch.transport.TransportService.sendLocalRequest(TransportService.java:372)
at org.elasticsearch.transport.TransportService.sendRequest(TransportService.java:327)
at org.elasticsearch.transport.TransportService.sendRequest(TransportService.java:299)
at org.elasticsearch.search.action.SearchServiceTransportAction.sendExecuteFetch(SearchServiceTransportAction.java:204)
at org.elasticsearch.search.action.SearchServiceTransportAction.sendExecuteFetch(SearchServiceTransportAction.java:196)
at org.elasticsearch.action.search.SearchQueryThenFetchAsyncAction.executeFetch(SearchQueryThenFetchAsyncAction.java:94)
at org.elasticsearch.action.search.SearchQueryThenFetchAsyncAction.moveToSecondPhase(SearchQueryThenFetchAsyncAction.java:88)
at org.elasticsearch.action.search.AbstractSearchAsyncAction.innerMoveToSecondPhase(AbstractSearchAsyncAction.java:374)
at org.elasticsearch.action.search.AbstractSearchAsyncAction.onFirstPhaseResult(AbstractSearchAsyncAction.java:171)
at org.elasticsearch.action.search.AbstractSearchAsyncAction$1.onResponse(AbstractSearchAsyncAction.java:147)
at org.elasticsearch.action.search.AbstractSearchAsyncAction$1.onResponse(AbstractSearchAsyncAction.java:144)
at org.elasticsearch.action.ActionListenerResponseHandler.handleResponse(ActionListenerResponseHandler.java:41)
at org.elasticsearch.transport.TransportService$DirectResponseChannel.processResponse(TransportService.java:836)
at org.elasticsearch.transport.TransportService$DirectResponseChannel.sendResponse(TransportService.java:820)
at org.elasticsearch.transport.TransportService$DirectResponseChannel.sendResponse(TransportService.java:810)
at org.elasticsearch.transport.DelegatingTransportChannel.sendResponse(DelegatingTransportChannel.java:58)
at org.elasticsearch.transport.RequestHandlerRegistry$TransportChannelWrapper.sendResponse(RequestHandlerRegistry.java:140)
at org.elasticsearch.search.action.SearchServiceTransportAction$SearchQueryTransportHandler.messageReceived(SearchServiceTransportAction.java:369)
at org.elasticsearch.search.action.SearchServiceTransportAction$SearchQueryTransportHandler.messageReceived(SearchServiceTransportAction.java:365)
at org.elasticsearch.transport.TransportRequestHandler.messageReceived(TransportRequestHandler.java:33)
at org.elasticsearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:77)
at org.elasticsearch.transport.TransportService$4.doRun(TransportService.java:376)
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)

Can you please provide the output of the cluster health API?
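Something like the following (a minimal sketch, assuming the default HTTP port on the node) will return it:

# Cluster health: status, number of nodes, shard counts and allocation state
curl -s 'http://localhost:9200/_cluster/health?pretty'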

What is the rationale behind increasing the queue sizes so much?

Sorry, what do you mean by the cluster health API? We have about 10 TB of data in ES. We think some full-text retrieval makes this happen: when the query content cannot be found, ES seems to go through and list every record in the index.

How many indices and shards do you have in the cluster? How much data?
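The cat APIs will show this, for example (a sketch, host and port assumed):

# One line per index: document count and store size
curl -s 'http://localhost:9200/_cat/indices?v'

# One line per shard: size and document count for each shard
curl -s 'http://localhost:9200/_cat/shards?v'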

One index, 5 shards, about 1,000 million records. There is no problem with normal queries.

So you have just 5 shards on the node, each about 2TB in size with roughly 200 million documents, is that correct?

Each query is executed single-threaded across each shard, although shards and separate queries can naturally be processed in parallel. A large and expensive query can however tie up processing for a long time, and since you have increased the queue sizes quite dramatically a lot of data can get queued up. As you are filling up your search queue, it seems like you either have some queries causing delays or are simply throwing more work at the node than it can handle.
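While the node is struggling, the cat thread pool API gives a quick view of how full the search queue is; a minimal sketch (host and port assumed, poll it every few seconds):

# Active, queued and rejected tasks per thread pool (bulk, index, search)
curl -s 'http://localhost:9200/_cat/thread_pool?v'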

I would recommend running the hot threads API while the node is busy to see what it is spending the CPU cycles on. This may give an indication of which queries are causing problems.
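For example, something along these lines (a sketch; adjust host and port, and capture it a few times while the CPU is pegged):

# Dump the hottest threads on the node, sampled over a short interval
curl -s 'http://localhost:9200/_nodes/hot_threads?threads=5&interval=500ms'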

Yes, that is correct.
We will check and see, thanks for your suggestion.

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.