Concurrent queries (EsRejectedExecutionException and low performance)

Hi,

I've been testing concurrent queries. I have just one node on a server (2 x 4-core CPUs, 16 GB memory) and created an index (3 shards, 1 replica). I use 1000 concurrent threads to query (using TransportClient; the search condition contains a termFilter and a sort on one field). Sometimes the test finishes and sometimes it doesn't, because there are many EsRejectedExecutionException exceptions in the ES log file.

Another problem is that the average response time is over 2 seconds with 1000 threads (only about 80 milliseconds with 10 threads). I don't know why.

My thread pool settings:

threadpool:
    search:
        type: blocking
        min: 1
        size: 300
        wait_time: 30s
        #type: fixed
        #size: 80
        #queue: 1000
        #reject_policy: abort
    index:
        type: blocking
        min: 1
        size: 150
        wait_time: 30s

[2012-07-20 00:00:21,753][DEBUG][action.search.type ] [Cowgirl] [mail][2]: Failed to execute [org.elasticsearch.action.search.SearchRequest@5af2ee9c] while moving to second phase
org.elasticsearch.common.util.concurrent.EsRejectedExecutionException
at org.elasticsearch.common.util.concurrent.EsExecutors$TimedBlockingPolicy.rejectedExecution(EsExecutors.java:171)
at java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:767)
at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:658)
at org.elasticsearch.action.search.type.TransportSearchDfsQueryThenFetchAction$AsyncAction.moveToSecondPhase(TransportSearchDfsQueryThenFetchAction.java:132)
at org.elasticsearch.action.search.type.TransportSearchTypeAction$BaseAsyncAction.onFirstPhaseResult(TransportSearchTypeAction.java:228)
at org.elasticsearch.action.search.type.TransportSearchTypeAction$BaseAsyncAction$3.onResult(TransportSearchTypeAction.java:207)
at org.elasticsearch.action.search.type.TransportSearchTypeAction$BaseAsyncAction$3.onResult(TransportSearchTypeAction.java:204)
at org.elasticsearch.search.action.SearchServiceTransportAction.sendExecuteDfs(SearchServiceTransportAction.java:107)
at org.elasticsearch.action.search.type.TransportSearchDfsQueryThenFetchAction$AsyncAction.sendExecuteFirstPhase(TransportSearchDfsQueryThenFetchAction.java:86)
at org.elasticsearch.action.search.type.TransportSearchTypeAction$BaseAsyncAction.performFirstPhase(TransportSearchTypeAction.java:204)
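
The trace above shows the rejection coming out of java.util.concurrent.ThreadPoolExecutor.execute, so the mechanism can be reproduced with the JDK alone. The following is a minimal sketch, not ES code: the class and method names (RejectionDemo, countRejections) are hypothetical, and it uses the JDK's default AbortPolicy, which rejects immediately. As far as I understand it, the ES "blocking" pool instead waits up to wait_time for a free thread before giving up, so treat the sketch as an illustration of why a bounded pool rejects work once every thread is busy, not as a model of the exact timing.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

class RejectionDemo {

    /**
     * Submits `tasks` long-running tasks to a pool capped at `maxThreads`
     * threads with no queue capacity, and returns how many were rejected.
     */
    static int countRejections(int maxThreads, int tasks) {
        CountDownLatch release = new CountDownLatch(1);
        // SynchronousQueue holds no tasks: a task either gets a thread or is rejected.
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                1, maxThreads, 60L, TimeUnit.SECONDS,
                new SynchronousQueue<Runnable>());
        int rejected = 0;
        try {
            for (int i = 0; i < tasks; i++) {
                try {
                    pool.execute(() -> {
                        try {
                            release.await(); // simulate a slow search holding its thread
                        } catch (InterruptedException e) {
                            Thread.currentThread().interrupt();
                        }
                    });
                } catch (RejectedExecutionException e) {
                    rejected++; // all threads busy and nowhere to queue
                }
            }
        } finally {
            release.countDown(); // let the accepted tasks finish
            pool.shutdown();
            try {
                pool.awaitTermination(10, TimeUnit.SECONDS);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return rejected;
    }

    public static void main(String[] args) {
        // 10 tasks against 2 threads: 2 are accepted, the rest are rejected.
        System.out.println("rejected = " + countRejections(2, 10));
    }
}
```

With 1000 client threads against a search pool of 300, the same thing presumably happens on the node whenever queries hold their threads longer than the pool is willing to wait.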

Hi,

1000 threads is quite a number of threads for just 2 4-core servers. I'm
guessing the load is very high on those machines because threads are
fighting for CPU cycles and there is a lot of waiting. You could get SPM
for ES to get better insight into what's going on with performance as you
increase the number of concurrent threads.
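
One way to test that guess is to keep the 1000 client threads but cap how many queries are actually in flight at once, then compare response times. Here is a minimal sketch using a java.util.concurrent.Semaphore; the class name ThrottledLoadTest, the run method, and the fakeQuery stand-in are all hypothetical, and a real test would replace fakeQuery with the actual TransportClient search call. The cap of 8 is only an assumption based on the 2 x 4 cores mentioned in the original post.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class ThrottledLoadTest {

    /**
     * Runs `query` `totalQueries` times from `clientThreads` threads while
     * allowing at most `maxInFlight` to execute at once. Returns the peak
     * number of queries observed in flight.
     */
    static int run(int clientThreads, int maxInFlight, int totalQueries, Runnable query) {
        Semaphore permits = new Semaphore(maxInFlight);
        AtomicInteger inFlight = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        ExecutorService clients = Executors.newFixedThreadPool(clientThreads);
        for (int i = 0; i < totalQueries; i++) {
            clients.execute(() -> {
                try {
                    permits.acquire(); // wait here instead of piling onto the server
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                try {
                    int now = inFlight.incrementAndGet();
                    peak.accumulateAndGet(now, Math::max);
                    query.run();
                } finally {
                    inFlight.decrementAndGet();
                    permits.release();
                }
            });
        }
        clients.shutdown();
        try {
            clients.awaitTermination(1, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return peak.get();
    }

    public static void main(String[] args) {
        // Stand-in for a real search request; sleeps instead of calling a cluster.
        Runnable fakeQuery = () -> {
            try {
                Thread.sleep(5);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };
        int peak = run(50, 8, 200, fakeQuery);
        System.out.println("peak in-flight queries = " + peak);
    }
}
```

If average latency drops sharply with the cap in place while total throughput stays about the same, the extra threads were mostly waiting rather than doing useful work.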

Otis


On Thursday, July 19, 2012 4:56:55 AM UTC-4, daniel88 wrote:


Otis, thanks for your response.

I found that the bottleneck in my previous test was I/O (iowait was very high).

I still have two problems with ES:

  1. When I run concurrent queries (10/100 threads), the CPU load of the ES process is very high (790%). Is there any way to limit the CPU load of the ES process?

  2. The has_child query is very slow. I have 2M parent docs and 2M child docs, and each parent has only one child. The response time for a parent-only query is less than 100 ms, but when I add a hasChildFilter to the search condition, the response time is greater than 20 s (every query is this slow, even with different conditions). I don't know why.

curl -XPUT http://localhost:9200/test/parent/_mapping -d '
{
  "summary" : {
    "_source" : {"enabled" : false},
    "_all" : {"enabled" : false},
    "_routing" : {"required" : true, "path" : "owner_id"},
    "properties" : {
      "message_id" : {"type" : "string", "store" : "yes", "index" : "not_analyzed"},
      "owner_type" : {"type" : "string", "store" : "yes", "index" : "not_analyzed"},
      "owner_id" : {"type" : "string", "store" : "yes", "index" : "not_analyzed"},
      "box_id" : {"type" : "string", "store" : "yes", "index" : "not_analyzed"},
      "sender" : {"type" : "string", "store" : "yes", "index" : "analyzed", "index_analyzer" : "emailaddr", "search_analyzer" : "default"},
      "receiver" : {"type" : "string", "store" : "yes", "index" : "analyzed", "index_analyzer" : "emailaddr", "search_analyzer" : "default"},
      "has_attachement" : {"type" : "boolean", "store" : "yes"},
      "size" : {"type" : "long", "store" : "yes"},
      "status" : {"type" : "byte", "store" : "yes"}
    }
  }
}'

curl -XPUT localhost:9200/test/child/_mapping -d '
{
  "activeinfo" : {
    "_source" : {"enabled" : false},
    "_all" : {"enabled" : false},
    "_parent" : {"type" : "summary"},
    "properties" : {
      "content" : {"type" : "string", "store" : "yes", "index" : "analyzed", "index_analyzer" : "smartcn", "search_analyzer" : "smartcn"},
      "html_content" : {"type" : "string", "store" : "yes", "index" : "no"},
      "store_reference" : {"type" : "string", "store" : "yes", "index" : "no"}
    }
  }
}'