CPU usage at 90% and ES hot threads dump


(sujan dutta) #1

Hi everybody,

I have an ES cluster of 2 nodes, each on an 8 GB machine.
Each ES node is assigned 4 GB of memory. I have 1 index with 10 shards and 1
replica. This index holds around 35 million documents, about 20 GB in total.
I have 2 more indexes, which hold a few hundred documents.
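
For a rough sense of scale, the numbers above can be worked through as follows. This is only a back-of-the-envelope sketch: it assumes the 20 GB figure covers primary data only (the post does not say), that shard copies are spread evenly over the two nodes, and the function name `per_node_load` is made up for illustration.

```python
def per_node_load(primary_shards, replicas, primary_size_gb, nodes, ram_gb, heap_gb):
    """Estimate shard copies and on-disk data per node for one index."""
    # Every primary shard has `replicas` additional copies.
    shard_copies = primary_shards * (1 + replicas)
    total_size_gb = primary_size_gb * (1 + replicas)
    return {
        "shard_copies_per_node": shard_copies / nodes,
        "data_per_node_gb": total_size_gb / nodes,
        # RAM not given to the JVM heap is what the OS can use for file caching.
        "ram_left_for_os_cache_gb": ram_gb - heap_gb,
    }


load = per_node_load(
    primary_shards=10, replicas=1, primary_size_gb=20,
    nodes=2, ram_gb=8, heap_gb=4,
)
print(load)
```

Under these assumptions each node holds 10 shard copies and about 20 GB of index data, with roughly 4 GB of RAM left outside the heap for the operating system and its file cache.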

Suddenly, CPU usage on 1 node is at 90% and the response time of
indexing/search operations has increased to 4-5 seconds.

I have taken the hot threads information. Could anybody analyse it and tell me
whether I need to increase the memory of the node or add another node to scale
horizontally, or whether there is anything else I need to do to rectify this problem?

thanks - sujan

hot threads:

::: [p_gossamer_v2_data_master_client][5AkaMXO7ReWLiLHaOKqu7g][inet[/10.236.184.30:9300]]{transact=tag_transact_v01, sandbox=tag_sandbox_v01, storetype=primary, max_local_storage_nodes=1, master=true, metadata=tag_metadata_v01}

69.8% (348.7ms out of 500ms) cpu usage by thread 'elasticsearch[p_gossamer_v2_data_master_client][search][T#524]'
3/10 snapshots sharing following 10 elements
org.elasticsearch.search.SearchService.shortcutDocIdsToLoad(SearchService.java:588)
org.elasticsearch.search.SearchService.executeFetchPhase(SearchService.java:325)
org.elasticsearch.search.action.SearchServiceTransportAction.sendExecuteFetch(SearchServiceTransportAction.java:243)
org.elasticsearch.action.search.type.TransportSearchQueryAndFetchAction$AsyncAction.sendExecuteFirstPhase(TransportSearchQueryAndFetchAction.java:75)
org.elasticsearch.action.search.type.TransportSearchTypeAction$BaseAsyncAction.performFirstPhase(TransportSearchTypeAction.java:205)
org.elasticsearch.action.search.type.TransportSearchTypeAction$BaseAsyncAction.performFirstPhase(TransportSearchTypeAction.java:192)
org.elasticsearch.action.search.type.TransportSearchTypeAction$BaseAsyncAction$2.run(TransportSearchTypeAction.java:178)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
java.lang.Thread.run(Thread.java:722)
7/10 snapshots sharing following 9 elements
sun.misc.Unsafe.park(Native Method)
java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:226)
java.util.concurrent.SynchronousQueue$TransferStack.awaitFulfill(SynchronousQueue.java:460)
java.util.concurrent.SynchronousQueue$TransferStack.transfer(SynchronousQueue.java:359)
java.util.concurrent.SynchronousQueue.poll(SynchronousQueue.java:942)
java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1068)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
java.lang.Thread.run(Thread.java:722)

60.6% (302.7ms out of 500ms) cpu usage by thread 'elasticsearch[p_gossamer_v2_data_master_client][search][T#520]'
3/10 snapshots sharing following 10 elements
org.elasticsearch.search.SearchService.shortcutDocIdsToLoad(SearchService.java:588)
org.elasticsearch.search.SearchService.executeFetchPhase(SearchService.java:325)
org.elasticsearch.search.action.SearchServiceTransportAction.sendExecuteFetch(SearchServiceTransportAction.java:243)
org.elasticsearch.action.search.type.TransportSearchQueryAndFetchAction$AsyncAction.sendExecuteFirstPhase(TransportSearchQueryAndFetchAction.java:75)
org.elasticsearch.action.search.type.TransportSearchTypeAction$BaseAsyncAction.performFirstPhase(TransportSearchTypeAction.java:205)
org.elasticsearch.action.search.type.TransportSearchTypeAction$BaseAsyncAction.performFirstPhase(TransportSearchTypeAction.java:192)
org.elasticsearch.action.search.type.TransportSearchTypeAction$BaseAsyncAction$2.run(TransportSearchTypeAction.java:178)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
java.lang.Thread.run(Thread.java:722)
6/10 snapshots sharing following 9 elements
sun.misc.Unsafe.park(Native Method)
java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:226)
java.util.concurrent.SynchronousQueue$TransferStack.awaitFulfill(SynchronousQueue.java:460)
java.util.concurrent.SynchronousQueue$TransferStack.transfer(SynchronousQueue.java:359)
java.util.concurrent.SynchronousQueue.poll(SynchronousQueue.java:942)
java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1068)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
java.lang.Thread.run(Thread.java:722)
unique snapshot
org.elasticsearch.search.internal.InternalSearchResponse.<init>(InternalSearchResponse.java:51)
org.elasticsearch.search.controller.SearchPhaseController.merge(SearchPhaseController.java:354)
org.elasticsearch.action.search.type.TransportSearchQueryAndFetchAction$AsyncAction.moveToSecondPhase(TransportSearchQueryAndFetchAction.java:86)
org.elasticsearch.action.search.type.TransportSearchTypeAction$BaseAsyncAction.onFirstPhaseResult(TransportSearchTypeAction.java:229)
org.elasticsearch.action.search.type.TransportSearchTypeAction$BaseAsyncAction$3.onResult(TransportSearchTypeAction.java:208)
org.elasticsearch.action.search.type.TransportSearchTypeAction$BaseAsyncAction$3.onResult(TransportSearchTypeAction.java:205)
org.elasticsearch.search.action.SearchServiceTransportAction.sendExecuteFetch(SearchServiceTransportAction.java:244)
org.elasticsearch.action.search.type.TransportSearchQueryAndFetchAction$AsyncAction.sendExecuteFirstPhase(TransportSearchQueryAndFetchAction.java:75)
org.elasticsearch.action.search.type.TransportSearchTypeAction$BaseAsyncAction.performFirstPhase(TransportSearchTypeAction.java:205)
org.elasticsearch.action.search.type.TransportSearchTypeAction$BaseAsyncAction.performFirstPhase(TransportSearchTypeAction.java:192)
org.elasticsearch.action.search.type.TransportSearchTypeAction$BaseAsyncAction$2.run(TransportSearchTypeAction.java:178)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
java.lang.Thread.run(Thread.java:722)

5.5% (27.4ms out of 500ms) cpu usage by thread 'elasticsearch[p_gossamer_v2_data_master_client][index][T#3451]'
2/10 snapshots sharing following 3 elements
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
java.lang.Thread.run(Thread.java:722)
8/10 snapshots sharing following 9 elements
sun.misc.Unsafe.park(Native Method)
java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:226)
java.util.concurrent.SynchronousQueue$TransferStack.awaitFulfill(SynchronousQueue.java:460)
java.util.concurrent.SynchronousQueue$TransferStack.transfer(SynchronousQueue.java:359)
java.util.concurrent.SynchronousQueue.poll(SynchronousQueue.java:942)
java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1068)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
java.lang.Thread.run(Thread.java:722)

::: [p_gossamer_v2_data_master_r1][_hotrVvST_CAwJw6f3wd-A][inet[/10.143.162.238:9300]]{transact=tag_transact_v01, sandbox=tag_sandbox_v01, storetype=replica, max_local_storage_nodes=1, master=true, metadata=tag_metadata_v01}

3.6% (17.9ms out of 500ms) cpu usage by thread 'elasticsearch[p_gossamer_v2_data_master_r1][index][T#4580]'
10/10 snapshots sharing following 9 elements
sun.misc.Unsafe.park(Native Method)
java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:226)
java.util.concurrent.SynchronousQueue$TransferStack.awaitFulfill(SynchronousQueue.java:460)
java.util.concurrent.SynchronousQueue$TransferStack.transfer(SynchronousQueue.java:359)
java.util.concurrent.SynchronousQueue.poll(SynchronousQueue.java:942)
java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1068)
--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.


(Alexander Reelsen) #2

Hey,

What kind of operation are you executing when this happens? Are you trying
to return a huge number of documents in your search requests? Is there anything
else you can share that might help?

--Alex
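
If the searches do turn out to request very large result sets, one common mitigation is to cap `size` and fetch the hits in pages. This is a minimal sketch, assuming the standard `from` and `size` fields of the search request body; `page_bodies` is a hypothetical helper and the `match_all` query is only a placeholder for the real query.

```python
def page_bodies(query, total_wanted, page_size=100):
    """Yield search request bodies that fetch total_wanted hits page by page."""
    for offset in range(0, total_wanted, page_size):
        yield {
            "query": query,
            "from": offset,
            # The last page may be smaller than page_size.
            "size": min(page_size, total_wanted - offset),
        }


bodies = list(page_bodies({"match_all": {}}, total_wanted=250, page_size=100))
for body in bodies:
    print(body["from"], body["size"])
```

Each body can be sent as a separate search request. For very deep result sets, paging with `from`/`size` gets expensive as well, and the scroll API is usually the better fit.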

On Fri, Oct 18, 2013 at 5:45 AM, Sujan Dutta sujandutta@gmail.com wrote:



