Elasticsearch high load average / CPU usage

I am a developer from China, and recently my servers have been suffering from a high load average, ranging from 2 to 5. I have a 5-node cluster with 1 replica; the cluster holds about 2 GB of data in 2,000,000 documents.

Here is some relevant information:

1. Load average

top - 15:34:14 up 9 days, 23:09, 1 user, load average: 2.18, 2.30, 2.39
Tasks: 169 total, 1 running, 168 sleeping, 0 stopped, 0 zombie
Cpu0 : 51.8%us, 0.7%sy, 0.0%ni, 47.5%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu1 : 46.3%us, 0.3%sy, 0.0%ni, 53.3%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu2 : 37.0%us, 0.3%sy, 0.0%ni, 62.7%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu3 : 35.6%us, 0.3%sy, 0.0%ni, 63.7%id, 0.0%wa, 0.0%hi, 0.3%si, 0.0%st
Cpu4 : 47.5%us, 0.7%sy, 0.0%ni, 51.8%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu5 : 33.3%us, 0.0%sy, 0.0%ni, 66.7%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu6 : 25.4%us, 0.3%sy, 0.0%ni, 74.2%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu7 : 12.7%us, 0.3%sy, 0.0%ni, 87.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
16637 root 20 0 9297m 3.6g 31m S 209.4 62.8 8964:38 java
1 root 20 0 19232 1012 840 S 0.0 0.0 0:38.37 init
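To put these numbers in context: the per-CPU lines add up to about 292.5% of user+system time across 8 cores, i.e. roughly 2.9 cores busy, and the java process at 209.4% is about 2.1 cores. A 1-minute load of 2.18 on this 8-core box is therefore about 0.27 per core, using the common rule of thumb of dividing load average by core count. The small Python sketch below redoes that arithmetic from the snapshot; its regex assumes the procps `top` per-CPU line format shown here and is not a general `top` parser.

```python
import re

# Per-CPU lines copied verbatim from the `top` snapshot above.
TOP_CPU_LINES = """\
Cpu0 : 51.8%us, 0.7%sy, 0.0%ni, 47.5%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu1 : 46.3%us, 0.3%sy, 0.0%ni, 53.3%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu2 : 37.0%us, 0.3%sy, 0.0%ni, 62.7%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu3 : 35.6%us, 0.3%sy, 0.0%ni, 63.7%id, 0.0%wa, 0.0%hi, 0.3%si, 0.0%st
Cpu4 : 47.5%us, 0.7%sy, 0.0%ni, 51.8%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu5 : 33.3%us, 0.0%sy, 0.0%ni, 66.7%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu6 : 25.4%us, 0.3%sy, 0.0%ni, 74.2%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu7 : 12.7%us, 0.3%sy, 0.0%ni, 87.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
"""

# Matches "CpuN : <user>%us, <system>%sy" at the start of a line.
CPU_RE = re.compile(r"Cpu(\d+)\s*:\s*([\d.]+)%us,\s*([\d.]+)%sy")


def busy_cores(top_text):
    """Return (number of CPU lines, user+system time expressed in whole cores)."""
    cpus = 0
    busy_pct = 0.0
    for line in top_text.splitlines():
        m = CPU_RE.match(line.strip())
        if m:
            cpus += 1
            busy_pct += float(m.group(2)) + float(m.group(3))
    return cpus, busy_pct / 100.0


if __name__ == "__main__":
    cpus, busy = busy_cores(TOP_CPU_LINES)
    load_1min = 2.18      # first load-average figure in the top header
    java_cpu_pct = 209.4  # %CPU of the java process (PID 16637)
    print(f"{cpus} CPUs, about {busy:.2f} cores busy (user+system)")
    print(f"1-minute load per core: {load_1min / cpus:.2f}")
    print(f"java process: about {java_cpu_pct / 100:.1f} cores")
```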

2. bin/elasticsearch -v

Version: 1.3.2, Build: dee175d/2014-08-13T14:29:30Z, JVM: 1.7.0_55

3. java -version

java version "1.7.0_55"
Java(TM) SE Runtime Environment (build 1.7.0_55-b13)
Java HotSpot(TM) 64-Bit Server VM (build 24.55-b03, mixed mode)

4. elasticsearch.yml

cluster.name: XXXXXX
node.name: 035

index.cache.field.max_size: 500000
index.cache.field.expire: 5m
index:
  analysis:
    analyzer:
      index_ansj:
        alias: [ansj_index_analyzer]
        type: ansj_index
        user_path: ansj/user
        ambiguity: ansj/ambiguity.dic
        stop_path: ansj/stopLibrary.dic
        is_name: false
        redis:
          pool:
            maxactive: 20
            maxidle: 10
            maxwait: 100
            testonborrow: true
          ip: 192.168.0.159:6379
          channel: ansj_term
      query_ansj:
        alias: [ansj_index_analyzer]
        type: ansj_query
        user_path: ansj/user
        ambiguity: ansj/ambiguity.dic
        stop_path: ansj/stopLibrary.dic
        is_name: false
        redis:
          pool:
            maxactive: 20
            maxidle: 10
            maxwait: 100
            testonborrow: true
          ip: 192.168.0.159:6379
          channel: ansj_term
index.analysis.analyzer.default.type: keyword
################################## Slow Log ##################################

index.search.slowlog.threshold.query.warn: 10s
index.search.slowlog.threshold.query.info: 5s
index.search.slowlog.threshold.query.debug: 2s
index.search.slowlog.threshold.query.trace: 500ms

index.search.slowlog.threshold.fetch.warn: 1s
index.search.slowlog.threshold.fetch.info: 800ms
index.search.slowlog.threshold.fetch.debug: 500ms
index.search.slowlog.threshold.fetch.trace: 200ms

index.indexing.slowlog.threshold.index.warn: 10s
index.indexing.slowlog.threshold.index.info: 5s
index.indexing.slowlog.threshold.index.debug: 2s
index.indexing.slowlog.threshold.index.trace: 500ms

################################## GC Logging ################################
monitor.jvm.gc.young.warn: 1000ms
monitor.jvm.gc.young.info: 700ms
monitor.jvm.gc.young.debug: 400ms

monitor.jvm.gc.old.warn: 10s
monitor.jvm.gc.old.info: 5s
monitor.jvm.gc.old.debug: 2s

threadpool:
  index:
    type: fixed
    size: 30
    queue_size: -1
  search:
    type: fixed
    size: 30
    queue_size: 1000
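Elasticsearch flattens nested YAML in `elasticsearch.yml` into dot-separated setting names, so the nested `index:` and `threadpool:` blocks above share one namespace with flat lines such as `index.analysis.analyzer.default.type`. A minimal Python sketch of that flattening follows. It works on a hand-copied subset of the config as a dict (an assumption standing in for a YAML parser, to avoid a PyYAML dependency), keeps lists as plain values rather than mimicking how Elasticsearch names array elements, and assumes `ip` and `channel` sit directly under `redis`. It only shows which full names, such as `threadpool.search.size`, the nested blocks produce.

```python
def flatten_settings(tree, prefix=""):
    """Flatten nested dicts into dot-separated keys, e.g. {"a": {"b": 1}} -> {"a.b": 1}.

    Non-dict values (including lists) are stored unchanged in this sketch.
    """
    flat = {}
    for key, value in tree.items():
        full_key = f"{prefix}.{key}" if prefix else str(key)
        if isinstance(value, dict):
            flat.update(flatten_settings(value, full_key))
        else:
            flat[full_key] = value
    return flat


# Hand-copied subset of the elasticsearch.yml above (assumed nesting).
SETTINGS = {
    "index": {
        "analysis": {
            "analyzer": {
                "index_ansj": {
                    "type": "ansj_index",
                    "redis": {"pool": {"maxactive": 20}, "ip": "192.168.0.159:6379"},
                },
            },
        },
    },
    "index.analysis.analyzer.default.type": "keyword",
    "threadpool": {
        "index": {"type": "fixed", "size": 30, "queue_size": -1},
        "search": {"type": "fixed", "size": 30, "queue_size": 1000},
    },
}

if __name__ == "__main__":
    # Print every flattened setting name with its value, sorted by name.
    for name, value in sorted(flatten_settings(SETTINGS).items()):
        print(f"{name}: {value}")
```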

5. curl -XGET 'localhost:9200/_nodes/hot_threads'

::: [180][VNscyuhPS3u94QuyI2TfPQ][es180][inet[/192.168.0.180:9300]]

96.9% (484.5ms out of 500ms) cpu usage by thread 'elasticsearch[180][search][T#23]'
2/10 snapshots sharing following 29 elements
org.codehaus.groovy.runtime.callsite.PojoMetaMethodSite.call(PojoMetaMethodSite.java:53)
org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:116)
Script1.run(Script1.groovy:1)
org.elasticsearch.script.groovy.GroovyScriptEngineService$GroovyScript.run(GroovyScriptEngineService.java:252)
org.elasticsearch.script.groovy.GroovyScriptEngineService$GroovyScript.runAsDouble(GroovyScriptEngineService.java:273)
org.elasticsearch.common.lucene.search.function.ScriptScoreFunction.score(ScriptScoreFunction.java:54)
org.elasticsearch.common.lucene.search.function.FunctionScoreQuery$CustomBoostFactorScorer.score(FunctionScoreQuery.java:175)
org.apache.lucene.search.FilteredQuery$LeapFrogScorer.score(FilteredQuery.java:308)
org.apache.lucene.search.ScoreCachingWrappingScorer.score(ScoreCachingWrappingScorer.java:49)
org.apache.lucene.search.FieldComparator$RelevanceComparator.compareBottom(FieldComparator.java:774)
org.apache.lucene.search.TopFieldCollector$OutOfOrderMultiComparatorNonScoringCollector.collect(TopFieldCollector.java:484)
org.elasticsearch.common.lucene.search.FilteredCollector.collect(FilteredCollector.java:61)
org.apache.lucene.search.Weight$DefaultBulkScorer.scoreAll(Weight.java:193)
org.apache.lucene.search.Weight$DefaultBulkScorer.score(Weight.java:163)
org.apache.lucene.search.BulkScorer.score(BulkScorer.java:35)
org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:621)
org.elasticsearch.search.internal.ContextIndexSearcher.search(ContextIndexSearcher.java:175)
org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:581)
org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:533)
org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:510)
org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:345)
org.elasticsearch.search.query.QueryPhase.execute(QueryPhase.java:149)
org.elasticsearch.search.SearchService.executeQueryPhase(SearchService.java:261)
org.elasticsearch.search.action.SearchServiceTransportAction$5.call(SearchServiceTransportAction.java:206)
org.elasticsearch.search.action.SearchServiceTransportAction$5.call(SearchServiceTransportAction.java:203)
org.elasticsearch.search.action.SearchServiceTransportAction$23.run(SearchServiceTransportAction.java:517)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
java.lang.Thread.run(Thread.java:745)
4/10 snapshots sharing following 16 elements
org.apache.lucene.search.Weight$DefaultBulkScorer.score(Weight.java:163)
org.apache.lucene.search.BulkScorer.score(BulkScorer.java:35)
org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:621)
org.elasticsearch.search.internal.ContextIndexSearcher.search(ContextIndexSearcher.java:175)
org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:581)
org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:533)
org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:510)
org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:345)
org.elasticsearch.search.query.QueryPhase.execute(QueryPhase.java:149)
org.elasticsearch.search.SearchService.executeQueryPhase(SearchService.java:261)
org.elasticsearch.search.action.SearchServiceTransportAction$5.call(SearchServiceTransportAction.java:206)
org.elasticsearch.search.action.SearchServiceTransportAction$5.call(SearchServiceTransportAction.java:203)
org.elasticsearch.search.action.SearchServiceTransportAction$23.run(SearchServiceTransportAction.java:517)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
java.lang.Thread.run(Thread.java:745)
4/10 snapshots sharing following 2 elements
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
java.lang.Thread.run(Thread.java:745)
92.9% (464.4ms out of 500ms) cpu usage by thread 'elasticsearch[180][search][T#18]'
10/10 snapshots sharing following 10 elements
sun.misc.Unsafe.park(Native Method)
java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
java.util.concurrent.LinkedTransferQueue.awaitMatch(LinkedTransferQueue.java:735)
java.util.concurrent.LinkedTransferQueue.xfer(LinkedTransferQueue.java:644)
java.util.concurrent.LinkedTransferQueue.take(LinkedTransferQueue.java:1137)
org.elasticsearch.common.util.concurrent.SizeBlockingQueue.take(SizeBlockingQueue.java:162)
java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1068)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
java.lang.Thread.run(Thread.java:745)
67.6% (337.9ms out of 500ms) cpu usage by thread 'elasticsearch[180][search][T#8]'
7/10 snapshots sharing following 28 elements
org.codehaus.groovy.runtime.callsite.PojoMetaMethodSite.call(PojoMetaMethodSite.java:53)
org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:116)
Script9.run(Script9.groovy:1)
org.elasticsearch.script.groovy.GroovyScriptEngineService$GroovyScript.run(GroovyScriptEngineService.java:252)
org.elasticsearch.script.groovy.GroovyScriptEngineService$GroovyScript.runAsDouble(GroovyScriptEngineService.java:273)
org.elasticsearch.common.lucene.search.function.ScriptScoreFunction.score(ScriptScoreFunction.java:54)
org.elasticsearch.common.lucene.search.function.FunctionScoreQuery$CustomBoostFactorScorer.score(FunctionScoreQuery.java:175)
org.apache.lucene.search.ScoreCachingWrappingScorer.score(ScoreCachingWrappingScorer.java:49)
org.apache.lucene.search.FieldComparator$RelevanceComparator.compareBottom(FieldComparator.java:774)
org.apache.lucene.search.TopFieldCollector$OutOfOrderMultiComparatorNonScoringCollector.collect(TopFieldCollector.java:484)
org.elasticsearch.common.lucene.search.FilteredCollector.collect(FilteredCollector.java:61)
org.apache.lucene.search.Weight$DefaultBulkScorer.scoreAll(Weight.java:193)
org.apache.lucene.search.Weight$DefaultBulkScorer.score(Weight.java:163)
org.apache.lucene.search.BulkScorer.score(BulkScorer.java:35)
org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:621)
org.elasticsearch.search.internal.ContextIndexSearcher.search(ContextIndexSearcher.java:175)
org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:581)
org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:533)
org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:510)
org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:345)
org.elasticsearch.search.query.QueryPhase.execute(QueryPhase.java:149)
org.elasticsearch.search.SearchService.executeQueryPhase(SearchService.java:261)
org.elasticsearch.search.action.SearchServiceTransportAction$SearchQueryTransportHandler.messageReceived(SearchServiceTransportAction.java:688)
org.elasticsearch.search.action.SearchServiceTransportAction$SearchQueryTransportHandler.messageReceived(SearchServiceTransportAction.java:677)
org.elasticsearch.transport.netty.MessageChannelHandler$RequestHandler.run(MessageChannelHandler.java:275)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
java.lang.Thread.run(Thread.java:745)
3/10 snapshots sharing following 16 elements
org.apache.lucene.search.Weight$DefaultBulkScorer.score(Weight.java:163)
org.apache.lucene.search.BulkScorer.score(BulkScorer.java:35)
org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:621)
org.elasticsearch.search.internal.ContextIndexSearcher.search(ContextIndexSearcher.java:175)
org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:581)
org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:533)
org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:510)
org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:345)
org.elasticsearch.search.query.QueryPhase.execute(QueryPhase.java:149)
org.elasticsearch.search.SearchService.executeQueryPhase(SearchService.java:261)
org.elasticsearch.search.action.SearchServiceTransportAction$SearchQueryTransportHandler.messageReceived(SearchServiceTransportAction.java:688)
org.elasticsearch.search.action.SearchServiceTransportAction$SearchQueryTransportHandler.messageReceived(SearchServiceTransportAction.java:677)
org.elasticsearch.transport.netty.MessageChannelHandler$RequestHandler.run(MessageChannelHandler.java:275)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
java.lang.Thread.run(Thread.java:745)
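In this dump, two of the three sampled search threads (T#23 and T#8) are inside `ScriptScoreFunction.score`, i.e. evaluating a Groovy script (`Script1`, `Script9`) from the `function_score` scorer, while T#18 appears to be parked waiting for work in all 10 of its snapshots. A minimal Python sketch for tallying this from a saved `hot_threads` dump is below. It assumes the text layout shown above, and the marker is just a substring match on stack frames; the sample is a trimmed excerpt of the output above.

```python
import re

# Header line of each thread block in `_nodes/hot_threads` output.
THREAD_RE = re.compile(
    r"([\d.]+)% \(([\d.]+)ms out of (\d+)ms\) cpu usage by thread '([^']+)'"
)

# Trimmed excerpt of the hot_threads output above.
SAMPLE = """\
96.9% (484.5ms out of 500ms) cpu usage by thread 'elasticsearch[180][search][T#23]'
 2/10 snapshots sharing following 29 elements
 org.elasticsearch.common.lucene.search.function.ScriptScoreFunction.score(ScriptScoreFunction.java:54)
92.9% (464.4ms out of 500ms) cpu usage by thread 'elasticsearch[180][search][T#18]'
 10/10 snapshots sharing following 10 elements
 sun.misc.Unsafe.park(Native Method)
67.6% (337.9ms out of 500ms) cpu usage by thread 'elasticsearch[180][search][T#8]'
 7/10 snapshots sharing following 28 elements
 org.elasticsearch.common.lucene.search.function.ScriptScoreFunction.score(ScriptScoreFunction.java:54)
"""


def summarize_hot_threads(text, marker="ScriptScoreFunction.score"):
    """Return [(thread name, cpu %, True if any of its stack frames contains marker)]."""
    summary = []
    for line in text.splitlines():
        line = line.strip()
        m = THREAD_RE.match(line)
        if m:
            # Start a new thread block: name, CPU percentage, marker not yet seen.
            summary.append([m.group(4), float(m.group(1)), False])
        elif summary and marker in line:
            # A frame of the current thread block contains the marker.
            summary[-1][2] = True
    return [tuple(entry) for entry in summary]


if __name__ == "__main__":
    for name, cpu, in_script in summarize_hot_threads(SAMPLE):
        print(f"{cpu:5.1f}%  {name}  script scoring: {in_script}")
```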