Hi,
Our production cluster is at 100% with a load average of 100.
- JDK 1.7u55_64bits
- ELS 1.7.5
- 9 nodes 30GB heap
- CentOS 6.5-6.6
In ELS: 1,700 indices / 3,200 shards / 5B docs / 5 TB
Requests: only Kibana 3 with ~50 dashboards (so no aggregations, only facets)
We have 5-10 requests per second
Indexing: 10,000 index operations/second with an average document size <1k
The global CPU load is 5% at busy times
In fact, the search queue is exhausted and the active search pool is full.
The current requests never finish.
The indexing still works fine.
We must restart the cluster to resolve the problem.
The problem now occurs at random: everything worked fine for months, and now the cluster crashes every day.
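To record the state of the search pool before each restart, something like the following could be used. It is only a sketch: it assumes the `_cat/thread_pool` endpoint with its `search.active`, `search.queue` and `search.rejected` columns, the `localhost:9200` address is a placeholder, and the function names are made up for illustration.

```python
import urllib.request

# Placeholder address (assumption); point this at any node of the cluster.
CAT_URL = ("http://localhost:9200/_cat/thread_pool"
           "?v&h=host,search.active,search.queue,search.rejected")


def fetch(url=CAT_URL):
    """Fetch the plain-text table from the cat API (needs a reachable node)."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        return resp.read().decode("utf-8")


def parse_cat_thread_pool(text):
    """Parse the whitespace-separated table into one dict per node."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    if not lines:
        return []
    header = lines[0].split()
    rows = []
    for ln in lines[1:]:
        row = dict(zip(header, ln.split()))
        # Every column after the host name is a counter.
        for key in header[1:]:
            row[key] = int(row[key])
        rows.append(row)
    return rows


def nodes_with_queued_searches(rows, queue_threshold=1):
    """Return hosts whose search queue holds at least queue_threshold items."""
    return [r["host"] for r in rows if r["search.queue"] >= queue_threshold]
```

Calling `nodes_with_queued_searches(parse_cat_thread_pool(fetch()))` from a cron job would show which nodes fill up first and whether `search.rejected` grows before the cluster stops answering.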
- Can we set a global request timeout?
- Has anybody already had this problem?
- What can we do apart from restarting the cluster?
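To illustrate the first question: as far as we know, the search API accepts a `timeout` field in the request body, but it has to be set on each request. The sketch below shows what we mean; the helper name `with_timeout` and the 10s value are made up for illustration, and we have not checked whether it also applies to the count requests that appear in the hot_threads output. What we are looking for is an equivalent that covers every request without changing the client.

```python
import json


def with_timeout(query_body, timeout="10s"):
    """Return a copy of a search request body with a per-request timeout."""
    body = dict(query_body)
    body["timeout"] = timeout
    return body


# Example body; the query itself is only a stand-in.
body = with_timeout({"query": {"match_all": {}}, "size": 0})
print(json.dumps(body, sort_keys=True))
```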
Here is an extract of the hot_threads output:
54.2% (270.7ms out of 500ms) cpu usage by thread 'elasticsearch[server1][search][T#14]'
10/10 snapshots sharing following 11 elements
org.elasticsearch.common.xcontent.json.JsonXContentParser.nextToken(JsonXContentParser.java:51)
org.elasticsearch.index.query.IndexQueryParserService.parseQuery(IndexQueryParserService.java:350)
org.elasticsearch.action.count.TransportCountAction.shardOperation(TransportCountAction.java:187)
org.elasticsearch.action.count.TransportCountAction.shardOperation(TransportCountAction.java:66)
org.elasticsearch.action.support.broadcast.TransportBroadcastOperationAction$ShardTransportHandler.messageReceived(TransportBroadcastOperationAction.java:338)
org.elasticsearch.action.support.broadcast.TransportBroadcastOperationAction$ShardTransportHandler.messageReceived(TransportBroadcastOperationAction.java:324)
org.elasticsearch.transport.netty.MessageChannelHandler$RequestHandler.doRun(MessageChannelHandler.java:279)
org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:36)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
java.lang.Thread.run(Thread.java:745)
50.9% (254.2ms out of 500ms) cpu usage by thread 'elasticsearch[server1][search][T#6]'
10/10 snapshots sharing following 11 elements
org.elasticsearch.common.xcontent.json.JsonXContentParser.nextToken(JsonXContentParser.java:51)
org.elasticsearch.index.query.IndexQueryParserService.parseQuery(IndexQueryParserService.java:350)
org.elasticsearch.action.count.TransportCountAction.shardOperation(TransportCountAction.java:187)
org.elasticsearch.action.count.TransportCountAction.shardOperation(TransportCountAction.java:66)
org.elasticsearch.action.support.broadcast.TransportBroadcastOperationAction$ShardTransportHandler.messageReceived(TransportBroadcastOperationAction.java:338)
org.elasticsearch.action.support.broadcast.TransportBroadcastOperationAction$ShardTransportHandler.messageReceived(TransportBroadcastOperationAction.java:324)
org.elasticsearch.transport.netty.MessageChannelHandler$RequestHandler.doRun(MessageChannelHandler.java:279)
org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:36)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
java.lang.Thread.run(Thread.java:745)
50.6% (252.8ms out of 500ms) cpu usage by thread 'elasticsearch[server1][search][T#21]'
10/10 snapshots sharing following 13 elements
org.elasticsearch.common.jackson.core.json.UTF8StreamJsonParser._skipWSOrEnd(UTF8StreamJsonParser.java:2728)
org.elasticsearch.common.jackson.core.json.UTF8StreamJsonParser.nextToken(UTF8StreamJsonParser.java:652)
org.elasticsearch.common.xcontent.json.JsonXContentParser.nextToken(JsonXContentParser.java:51)
org.elasticsearch.index.query.IndexQueryParserService.parseQuery(IndexQueryParserService.java:350)
org.elasticsearch.action.count.TransportCountAction.shardOperation(TransportCountAction.java:187)
org.elasticsearch.action.count.TransportCountAction.shardOperation(TransportCountAction.java:66)
org.elasticsearch.action.support.broadcast.TransportBroadcastOperationAction$ShardTransportHandler.messageReceived(TransportBroadcastOperationAction.java:338)
org.elasticsearch.action.support.broadcast.TransportBroadcastOperationAction$ShardTransportHandler.messageReceived(TransportBroadcastOperationAction.java:324)
org.elasticsearch.transport.netty.MessageChannelHandler$RequestHandler.doRun(MessageChannelHandler.java:279)
Thanks for any help.