Concerned about high query times

Hi there,

We recently decided to use Elasticsearch for our website and rewrote the website, including the mobile apps, in a short time. When we published the website and apps, something went wrong on our Elasticsearch nodes. Query times are nearly 250 ms and we are concerned about it. Could you help us find the root cause, optimize our queries, and suggest how to use Elasticsearch better? We currently have 4000 concurrent users and about 2700 pageviews.

Here is our configuration, to give you more info about what we have:
4 nodes
Each node has 32 GB RAM
Xeon 2.10 GHz processors

Query times:

Last snapshot of hot threads:
{node-2}
Hot threads at 2016-07-01T09:26:08.643Z, interval=500ms, busiestThreads=3, ignoreIdleThreads=true:

31.3% (156.2ms out of 500ms) cpu usage by thread 'elasticsearch[node-2][search][T#11]'
2/10 snapshots sharing following 21 elements
org.elasticsearch.search.aggregations.bucket.BucketsAggregator.collectExistingBucket(BucketsAggregator.java:80)
org.elasticsearch.search.aggregations.bucket.BucketsAggregator.collectBucket(BucketsAggregator.java:72)
org.elasticsearch.search.aggregations.bucket.global.GlobalAggregator$1.collect(GlobalAggregator.java:54)
org.elasticsearch.search.aggregations.LeafBucketCollector$3.collect(LeafBucketCollector.java:73)
org.elasticsearch.search.aggregations.LeafBucketCollector.collect(LeafBucketCollector.java:88)

131.3% (656.2ms out of 500ms) cpu usage by thread 'elasticsearch[node-4][search][T#10]'
2/10 snapshots sharing following 19 elements
org.elasticsearch.search.aggregations.bucket.terms.InternalTerms.doReduce(InternalTerms.java:212)
org.elasticsearch.search.aggregations.InternalAggregation.reduce(InternalAggregation.java:153)
org.elasticsearch.search.aggregations.InternalAggregations.reduce(InternalAggregations.java:170)
org.elasticsearch.search.aggregations.bucket.terms.InternalTerms$Bucket.reduce(InternalTerms.java:110)
org.elasticsearch.search.aggregations.bucket.terms.InternalTerms.doReduce(InternalTerms.java:220)
org.elasticsearch.search.aggregations.InternalAggregation.reduce(InternalAggregation.java:153)
org.elasticsearch.search.aggregations.InternalAggregations.reduce(InternalAggregations.java:170)

125.0% (625ms out of 500ms) cpu usage by thread 'elasticsearch[node-4][search][T#19]'
5/10 snapshots sharing following 26 elements
org.elasticsearch.search.aggregations.AggregatorFactory$1$1.collect(AggregatorFactory.java:208)
org.elasticsearch.search.aggregations.bucket.BucketsAggregator.collectExistingBucket(BucketsAggregator.java:80)
org.elasticsearch.search.aggregations.bucket.BucketsAggregator.collectBucket(BucketsAggregator.java:72)
org.elasticsearch.search.aggregations.bucket.nested.NestedAggregator$1.collect(NestedAggregator.java:112)
org.elasticsearch.search.aggregations.AggregatorFactory$1$1.collect(AggregatorFactory.java:208)

37.5% (187.5ms out of 500ms) cpu usage by thread 'elasticsearch[node-3][search][T#6]'
2/10 snapshots sharing following 42 elements
java.lang.Thread.isAlive(Native Method)
org.apache.lucene.util.CloseableThreadLocal.purge(CloseableThreadLocal.java:115)
org.apache.lucene.util.CloseableThreadLocal.maybePurge(CloseableThreadLocal.java:105)
org.apache.lucene.util.CloseableThreadLocal.get(CloseableThreadLocal.java:88)
org.apache.lucene.index.CodecReader.getSortedSetDocValues(CodecReader.java:243)
org.apache.lucene.index.FilterLeafReader.getSortedSetDocValues(FilterLeafReader.java:454)
org.apache.lucene.index.DocValues.getSortedSet(DocValues.java:302)

Hey,

the hot threads output shows a lot of CPU time spent in aggregations (or in operations needed for aggregations). If you don't run any aggregations, is your query fast? You can then add the aggregations back one at a time and see which one is the likely performance culprit. Maybe there is a single aggregation that creates a lot of buckets or needs to go through all of the data in your dataset (in which case you could change your data model to execute that aggregation more efficiently). Maybe it is all of your aggregations together that take a lot of time. Dissecting this as a first step might help.
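To make the "add them back one at a time" step concrete, here is a minimal sketch in Python (standard library only). It assumes the search request is available as a dict; the function names `isolate_aggregations`, `took_ms` and `profile` are made up for illustration, and the search URL is whatever `_search` endpoint you normally query. It builds one request without aggregations plus one request per top-level aggregation, and compares the `took` value Elasticsearch reports for each.

```python
import copy
import json
import urllib.request


def isolate_aggregations(body):
    """Return {label: request body}: one variant without aggregations,
    then one variant per top-level aggregation on its own."""
    aggs_key = "aggs" if "aggs" in body else "aggregations"
    aggs = body.get(aggs_key, {})
    # Everything except the aggregations section (query, size, sort, ...).
    base = {k: copy.deepcopy(v) for k, v in body.items() if k != aggs_key}
    variants = {"no_aggs": base}
    for name, definition in aggs.items():
        variant = copy.deepcopy(base)
        variant["aggs"] = {name: copy.deepcopy(definition)}
        variants[name] = variant
    return variants


def took_ms(search_url, body):
    """POST one search request and return the server-side 'took' time (ms)."""
    request = urllib.request.Request(
        search_url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)["took"]


def profile(search_url, body, send=took_ms):
    """Time every variant and return (label, took) pairs, slowest first.
    'send' can be swapped out, e.g. for a stub when no cluster is reachable."""
    timings = {
        label: send(search_url, variant)
        for label, variant in isolate_aggregations(body).items()
    }
    return sorted(timings.items(), key=lambda item: item[1], reverse=True)
```

Calling `profile(url, body)` against a real cluster should be repeated a few times per variant, since caches warm up and a single measurement can mislead. If `no_aggs` is fast and one label stands out, that aggregation is the first candidate to rework.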

--Alex

Yes, it was exactly what you said. We worked on the aggregations that have sub-aggregations and also tried to reduce the number of aggregations.
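For anyone doing the same cleanup, a rough way to see how many aggregations a request carries, how deeply they nest, and where the global ones sit is to walk the `aggs` section of the request body. This is only a sketch: `summarize_aggs` is a hypothetical helper, and the field names in the usage example are invented, not our real mapping.

```python
def summarize_aggs(aggs, depth=1, path=""):
    """Walk an 'aggs' section and return (count, max_depth, global_paths)."""
    count, max_depth, global_paths = 0, 0, []
    for name, definition in aggs.items():
        here = f"{path}/{name}" if path else name
        count += 1
        max_depth = max(max_depth, depth)
        # A global aggregation ignores the query and runs over every document.
        if "global" in definition:
            global_paths.append(here)
        # Sub-aggregations live under 'aggs' (or the long form 'aggregations').
        children = definition.get("aggs") or definition.get("aggregations") or {}
        child_count, child_depth, child_globals = summarize_aggs(
            children, depth + 1, here
        )
        count += child_count
        max_depth = max(max_depth, child_depth)
        global_paths.extend(child_globals)
    return count, max_depth, global_paths


# Usage with an invented request: one global aggregation and one nested one.
example_aggs = {
    "all": {"global": {}, "aggs": {"brands": {"terms": {"field": "brand"}}}},
    "attrs": {
        "nested": {"path": "attributes"},
        "aggs": {
            "names": {
                "terms": {"field": "attributes.name"},
                "aggs": {"values": {"terms": {"field": "attributes.value"}}},
            }
        },
    },
}
print(summarize_aggs(example_aggs))
```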

At first we got these query times.

Then we decided to remove the global aggregations, which, as we experienced, really affect query times. Here now we have