Hi,
Recently we've been seeing high CPU usage that appears to be caused by global ordinals being built.
We already set refresh_interval to 30s, which helped for a while, but CPU usage is high again now that more data has been added.
Here is the hot_threads output from one of the problematic nodes (some parts were removed due to the message size limit):
95.6% (478.1ms out of 500ms) cpu usage by thread 'elasticsearch[PROD-228-USW1-CL1-ES37][search][T#14]'
8/10 snapshots sharing following 45 elements
sun.nio.ch.NativeThread.current(Native Method)
sun.nio.ch.NativeThreadSet.add(NativeThreadSet.java:46)
sun.nio.ch.FileChannelImpl.readInternal(FileChannelImpl.java:736)
sun.nio.ch.FileChannelImpl.read(FileChannelImpl.java:726)
org.apache.lucene.store.NIOFSDirectory$NIOFSIndexInput.readInternal(NIOFSDirectory.java:179)
org.apache.lucene.store.BufferedIndexInput.refill(BufferedIndexInput.java:342)
org.apache.lucene.store.BufferedIndexInput.readBytes(BufferedIndexInput.java:140)
org.apache.lucene.store.BufferedIndexInput.readBytes(BufferedIndexInput.java:116)
org.apache.lucene.codecs.lucene410.Lucene410DocValuesProducer$CompressedBinaryDocValues$CompressedBinaryTermsEnum.readTerm(Lucene410DocValuesProducer.java:909)
org.apache.lucene.codecs.lucene410.Lucene410DocValuesProducer$CompressedBinaryDocValues$CompressedBinaryTermsEnum.next(Lucene410DocValuesProducer.java:925)
org.apache.lucene.index.MultiTermsEnum.pushTop(MultiTermsEnum.java:293)
org.apache.lucene.index.MultiTermsEnum.next(MultiTermsEnum.java:319)
org.apache.lucene.index.MultiDocValues$OrdinalMap.<init>(MultiDocValues.java:525)
org.apache.lucene.index.MultiDocValues$OrdinalMap.build(MultiDocValues.java:482)
org.apache.lucene.index.MultiDocValues$OrdinalMap.build(MultiDocValues.java:461)
org.elasticsearch.index.fielddata.ordinals.GlobalOrdinalsBuilder.build(GlobalOrdinalsBuilder.java:55)
org.elasticsearch.index.fielddata.plain.SortedSetDVOrdinalsIndexFieldData.localGlobalDirect(SortedSetDVOrdinalsIndexFieldData.java:81)
org.elasticsearch.index.fielddata.plain.SortedSetDVOrdinalsIndexFieldData.localGlobalDirect(SortedSetDVOrdinalsIndexFieldData.java:35)
org.elasticsearch.indices.fielddata.cache.IndicesFieldDataCache$IndexFieldCache$2.call(IndicesFieldDataCache.java:211)
org.elasticsearch.indices.fielddata.cache.IndicesFieldDataCache$IndexFieldCache$2.call(IndicesFieldDataCache.java:199)
org.elasticsearch.common.cache.LocalCache$LocalManualCache$1.load(LocalCache.java:4742)
org.elasticsearch.common.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3527)
org.elasticsearch.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2319)
org.elasticsearch.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2282)
org.elasticsearch.common.cache.LocalCache$Segment.get(LocalCache.java:2197)
org.elasticsearch.common.cache.LocalCache.get(LocalCache.java:3937)
org.elasticsearch.common.cache.LocalCache$LocalManualCache.get(LocalCache.java:4739)
org.elasticsearch.indices.fielddata.cache.IndicesFieldDataCache$IndexFieldCache.load(IndicesFieldDataCache.java:199)
org.elasticsearch.index.fielddata.plain.SortedSetDVOrdinalsIndexFieldData.loadGlobal(SortedSetDVOrdinalsIndexFieldData.java:69)
org.elasticsearch.search.aggregations.support.ValuesSource$Bytes$WithOrdinals$FieldData.globalMaxOrd(ValuesSource.java:285)
org.elasticsearch.search.aggregations.bucket.terms.TermsAggregatorFactory.create(TermsAggregatorFactory.java:204)
org.elasticsearch.search.aggregations.support.ValuesSourceAggregatorFactory.create(ValuesSourceAggregatorFactory.java:54)
org.elasticsearch.search.aggregations.AggregatorFactories.createAndRegisterContextAware(AggregatorFactories.java:53)
org.elasticsearch.search.aggregations.AggregatorFactories.createTopLevelAggregators(AggregatorFactories.java:157)
org.elasticsearch.search.aggregations.AggregationPhase.preProcess(AggregationPhase.java:79)
org.elasticsearch.search.query.QueryPhase.execute(QueryPhase.java:100)
org.elasticsearch.search.SearchService.loadOrExecuteQueryPhase(SearchService.java:301)
org.elasticsearch.search.SearchService.executeQueryPhase(SearchService.java:312)
We're using Elasticsearch 1.7.5.
My questions are:
- Is there any way to determine which aggregation is triggering this build?
- Is there any way to reduce the build time? I thought about eagerly loading the global ordinals, but I'm not sure it would help: my problem is not query latency but constantly high CPU, which causes performance issues, and with eager loading the build would still happen on every refresh and keep CPU high.
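For reference, the eager loading I have in mind would be a mapping change roughly like the one below. This is only a sketch: the index, type and field names (my_index, my_type, my_field) are hypothetical placeholders, and I'm assuming the 1.x fielddata "loading" option with the value "eager_global_ordinals" on a not_analyzed string field with doc values. The script just builds and prints the request body, it doesn't talk to the cluster.

```python
import json


def eager_global_ordinals_mapping(field_name):
    """Build a 1.x-style mapping body that asks for eager global ordinals.

    Assumes a not_analyzed string field backed by doc values; the field
    name is a placeholder supplied by the caller.
    """
    return {
        "properties": {
            field_name: {
                "type": "string",
                "index": "not_analyzed",
                "doc_values": True,
                # Build global ordinals at refresh time instead of on the
                # first aggregation that needs them.
                "fielddata": {"loading": "eager_global_ordinals"},
            }
        }
    }


if __name__ == "__main__":
    # Hypothetical index/type/field names, for illustration only.
    body = eager_global_ordinals_mapping("my_field")
    print("PUT /my_index/_mapping/my_type")
    print(json.dumps(body, indent=2))
```

As far as I understand, this only moves the build from the first search after a refresh to the refresh itself, so it wouldn't reduce the total CPU spent on it.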
Thanks.