The search speed seems much slower than in 1.3

Hello,
After I upgraded my cluster from 1.3 to 5.0, searching my index became much slower.
Currently I have two clusters with the same data, one on 1.3 and one on 5.0. Each cluster has 5 nodes.

My index has about 100 million documents.
When I search the 5.0 cluster, it is about 5 times slower than the 1.3 cluster.

Does anyone know what the problem is?

Thanks

Can you provide more details? How are you indexing your documents, and which specific searches are slower?

Mike McCandless

Just a term query over an analyzed field.
The first search is extremely slow.
Sometimes it takes more than 10 seconds.
If I try again and again, it gets faster and faster...

I guess maybe it's the cache?

The main difference between the two clusters is:
In the new cluster, I separated my index into two parts.
One holds recent data, about 20 percent of the whole index.
The other is the entire index.

Most search requests now go to the recent index, and those requests work well.
If I search the entire index, the speed is slower.

My machine has 128MB memory, half for Elasticsearch and half for the Linux file cache.
A crawler writes about 5 documents per second to the recent index and also copies them to the entire index.

I started 7 nodes for the 5.0 cluster and 3 nodes for the 1.3 cluster today.
The most remarkable phenomenon is this:

I have a service which counts "document frequency" for other applications.
On average, this service needs to send 3000 count requests to the server every minute.
In the 1.3 cluster my service works well, and the load average of the master node is about 60%.
In the 5.0 cluster, my service can only send 200 requests per minute.
Even at that rate, the load average of three nodes is above 60%.
What's more, in 5.0 the count requests sometimes cannot even finish within 30 seconds.

Since 5.0 does not support count requests, I count document frequency as follows:

client.prepareSearch(index).setTypes(type).setQuery(query).setSize(0).get().getHits().getTotalHits();
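
For completeness, a self-contained version of that call might look like the sketch below (assuming the 5.x transport client; the class and parameter names are just placeholders):

// A minimal sketch of the count helper, assuming the 5.x transport client.
// Class and parameter names are placeholders.
import org.elasticsearch.client.Client;
import org.elasticsearch.index.query.QueryBuilders;

public class DocFreqCounter {

    private final Client client;

    public DocFreqCounter(Client client) {
        this.client = client;
    }

    // Count documents matching a single term by running a search with size 0
    // and reading the total hit count, exactly as in the one-liner above.
    public long docFreq(String index, String type, String field, String word) {
        return client.prepareSearch(index)
                .setTypes(type)
                .setQuery(QueryBuilders.termQuery(field, word))
                .setSize(0)               // do not fetch any documents, only the count is needed
                .get()
                .getHits()
                .getTotalHits();
    }
}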

I conducted a further experiment to find the difference between the two clusters.
An interesting finding is that in ES 5.0 the load average of the busiest machine peaks at about 35%,
while in 1.3 it can reach 80%.

I think maybe this is the reason ES 5.0 is slower than 1.3,
but I cannot understand what the problem is.
Does ES 5.0 limit CPU usage to keep the machine in a healthy state?
Or should I increase the size of the search thread pool, or something like that?

I will check the documentation right now.
Looking forward to your response.
Thank you.

Did you actually mean MB, or rather GB?

No it does not.

Out of curiosity, do you have only one type or multiple types in your index? Also do you sometimes delete or update documents?

Since CPU does not seem to be maxed out when running under high load, I'm wondering whether I/O could be the bottleneck, could you check? Another possibility would be garbage collection: can you compare the garbage collection load in both clusters?
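
One way to read the garbage collection numbers is the nodes stats API; a rough sketch with the Java client (the REST equivalent is GET _nodes/stats/jvm) could look like this:

// Rough sketch: print per-node GC counters from the nodes stats API
// (REST equivalent: GET _nodes/stats/jvm).
import org.elasticsearch.action.admin.cluster.node.stats.NodeStats;
import org.elasticsearch.action.admin.cluster.node.stats.NodesStatsResponse;
import org.elasticsearch.client.Client;
import org.elasticsearch.monitor.jvm.JvmStats;

public class GcStatsPrinter {

    public static void print(Client client) {
        NodesStatsResponse response = client.admin().cluster()
                .prepareNodesStats()
                .setJvm(true)             // only the JVM section is needed here
                .get();
        for (NodeStats node : response.getNodes()) {
            for (JvmStats.GarbageCollector gc : node.getJvm().getGc()) {
                System.out.println(node.getNode().getName() + " " + gc.getName()
                        + ": collections=" + gc.getCollectionCount()
                        + ", time=" + gc.getCollectionTime());
            }
        }
    }
}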

Finally, could you try to get the nodes hot threads while running a load test on the 5.0 cluster? If you could call this API 2 or 3 times while the load test is running and share the output, that would be great (see the nodes hot threads API in the Elasticsearch reference).
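
Something along these lines should do, as a rough sketch with the Java client (you can also simply hit the /_nodes/hot_threads REST endpoint a few times):

// Rough sketch: sample hot threads on all nodes a few times during the load test
// (REST equivalent: GET /_nodes/hot_threads).
import org.elasticsearch.action.admin.cluster.node.hotthreads.NodeHotThreads;
import org.elasticsearch.action.admin.cluster.node.hotthreads.NodesHotThreadsResponse;
import org.elasticsearch.client.Client;
import org.elasticsearch.common.unit.TimeValue;

public class HotThreadsSampler {

    public static void sample(Client client, int rounds) throws InterruptedException {
        for (int i = 0; i < rounds; i++) {
            NodesHotThreadsResponse response = client.admin().cluster()
                    .prepareNodesHotThreads()                     // all nodes
                    .setThreads(3)                                // busiest 3 threads per node
                    .setInterval(TimeValue.timeValueMillis(500))  // 500ms sampling interval
                    .setIgnoreIdleThreads(true)
                    .get();
            for (NodeHotThreads node : response.getNodes()) {
                System.out.println(node.getHotThreads());
            }
            Thread.sleep(5000);                                   // pause between samples
        }
    }
}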

Yes, it's GB.

Only one type in this index. (In 1.3 there is another type with no documents; I don't think it affects the result.)
On average, I need to update each document more than once, since some data is generated after the document has been added to the index. It's a design problem, and we haven't solved it yet.
The update requests are sent to both 1.3 and 5.0 by a REST client.

The CPU usage is low in both clusters, about 200% out of a maximum of 2400%. I'm also wondering about I/O, but I cannot see what would cause the difference.
As mentioned in my previous post, I separated my index into two parts in order to accelerate searches on recent data. Is it possible that most of the memory is used for caching data of the recency index, so that there is not enough memory left when searching the entire index?
If this is (or could be) the problem, maybe I should configure the shard allocation settings to make sure each node holds only one index.
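
If that turns out to be the case, I understand allocation filtering could pin each index to its own nodes, roughly like the sketch below (it assumes each node sets a custom attribute, e.g. node.attr.box_type, in elasticsearch.yml; the attribute name and values are made up):

// Sketch: pin an index to a group of nodes via allocation filtering.
// Assumes every node defines a custom attribute in elasticsearch.yml, e.g.
//   node.attr.box_type: recent    (or "all")
// The attribute name and values are made up for this example.
import org.elasticsearch.client.Client;
import org.elasticsearch.common.settings.Settings;

public class AllocationPinner {

    public static void pin(Client client, String index, String boxType) {
        client.admin().indices()
                .prepareUpdateSettings(index)
                .setSettings(Settings.builder()
                        .put("index.routing.allocation.require.box_type", boxType)
                        .build())
                .get();
    }
}

For example, tj_news_recent could require box_type=recent and tj_news_all could require box_type=all.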

About garbage collection: could you tell me how to get the garbage collection status?

The hot threads output is too long for a single post, so I will share it in the following posts.
It seems most threads in 5.0 are stepping to the next matching document.

BTW:
I found an interesting phenomenon in both 1.3 and 5.0. When the load test is running, only one node in the cluster is busy. The load average of that node reaches the maximum, while on the other nodes it is less than 10. The busiest node can be any node in the cluster, not only the master, and it changes only when a node leaves or a new node joins the cluster.

I modified thread_pool.search.size to 100 (since the default is 3× the number of processors in 1.3 and 3/2× in 5.0) and conducted another experiment to see whether the thread pool size affects the result.
The result shows that 5.0 is still slower than 1.3. The average cost is about 3 times that of 1.3.

The first time of 5.0:

::: {index185}
   Hot threads at 2016-12-21T02:44:04.605Z, interval=500ms, busiestThreads=3, ignoreIdleThreads=true:

::: {index172}
   Hot threads at 2016-12-21T02:44:04.604Z, interval=500ms, busiestThreads=3, ignoreIdleThreads=true:

::: {index188}
   Hot threads at 2016-12-21T02:44:04.605Z, interval=500ms, busiestThreads=3, ignoreIdleThreads=true:
   
   20.7% (103.6ms out of 500ms) cpu usage by thread 'elasticsearch[index188][refresh][T#3]'
	 2/10 snapshots sharing following 23 elements
	   o.a.l.i.DocumentsWriterPerThread.flush(DocumentsWriterPerThread.java:443)
	   o.a.l.i.DocumentsWriter.doFlush(DocumentsWriter.java:539)
	   o.a.l.i.DocumentsWriter.flushAllThreads(DocumentsWriter.java:653)
	   o.a.l.i.IndexWriter.getReader(IndexWriter.java:438)
	 3/10 snapshots sharing following 27 elements
	 ...
	   org.elasticsearch.index.IndexWarmer.warm(IndexWarmer.java:85)
	 ...
	   o.a.l.s.SearcherManager.getSearcher(SearcherManager.java:198)
   
   18.7% (93.4ms out of 500ms) cpu usage by thread 'elasticsearch[index188][search][T#65]'
	 2/10 snapshots sharing following 23 elements
	   o.a.l.s.IndexSearcher.search(IndexSearcher.java:473)
	   o.e.s.query.QueryPhase.execute(QueryPhase.java:370)
	   o.e.s.query.QueryPhase.execute(QueryPhase.java:106)
	 2/10 snapshots sharing following 10 elements
	   sun.misc.Unsafe.park(Native Method)
	   java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)

::: {index180}
   Hot threads at 2016-12-21T02:44:05.368Z, interval=500ms, busiestThreads=3, ignoreIdleThreads=true:
   
   63.2% (315.8ms out of 500ms) cpu usage by thread 'elasticsearch[index180][search][T#56]'
	 2/10 snapshots sharing following 30 elements
	   o.a.l.c.lucene50.Lucene50PostingsReader$BlockDocsEnum.refillDocs(Lucene50PostingsReader.java:371)
	   o.a.l.c.lucene50.Lucene50PostingsReader$BlockDocsEnum.nextDoc(Lucene50PostingsReader.java:382)
	   o.a.l.s.ConjunctionDISI.nextDoc(ConjunctionDISI.java:240)
	 2/10 snapshots sharing following 24 elements
	   o.a.l.s.IndexSearcher.search(IndexSearcher.java:666)
	   o.a.l.s.IndexSearcher.search(IndexSearcher.java:473)
	 2/10 snapshots sharing following 20 elements
	   o.a.l.s.Weight$DefaultBulkScorer.scoreAll(Weight.java:221)
	   o.a.l.s.Weight$DefaultBulkScorer.score(Weight.java:172)
	 2/10 snapshots sharing following 20 elements
	   o.a.l.s.Weight$DefaultBulkScorer.scoreAll(Weight.java:219)
	   o.a.l.s.Weight$DefaultBulkScorer.score(Weight.java:172)
	   o.a.l.s.ConstantScoreQuery$ConstantBulkScorer.score(ConstantScoreQuery.java:84)
	 2/10 snapshots sharing following 15 elements
	   o.a.l.s.IndexSearcher.search(IndexSearcher.java:473)
   
   47.6% (237.8ms out of 500ms) cpu usage by thread 'elasticsearch[index180][search][T#26]'
	 3/10 snapshots sharing following 30 elements
	   o.a.l.c.lucene50.Lucene50PostingsReader$BlockDocsEnum.refillDocs(Lucene50PostingsReader.java:371)
	   o.a.l.c.lucene50.Lucene50PostingsReader$BlockDocsEnum.nextDoc(Lucene50PostingsReader.java:382)
	   o.a.l.s.ConjunctionDISI.nextDoc(ConjunctionDISI.java:240)
	 3/10 snapshots sharing following 26 elements
	   org.elasticsearch.common.lucene.ShardCoreKeyMap.add(ShardCoreKeyMap.java:68)
	   org.elasticsearch.indices.IndicesQueryCache$CachingWeightWrapper.bulkScorer(IndicesQueryCache.java:160)
	   o.a.l.s.IndexSearcher.search(IndexSearcher.java:666)
	 3/10 snapshots sharing following 20 elements
	   o.a.l.s.Weight$DefaultBulkScorer.scoreAll(Weight.java:221)
	   o.a.l.s.Weight$DefaultBulkScorer.score(Weight.java:172)
	 unique snapshot
	 ...
	   java.lang.Class.getSimpleName(Class.java:1309)
	   o.a.l.s.UsageTrackingQueryCachingPolicy.isPointQuery(UsageTrackingQueryCachingPolicy.java:39)
   
   46.5% (232.7ms out of 500ms) cpu usage by thread 'elasticsearch[index180][search][T#20]'
	 4/10 snapshots sharing following 24 elements
	   o.a.l.s.IndexSearcher.search(IndexSearcher.java:666)
	   o.a.l.s.IndexSearcher.search(IndexSearcher.java:473)
	 6/10 snapshots sharing following 11 elements
	   o.e.s.SearchService.executeQueryPhase(SearchService.java:251)
	   org.elasticsearch.action.search.SearchTransportService.lambda$registerRequestHandler$6(SearchTransportService.java:276)

::: {index187}
   Hot threads at 2016-12-21T02:44:04.604Z, interval=500ms, busiestThreads=3, ignoreIdleThreads=true:

::: {index184}
   Hot threads at 2016-12-21T02:44:04.606Z, interval=500ms, busiestThreads=3, ignoreIdleThreads=true:
   
   40.3% (201.4ms out of 500ms) cpu usage by thread 'elasticsearch[index184][[tj_news_recent][0]: Lucene Merge Thread #4495]'
	 5/10 snapshots sharing following 12 elements
	   o.a.l.c.compressing.LZ4.compress(LZ4.java:243)
	   o.a.l.c.compressing.CompressionMode$LZ4FastCompressor.compress(CompressionMode.java:164)
	 2/10 snapshots sharing following 29 elements
	   sun.nio.ch.FileDispatcherImpl.write0(Native Method)
	   sun.nio.ch.FileDispatcherImpl.write(FileDispatcherImpl.java:60)
	 2/10 snapshots sharing following 12 elements
	   o.a.l.c.compressing.LZ4.compress(LZ4.java:243)
	   o.a.l.c.compressing.CompressionMode$LZ4FastCompressor.compress(CompressionMode.java:164)
   
   19.2% (96ms out of 500ms) cpu usage by thread 'elasticsearch[index184][refresh][T#1]'
	 8/10 snapshots sharing following 27 elements
	 ...
	   org.elasticsearch.index.IndexWarmer.warm(IndexWarmer.java:85)
	   org.elasticsearch.index.IndexService.lambda$createShard$0(IndexService.java:334)
	 2/10 snapshots sharing following 9 elements
	   sun.misc.Unsafe.park(Native Method)
	   java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)

::: {index181}
   Hot threads at 2016-12-21T02:44:04.605Z, interval=500ms, busiestThreads=3, ignoreIdleThreads=true:

Some stack traces and other unnecessary information were removed because of the length limit.

The second time of 5.0:

::: {index185}
   Hot threads at 2016-12-21T02:44:28.367Z, interval=500ms, busiestThreads=3, ignoreIdleThreads=true:

::: {index172}
   Hot threads at 2016-12-21T02:44:28.365Z, interval=500ms, busiestThreads=3, ignoreIdleThreads=true:

::: {index188}
   Hot threads at 2016-12-21T02:44:28.366Z, interval=500ms, busiestThreads=3, ignoreIdleThreads=true:

::: {index180}
   Hot threads at 2016-12-21T02:44:28.434Z, interval=500ms, busiestThreads=3, ignoreIdleThreads=true:
   
   45.5% (227.2ms out of 500ms) cpu usage by thread 'elasticsearch[index180][search][T#26]'
	 2/10 snapshots sharing following 24 elements
	   o.a.l.s.IndexSearcher.search(IndexSearcher.java:666)
	   o.a.l.s.IndexSearcher.search(IndexSearcher.java:473)
	 2/10 snapshots sharing following 16 elements
	   o.a.l.i.CodecReader.document(CodecReader.java:88)
	   o.a.l.i.FilterLeafReader.document(FilterLeafReader.java:411)
	   o.e.s.fetch.FetchPhase.loadStoredFields(FetchPhase.java:347)
	 4/10 snapshots sharing following 19 elements
	   o.e.c.l.ShardCoreKeyMap.add(ShardCoreKeyMap.java:68)
	   o.e.i.IndicesQueryCache$CachingWeightWrapper.scorer(IndicesQueryCache.java:154)
	   o.e.c.l.search.FilteredCollector.getLeafCollector(FilteredCollector.java:47)
	   o.a.l.s.IndexSearcher.search(IndexSearcher.java:660)
	   o.a.l.s.IndexSearcher.search(IndexSearcher.java:473)
	 2/10 snapshots sharing following 19 elements
	   o.a.l.s.Weight$DefaultBulkScorer.score(Weight.java:172)
	   o.a.l.s.ConstantScoreQuery$ConstantBulkScorer.score(ConstantScoreQuery.java:84)
	   o.a.l.s.BulkScorer.score(BulkScorer.java:39)
   
   34.9% (174.5ms out of 500ms) cpu usage by thread 'elasticsearch[index180][search][T#58]'
	 7/10 snapshots sharing following 30 elements
	   o.a.l.c.l.Lucene50PostingsReader$BlockDocsEnum.refillDocs(Lucene50PostingsReader.java:371)
	   o.a.l.c.l.Lucene50PostingsReader$BlockDocsEnum.nextDoc(Lucene50PostingsReader.java:382)
	   o.a.l.s.ConjunctionDISI.nextDoc(ConjunctionDISI.java:240)
	 3/10 snapshots sharing following 26 elements
	   o.e.c.l.ShardCoreKeyMap.add(ShardCoreKeyMap.java:68)
	   o.e.i.IndicesQueryCache$CachingWeightWrapper.bulkScorer(IndicesQueryCache.java:160)
	   o.a.l.s.IndexSearcher.search(IndexSearcher.java:666)
   
   31.3% (156.5ms out of 500ms) cpu usage by thread 'elasticsearch[index180][search][T#2]'
	 8/10 snapshots sharing following 15 elements
	   o.a.l.s.IndexSearcher.search(IndexSearcher.java:473)
	   o.e.s.query.QueryPhase.execute(QueryPhase.java:370)
	 2/10 snapshots sharing following 10 elements
	   o.e.a.s.SearchTransportService.lambda$registerRequestHandler$6(SearchTransportService.java:276)


::: {index187}{cTFX3U6zTvyMDpRoTJ9-dw}{UgaBh4I8TsuU_ztao8s-dg}{192.168.2.187}{192.168.2.187:9301}
   Hot threads at 2016-12-21T02:44:28.366Z, interval=500ms, busiestThreads=3, ignoreIdleThreads=true:

::: {index184}{sj4mY1c4Q1yuuc8poW6sfg}{x-s8Us3aTtCwVTGYgxDQwA}{192.168.2.184}{192.168.2.184:9301}
   Hot threads at 2016-12-21T02:44:28.367Z, interval=500ms, busiestThreads=3, ignoreIdleThreads=true:

::: {index181}{0sL-H1SXRLS3HabbY0nsQw}{irbmTfCTT1W7v1V2m8gPgQ}{192.168.2.181}{192.168.2.181:9301}
   Hot threads at 2016-12-21T02:44:28.367Z, interval=500ms, busiestThreads=3, ignoreIdleThreads=true:

The third time of 5.0:

::: {index185}{HRFskHirR-OAiTitb6dZ_A}{QWcZM7jYRyO2-MPAGxDl7g}{192.168.2.185}{192.168.2.185:9301}
   Hot threads at 2016-12-21T02:45:57.172Z, interval=500ms, busiestThreads=3, ignoreIdleThreads=true:
   
   22.9% (114.3ms out of 500ms) cpu usage by thread 'elasticsearch[index185][search][T#94]'
	 2/10 snapshots sharing following 15 elements
	   o.a.l.s.IndexSearcher.search(IndexSearcher.java:473)
	   o.e.s.query.QueryPhase.execute(QueryPhase.java:370)
	 8/10 snapshots sharing following 10 elements
	   sun.misc.Unsafe.park(Native Method)
	   java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)

::: {index172}{WFDc25aaTqS3pVzcRRyKFQ}{Pjki6K0STpSXXamFuJumog}{192.168.2.172}{192.168.2.172:9301}
   Hot threads at 2016-12-21T02:45:57.168Z, interval=500ms, busiestThreads=3, ignoreIdleThreads=true:
   
   20.5% (102.6ms out of 500ms) cpu usage by thread 'elasticsearch[index172][refresh][T#3]'
	 10/10 snapshots sharing following 2 elements
	   java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	   java.lang.Thread.run(Thread.java:745)

::: {index188}{5at14jo6Qn-GVe4nGw_jsw}{16FALwcORwSyoM3uT1McfA}{192.168.2.188}{192.168.2.188:9301}
   Hot threads at 2016-12-21T02:45:57.171Z, interval=500ms, busiestThreads=3, ignoreIdleThreads=true:

::: {index180}{uCcqO0cIT-KNb8enzdh5Fg}{Kz4kTp_IRX2Cgh2ZutoLOg}{192.168.2.180}{192.168.2.180:9301}
   Hot threads at 2016-12-21T02:45:57.451Z, interval=500ms, busiestThreads=3, ignoreIdleThreads=true:
   
   50.3% (251.7ms out of 500ms) cpu usage by thread 'elasticsearch[index180][search][T#45]'
	 3/10 snapshots sharing following 21 elements
	   o.a.l.s.DisjunctionDISIApproximation.nextDoc(DisjunctionDISIApproximation.java:55)
	   o.a.l.s.ConjunctionDISI$BitSetConjunctionDISI.nextDoc(ConjunctionDISI.java:279)
	   o.a.l.s.Weight$DefaultBulkScorer.scoreAll(Weight.java:219)
	 3/10 snapshots sharing following 20 elements
	   o.a.l.s.ConjunctionDISI$BitSetConjunctionDISI.nextDoc(ConjunctionDISI.java:279)
	   o.a.l.s.Weight$DefaultBulkScorer.scoreAll(Weight.java:219)
	   o.a.l.s.Weight$DefaultBulkScorer.score(Weight.java:172)
	 2/10 snapshots sharing following 21 elements
	   o.e.c.l.Lucene$3.get(Lucene.java:769)
	   o.e.c.l.search.FilteredCollector$1.collect(FilteredCollector.java:54)
	   o.a.l.s.Weight$DefaultBulkScorer.scoreAll(Weight.java:221)
	   o.a.l.s.Weight$DefaultBulkScorer.score(Weight.java:172)
	 2/10 snapshots sharing following 19 elements
	   o.a.l.s.Weight$DefaultBulkScorer.scoreAll(Weight.java:219)
	   o.a.l.s.Weight$DefaultBulkScorer.score(Weight.java:172)
   
   38.2% (191.2ms out of 500ms) cpu usage by thread 'elasticsearch[index180][search][T#18]'
	 4/10 snapshots sharing following 24 elements
	   o.a.l.s.DisjunctionSumScorer.score(DisjunctionSumScorer.java:42)
	   o.a.l.s.DisjunctionScorer.score(DisjunctionScorer.java:190)
	 2/10 snapshots sharing following 21 elements
	   o.e.c.l.Lucene$3.get(Lucene.java:769)
	   o.e.c.l.search.FilteredCollector$1.collect(FilteredCollector.java:54)
	   o.a.l.s.Weight$DefaultBulkScorer.scoreAll(Weight.java:221)

::: {index187}{cTFX3U6zTvyMDpRoTJ9-dw}{UgaBh4I8TsuU_ztao8s-dg}{192.168.2.187}{192.168.2.187:9301}
   Hot threads at 2016-12-21T02:45:57.169Z, interval=500ms, busiestThreads=3, ignoreIdleThreads=true:

::: {index184}{sj4mY1c4Q1yuuc8poW6sfg}{x-s8Us3aTtCwVTGYgxDQwA}{192.168.2.184}{192.168.2.184:9301}
   Hot threads at 2016-12-21T02:45:57.173Z, interval=500ms, busiestThreads=3, ignoreIdleThreads=true:

::: {index181}{0sL-H1SXRLS3HabbY0nsQw}{irbmTfCTT1W7v1V2m8gPgQ}{192.168.2.181}{192.168.2.181:9301}
   Hot threads at 2016-12-21T02:45:57.172Z, interval=500ms, busiestThreads=3, ignoreIdleThreads=true:
   
   26.8% (134.1ms out of 500ms) cpu usage by thread 'elasticsearch[index181][search][T#29]'
	 3/10 snapshots sharing following 22 elements
	   o.a.l.s.DisjunctionDISIApproximation.nextDoc(DisjunctionDISIApproximation.java:55)
	   o.a.l.s.DisjunctionDISIApproximation.nextDoc(DisjunctionDISIApproximation.java:55)
	 2/10 snapshots sharing following 23 elements
	   o.a.l.s.DisjunctionScorer.score(DisjunctionScorer.java:190)
	   o.a.l.s.ConjunctionScorer.score(ConjunctionScorer.java:64)
	   o.a.l.s.TopScoreDocCollector$SimpleTopScoreDocCollector$1.collect(TopScoreDocCollector.java:64)
	 2/10 snapshots sharing following 20 elements
	   o.a.l.s.ConjunctionDISI$BitSetConjunctionDISI.nextDoc(ConjunctionDISI.java:279)
	   o.a.l.s.Weight$DefaultBulkScorer.scoreAll(Weight.java:219)
	 3/10 snapshots sharing following 18 elements
	   o.a.l.s.Weight$DefaultBulkScorer.score(Weight.java:172)
	   o.a.l.s.BulkScorer.score(BulkScorer.java:39)
	   o.a.l.s.IndexSearcher.search(IndexSearcher.java:669)

The first time of 1.3:

::: [index110][D-_h4nlnT0KcpBTG-Zjq5Q][TJ-index9][inet[/192.168.2.110:9300]]
   
   22.0% (109.9ms out of 500ms) cpu usage by thread 'elasticsearch[index110][search][T#12]'
	 10/10 snapshots sharing following 10 elements
	   sun.misc.Unsafe.park(Native Method)
	   java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
	   java.util.concurrent.LinkedTransferQueue.awaitMatch(LinkedTransferQueue.java:737)
	   java.util.concurrent.LinkedTransferQueue.xfer(LinkedTransferQueue.java:647)
	   java.util.concurrent.LinkedTransferQueue.take(LinkedTransferQueue.java:1265)
	   org.elasticsearch.common.util.concurrent.SizeBlockingQueue.take(SizeBlockingQueue.java:162)
	   java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1067)
	   java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1127)
	   java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	   java.lang.Thread.run(Thread.java:745)
   
   16.7% (83.5ms out of 500ms) cpu usage by thread 'elasticsearch[index110][search][T#27]'
	 10/10 snapshots sharing following 10 elements
	   sun.misc.Unsafe.park(Native Method)
   
   16.4% (82ms out of 500ms) cpu usage by thread 'elasticsearch[index110][search][T#14]'
	 10/10 snapshots sharing following 2 elements
	   java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor

::: [index111][NPhSc3sjRDucCEFDmn_LRA][TJ-index111][inet[/192.168.2.111:9300]]
   
   20.0% (100.2ms out of 500ms) cpu usage by thread 'elasticsearch[index111][search][T#25]'
	 10/10 snapshots sharing following 10 elements
	   sun.misc.Unsafe.park(Native Method)
   
   19.8% (98.8ms out of 500ms) cpu usage by thread 'elasticsearch[index111][search][T#20]'
	 10/10 snapshots sharing following 10 elements
	   sun.misc.Unsafe.park(Native Method)
   
   19.5% (97.7ms out of 500ms) cpu usage by thread 'elasticsearch[index111][search][T#34]'
	 10/10 snapshots sharing following 10 elements
	   sun.misc.Unsafe.park(Native Method)

::: [index169][HuteZlkFTe6tnCfLmizIGQ][TJ-index7][inet[/192.168.2.169:9300]]
   
   23.3% (116.3ms out of 500ms) cpu usage by thread 'elasticsearch[index169][search][T#43]'
	 2/10 snapshots sharing following 10 elements
	   sun.misc.Unsafe.park(Native Method)
	 2/10 snapshots sharing following 13 elements
	 ...
	   org.elasticsearch.index.query.QueryParseContext.removeTypes(QueryParseContext.java:75)
	   org.elasticsearch.action.count.TransportCountAction.shardOperation(TransportCountAction.java:185)
	 4/10 snapshots sharing following 17 elements
	   o.a.l.s.FilteredDocIdSetIterator.nextDoc(FilteredDocIdSetIterator.java:60)
	   o.a.l.s.FilteredQuery$RandomAccessFilterStrategy.filteredScorer(FilteredQuery.java:532)
	 2/10 snapshots sharing following 31 elements
	   sun.nio.ch.NativeThread.current(Native Method)
	   sun.nio.ch.NativeThreadSet.add(NativeThreadSet.java:46)
	   sun.nio.ch.FileChannelImpl.readInternal(FileChannelImpl.java:704)
	   sun.nio.ch.FileChannelImpl.read(FileChannelImpl.java:694)
	   org.apache.lucene.store.NIOFSDirectory$NIOFSIndexInput.readInternal(NIOFSDirectory.java:179)
	  ...
	   org.apache.lucene.codecs.MultiLevelSkipListReader.skipTo(MultiLevelSkipListReader.java:125)
	   org.apache.lucene.codecs.lucene41.Lucene41PostingsReader$BlockDocsEnum.advance(Lucene41PostingsReader.java:482)
	   o.a.l.s.TermScorer.advance(TermScorer.java:85)
	   o.a.l.s.ConjunctionScorer.doNext(ConjunctionScorer.java:69)
   
   22.0% (110ms out of 500ms) cpu usage by thread 'elasticsearch[index169][search][T#1]'
	 2/10 snapshots sharing following 10 elements
	   sun.misc.Unsafe.park(Native Method)
	 3/10 snapshots sharing following 21 elements
	   org.apache.lucene.store.DataInput.readVInt(DataInput.java:120)
	   org.apache.lucene.codecs.blocktree.SegmentTermsEnumFrame.loadBlock(SegmentTermsEnumFrame.java:152)
	   org.apache.lucene.codecs.blocktree.SegmentTermsEnum.seekExact(SegmentTermsEnum.java:561)
	   o.a.l.i.TermContext.build(TermContext.java:95)
	   com.giiso.index.query.TianjiQuery$TianjiWeight.<init>(TianjiQuery.java:373)
	   o.a.l.s.FilteredQuery.createWeight(FilteredQuery.java:82)
	 5/10 snapshots sharing following 19 elements
	   com.giiso.index.query.TianjiQuery$TianjiWeight.<init>(TianjiQuery.java:373)
   
   21.3% (106.7ms out of 500ms) cpu usage by thread 'elasticsearch[index169][search][T#55]'
	 4/10 snapshots sharing following 18 elements
	   o.a.l.s.FilteredQuery$LeapFrogScorer.advanceToNextCommonDoc(FilteredQuery.java:274)
	   o.a.l.s.FilteredQuery$LeapFrogScorer.nextDoc(FilteredQuery.java:286)
	   o.a.l.s.Weight$DefaultBulkScorer.scoreAll(Weight.java:192)
	   o.a.l.s.Weight$DefaultBulkScorer.score(Weight.java:163)
	   o.a.l.s.ConstantScoreQuery$ConstantBulkScorer.score(ConstantScoreQuery.java:216)
	   o.a.l.s.BulkScorer.score(BulkScorer.java:35)
	 2/10 snapshots sharing following 10 elements
	   sun.misc.Unsafe.park(Native Method)
	 4/10 snapshots sharing following 7 elements
	   org.elasticsearch.action.count.TransportCountAction.shardOperation(TransportCountAction.java:64)
	   org.elasticsearch.action.support.broadcast.TransportBroadcastOperationAction$ShardTransportHandler.messageReceived(TransportBroadcastOperationAction.java:338)
	   org.elasticsearch.action.support.broadcast.TransportBroadcastOperationAction$ShardTransportHandler.messageReceived(TransportBroadcastOperationAction.java:324)
	   org.elasticsearch.transport.netty.MessageChannelHandler$RequestHandler.run(MessageChannelHandler.java:275)
	   java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	   java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	   java.lang.Thread.run(Thread.java:745)

The com.giiso.* frames are from my customized query.

The second time of 1.3:

::: [index110]
   
   26.0% (129.7ms out of 500ms) cpu usage by thread 'elasticsearch[index110][search][T#58]'
	 10/10 snapshots sharing following 10 elements
	   sun.misc.Unsafe.park(Native Method)
   
   25.4% (126.9ms out of 500ms) cpu usage by thread 'elasticsearch[index110][search][T#46]'
	 10/10 snapshots sharing following 10 elements
	   sun.misc.Unsafe.park(Native Method)
   
   22.8% (113.7ms out of 500ms) cpu usage by thread 'elasticsearch[index110][search][T#28]'
	 10/10 snapshots sharing following 10 elements
	   sun.misc.Unsafe.park(Native Method)

::: [index111][NPhSc3sjRDucCEFDmn_LRA][TJ-index111][inet[/192.168.2.111:9300]]
   
   34.5% (172.4ms out of 500ms) cpu usage by thread 'elasticsearch[index111][search][T#16]'
	 10/10 snapshots sharing following 10 elements
	   sun.misc.Unsafe.park(Native Method)
   
   29.3% (146.3ms out of 500ms) cpu usage by thread 'elasticsearch[index111][search][T#35]'
	 10/10 snapshots sharing following 10 elements
	   sun.misc.Unsafe.park(Native Method)
   
   28.2% (140.9ms out of 500ms) cpu usage by thread 'elasticsearch[index111][search][T#48]'
	 10/10 snapshots sharing following 10 elements
	   sun.misc.Unsafe.park(Native Method)

::: [index169][HuteZlkFTe6tnCfLmizIGQ][TJ-index7][inet[/192.168.2.169:9300]]
   
   54.3% (271.3ms out of 500ms) cpu usage by thread 'elasticsearch[index169][search][T#57]'
	 2/10 snapshots sharing following 31 elements
	   ...
	   org.apache.lucene.codecs.lucene41.Lucene41PostingsReader$BlockDocsEnum.refillDocs(Lucene41PostingsReader.java:380)
	   org.apache.lucene.codecs.lucene41.Lucene41PostingsReader$BlockDocsEnum.advance(Lucene41PostingsReader.java:505)
	   o.a.l.s.TermScorer.advance(TermScorer.java:85)
	   o.a.l.s.FilteredQuery$LeapFrogScorer.advanceToNextCommonDoc(FilteredQuery.java:274)
	   ...
	   o.e.c.l.Lucene.count(Lucene.java:139)

	 2/10 snapshots sharing following 10 elements
	   sun.misc.Unsafe.park(Native Method)
	   java.lang.Thread.run(Thread.java:745)
	 6/10 snapshots sharing following 7 elements
	   org.elasticsearch.action.count.TransportCountAction.shardOperation(TransportCountAction.java:64)

   52.0% (259.7ms out of 500ms) cpu usage by thread 'elasticsearch[index169][search][T#88]'
	 10/10 snapshots sharing following 2 elements
	   java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
   
   50.3% (251.6ms out of 500ms) cpu usage by thread 'elasticsearch[index169][search][T#74]'
	 7/10 snapshots sharing following 17 elements
	   o.a.l.s.FilteredDocIdSetIterator.nextDoc(FilteredDocIdSetIterator.java:60)
	   o.a.l.s.IndexSearcher.search(IndexSearcher.java:618)
	 3/10 snapshots sharing following 7 elements
	   org.elasticsearch.action.count.TransportCountAction.shardOperation(TransportCountAction.java:64)

The third time of 1.3:

::: [index110][D-_h4nlnT0KcpBTG-Zjq5Q][TJ-index9][inet[/192.168.2.110:9300]]
   
   14.1% (70.5ms out of 500ms) cpu usage by thread 'elasticsearch[index110][search][T#56]'
	 10/10 snapshots sharing following 2 elements
	   java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
   
   12.3% (61.5ms out of 500ms) cpu usage by thread 'elasticsearch[index110][search][T#25]'
	 10/10 snapshots sharing following 2 elements
	   java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
   
	9.9% (49.6ms out of 500ms) cpu usage by thread 'elasticsearch[index110][search][T#18]'
	 10/10 snapshots sharing following 10 elements
	   sun.misc.Unsafe.park(Native Method)
	   java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)

::: [index111][NPhSc3sjRDucCEFDmn_LRA][TJ-index111][inet[/192.168.2.111:9300]]
   
   19.5% (97.3ms out of 500ms) cpu usage by thread 'elasticsearch[index111][index][T#9]'
	 10/10 snapshots sharing following 10 elements
	   sun.misc.Unsafe.park(Native Method)
   
   14.1% (70.3ms out of 500ms) cpu usage by thread 'elasticsearch[index111][search][T#22]'
	 10/10 snapshots sharing following 10 elements
	   sun.misc.Unsafe.park(Native Method)
   
   13.8% (69.1ms out of 500ms) cpu usage by thread 'elasticsearch[index111][search][T#56]'
	 2/10 snapshots sharing following 17 elements
	   o.a.l.s.FilteredDocIdSetIterator.nextDoc(FilteredDocIdSetIterator.java:60)
	 8/10 snapshots sharing following 10 elements
	   sun.misc.Unsafe.park(Native Method)
	   java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
::: [index169][HuteZlkFTe6tnCfLmizIGQ][TJ-index7][inet[/192.168.2.169:9300]]
   
   16.3% (81.7ms out of 500ms) cpu usage by thread 'elasticsearch[index169][search][T#76]'
	 10/10 snapshots sharing following 2 elements
	   java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	   java.lang.Thread.run(Thread.java:745)
   
   15.4% (77.2ms out of 500ms) cpu usage by thread 'elasticsearch[index169][search][T#67]'
	 5/10 snapshots sharing following 10 elements
	   sun.misc.Unsafe.park(Native Method)
	 2/10 snapshots sharing following 13 elements
	   java.lang.ThreadLocal.remove(ThreadLocal.java:222)
	   org.elasticsearch.index.query.QueryParseContext.removeTypes(QueryParseContext.java:75)
	 2/10 snapshots sharing following 17 elements
	   o.a.l.s.FilteredDocIdSetIterator.nextDoc(FilteredDocIdSetIterator.java:60
	 unique snapshot
	   o.a.l.s.TermQuery.createWeight(TermQuery.java:165)
	   o.a.l.s.FilteredQuery.createWeight(FilteredQuery.java:82)
   
   13.3% (66.4ms out of 500ms) cpu usage by thread 'elasticsearch[index169][search][T#80]'
	 3/10 snapshots sharing following 10 elements
	   sun.misc.Unsafe.park(Native Method)
	 7/10 snapshots sharing following 7 elements
	   org.elasticsearch.action.count.TransportCountAction.shardOperation(TransportCountAction.java:64)

I've separated the two indices onto different nodes, i.e. index_all on nodes 1-4 and index_recent on nodes 5-7.
The search speed is still slow. On average it takes more than 3 seconds to count a TermQuery, while in 1.3 it takes about 0.7 seconds.

Maybe I should start a machine purely for the speed test... to find out what the problem is.

Probably next month.
Below are my index settings, could you help me check whether there is any problem? We use a customized analyzer for the analyzed fields, and my count requests are based on the textcontent field, i.e. query: {term: {textcontent: word}}

{
  "state": "open",
  "settings": {
    "index": {
      "refresh_interval": "1s",
      "number_of_shards": "5",
      "provided_name": "tj_news_all",
      "analysis": {
        "filter": {
          "float_delimited": {
            "encoding": "float",
            "type": "delimited_payload_filter"
          }
        },
        "analyzer": {
          "default_search": {
            "type": "nlp_query"
          },
          "default": {
            "type": "nlp_index"
          },
          "keyword": {
            "filter": "float_delimited",
            "type": "custom",
            "tokenizer": "whitespace"
          },
          "full_text": {
            "filter": "type_as_payload",
            "type": "custom",
            "tokenizer": "nlp_index_token"
          }
        }
      }
    }
  },
  "mappings": {
    "news": {
      "dynamic": "strict",
      "_all": {
        "search_analyzer": "default_search",
        "analyzer": "default",
        "enabled": false
      },
      "properties": {
        "textcontent": {
          "search_analyzer": "nlp_query",
          "analyzer": "full_text",
          "term_vector": "with_positions_offsets_payloads",
          "type": "text",
          "index_options": "offsets"
        },
        "keywords": {
          "norms": false,
          "analyzer": "nlp_keyword",
          "store": true,
          "type": "text",
          "index_options": "freqs"
        },
        "imagenum": {
          "store": true,
          "type": "integer",
          "doc_values": false
        },
        "title": {
          "search_analyzer": "nlp_query",
          "analyzer": "full_text",
          "term_vector": "with_positions_offsets_payloads",
          "store": true,
          "type": "text",
          "index_options": "offsets"
        },
        "istitlematched": {
          "store": true,
          "type": "boolean"
        },
        "innerfrom": {
          "index": false,
          "store": true,
          "type": "keyword",
          "doc_values": false
        },
        "download": {
          "format": "yyyy-MM-dd HH:mm:ss||epoch_millis",
          "store": true,
          "type": "date"
        },
        "_md5": {
          "store": true,
          "type": "keyword",
          "doc_values": false
        },
        "cat": {
          "store": true,
          "type": "keyword",
          "doc_values": false
        },
        "from": {
          "store": true,
          "type": "keyword",
          "doc_values": false
        },
        "_pid": {
          "store": true,
          "type": "keyword",
          "doc_values": false
        },
        "summary": {
          "search_analyzer": "nlp_query",
          "analyzer": "full_text",
          "term_vector": "with_positions_offsets_payloads",
          "store": true,
          "type": "text",
          "index_options": "offsets"
        },
        "image": {
          "index": false,
          "store": true,
          "type": "keyword",
          "doc_values": false
        },
        "textrank": {
          "search_analyzer": "nlp_query",
          "analyzer": "keyword",
          "type": "text",
          "index_options": "freqs"
        },
        "innerfromurl": {
          "store": true,
          "type": "keyword",
          "doc_values": false
        },
        "duplicate": {
          "store": true,
          "type": "boolean",
          "doc_values": false
        },
        "article": {
          "index": false,
          "type": "keyword",
          "doc_values": false
        },
        "url": {
          "store": true,
          "type": "keyword",
          "doc_values": false
        },
        "tags": {
          "store": true,
          "type": "keyword",
          "doc_values": false
        },
        "site": {
          "eager_global_ordinals": true,
          "store": true,
          "type": "keyword"
        },
        "multiscore": {
          "norms": false,
          "analyzer": "nlp_keyword",
          "store": true,
          "type": "text",
          "index_options": "freqs"
        },
        "isdirty": {
          "store": true,
          "type": "boolean",
          "doc_values": false
        },
        "time": {
          "format": "yyyy-MM-dd HH:mm:ss||epoch_millis",
          "store": true,
          "type": "date"
        },
        "_tid": {
          "store": true,
          "type": "long"
        },
        "mediatype": {
          "store": true,
          "type": "integer"
        },
        "status": {
          "store": true,
          "type": "byte"
        }
      }
    }
  },
  "aliases": [
    "tianji"
  ]
}

Can you also share what your query looks like and quantify the slowdown?

My query is term{textcontent: word}.
I iterate over about 100 thousand different words to check the search speed.

In 1.3 the average latency is 700ms; in 5.0 it is 3 seconds.
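
(For clarity, the measurement loop looks roughly like this sketch; the index name and word list are placeholders.)

// Rough sketch of the latency measurement: run a size-0 term search on
// "textcontent" for each word and report the average time per request.
import java.util.List;
import org.elasticsearch.client.Client;
import org.elasticsearch.index.query.QueryBuilders;

public class CountLatencyTest {

    public static double averageMillis(Client client, String index, List<String> words) {
        long totalNanos = 0;
        for (String word : words) {
            long start = System.nanoTime();
            client.prepareSearch(index)
                    .setQuery(QueryBuilders.termQuery("textcontent", word))
                    .setSize(0)
                    .get();
            totalNanos += System.nanoTime() - start;
        }
        return totalNanos / 1_000_000.0 / words.size();
    }
}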

I noticed another difference today: the index_options.
In 1.3 it is the default value; in 5.0 it is offsets. Does that matter?

That might have an impact on indexing speed, but it should not have an impact on search speed. Is everything else comparable between your 1.3 and 5.0 setups? Can you also make sure both clusters return the same number of matches and the same top hits?

The hits are almost the same, but not exactly the same...
These two clusters are used in production, so I cannot run complex tests on them.

Anyway, I will set up two clusters with the same configuration and copy enough data into them to test the problem.

This work may start a few weeks later.
After that, I will post the results here.

Thanks.

I found that the index in 5.0 has 8 segments (.doc) of the same size.
There are about 100 .cfs files, and the largest .cfs file is 5GB. It seems strange; I think they should be merged into bigger ones?
(I asked this question in a new post:
How to choose/change the maximum size of a segment?)

In 1.3, by contrast, there is one huge segment (I optimized the index down to 1 segment about two months ago) and about 20 smaller ones.
Maybe that's the problem?
I ran a forcemerge yesterday, but it seems to still be running.

I checked the documentation but did not find the settings that control the merge process, e.g. when it should start and how many segments remain after a merge.
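
If I read the docs correctly, background merges will not produce segments larger than index.merge.policy.max_merged_segment (5gb by default, which would explain the 5GB .cfs files), while a force merge down to one segment ignores that limit. A rough sketch of those two calls with the Java client (the 20gb value is just an example):

// Sketch: raise the maximum segment size for background merges (example value)
// and force-merge the big index down to a single segment.
// Note: index.merge.policy.max_merged_segment only caps automatic merges;
// a force merge with max_num_segments ignores it.
import org.elasticsearch.action.admin.indices.forcemerge.ForceMergeResponse;
import org.elasticsearch.client.Client;
import org.elasticsearch.common.settings.Settings;

public class MergeHelper {

    public static void merge(Client client, String index) {
        client.admin().indices()
                .prepareUpdateSettings(index)
                .setSettings(Settings.builder()
                        .put("index.merge.policy.max_merged_segment", "20gb") // example value
                        .build())
                .get();

        ForceMergeResponse response = client.admin().indices()
                .prepareForceMerge(index)
                .setMaxNumSegments(1)   // merge down to one segment
                .get();
        System.out.println("force merge finished on "
                + response.getSuccessfulShards() + " shards");
    }
}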

In fact, we use ES to index news crawled by our program. Each day, 300 thousand records are added to the cluster. Complex search requests mainly hit the recent data (60 days). At the same time, we need to count the records containing certain terms, on average 2000 count requests per minute (I built a cache for the results, but there are too many terms).
So I built two indices, one for the recent data and one for the whole data (handling the count requests).
Could you give me some advice on how to improve this structure?

BTW: I ran a separate test these days. The search speed is almost the same in 1.3 and 5.0. The test indices use the SmartCN analyzer, and I deleted a field which is not indexed but extremely large. All other settings are the same as on the production machines. So now the only difference seems to be the number of segments.

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.