QueryPhaseCollector invokes lucene score() twice on every doc when min_score is used

Elastic 8.10 introduced a major change QueryPhaseCollector (see https://github.com/elastic/elasticsearch/pull/97410) in how lucene queries are executed. I have observed that now, when using min_score, each document gets scored twice! That does NOT happen when min_score is not used, and this did NOT happen on previous versions of Elasticsearch.

Aside from being terribly inefficient, this actually creates problems for certain low level query types (Span) which make use of "positions" and would give different results when a document is scored repeatedly. I struggled to document a clear example using this exact Span behavior, but we have custom Lucene queries that are failing due to this "invoke score twice" issue. So despite not demonstrating true query failure, I can demonstrate that "score is invoked twice" and hopefully you agree that alone is bad / needs fixing.

Here is an easy way for you to reproduce the "invoke score twice":

  1. create a new index with just 1 document
POST /testing/_doc
{
"message": "hello"
}
  1. Set a debugger breakpoint on org.apache.lucene.search.TermScorer.score()

  2. Execute this query

POST /testing/_search
{
"min_score": 0.001,
"query": {
"term": {
"message": "hello"
}
}
}
  1. Observe the breakpoint gets hit 2 times for this same document (there is only 1 doc in the index), see below for the stacktrace. This is NOT the expected behavior. Prior versions of Elastic call score() only once as expected (see below for identical test stacktrace in ES 8.6.2). Comparing these stacktraces, I see that ES 8.6.2 used an object called ScoreCachingWrappingScorer which prevents the lower level score() method from being called a second time. However ES8.10.x (also ES8.11.0) no longer has the ScoreCachingWrappingScorer object at all (due to the new QueryPhaseCollector logic) so I believe this is a new "bug" in Elasticsearch.

  2. Remove the min_score clause from the query and repeat, this time observe the breakpoint gets hit only 1 time (as expected). So this issue is caused by the use of min_score.

Stacktrace in ES 8.11.0
(First call to score within the single request)

Thread [elasticsearch[singlenode][search_worker][T#17]] (Suspended (breakpoint at line 85 in TermScorer))	
	TermScorer.score() line: 85	
	QueryPhaseCollector.isDocWithinMinScore(Scorable) line: 122	
	QueryPhaseCollector.shouldCollectTopDocs(int, Scorable, Bits) line: 118	
	QueryPhaseCollector$TopDocsLeafCollector.collect(int) line: 236	
	Weight$DefaultBulkScorer.scoreRange(LeafCollector, DocIdSetIterator, TwoPhaseIterator, DocIdSetIterator, Bits, int, int) line: 277	
	Weight$DefaultBulkScorer.score(LeafCollector, Bits, int, int) line: 236	
	CancellableBulkScorer.score(LeafCollector, Bits, int, int) line: 45	
	CancellableBulkScorer(BulkScorer).score(LeafCollector, Bits) line: 38	
	ContextIndexSearcher.searchLeaf(LeafReaderContext, Weight, Collector) line: 536	
	ContextIndexSearcher.search(List<LeafReaderContext>, Weight, Collector) line: 460	
	ContextIndexSearcher.lambda$search$4(AtomicInteger, LeafReaderContext[], Weight, Collector, List) line: 375	
	0x0000000100d0dde0.call() line: not available	
	ContextIndexSearcher$1(FutureTask<V>).run() line: 317	
	TimedRunnable.doRun() line: 33	
	ThreadContext$ContextPreservingAbstractRunnable.doRun() line: 983	
	ThreadContext$ContextPreservingAbstractRunnable(AbstractRunnable).run() line: 26	
	TaskExecutionTimeTrackingEsThreadPoolExecutor(ThreadPoolExecutor).runWorker(ThreadPoolExecutor$Worker) line: 1144	
	ThreadPoolExecutor$Worker.run() line: 642	
	Thread.runWith(Object, Runnable) line: 1596	
	Thread.run() line: 1583

(Second call to score within the same request as above)

Thread [elasticsearch[singlenode][search_worker][T#17]] (Suspended (breakpoint at line 85 in TermScorer))	
	TermScorer.score() line: 85	
	TopScoreDocCollector$SimpleTopScoreDocCollector$1.collect(int) line: 73	
	QueryPhaseCollector$TopDocsLeafCollector.collect(int) line: 242	
	Weight$DefaultBulkScorer.scoreRange(LeafCollector, DocIdSetIterator, TwoPhaseIterator, DocIdSetIterator, Bits, int, int) line: 277	
	Weight$DefaultBulkScorer.score(LeafCollector, Bits, int, int) line: 236	
	CancellableBulkScorer.score(LeafCollector, Bits, int, int) line: 45	
	CancellableBulkScorer(BulkScorer).score(LeafCollector, Bits) line: 38	
	ContextIndexSearcher.searchLeaf(LeafReaderContext, Weight, Collector) line: 536	
	ContextIndexSearcher.search(List<LeafReaderContext>, Weight, Collector) line: 460	
	ContextIndexSearcher.lambda$search$4(AtomicInteger, LeafReaderContext[], Weight, Collector, List) line: 375	
	0x0000000100d0dde0.call() line: not available	
	ContextIndexSearcher$1(FutureTask<V>).run() line: 317	
	TimedRunnable.doRun() line: 33	
	ThreadContext$ContextPreservingAbstractRunnable.doRun() line: 983	
	ThreadContext$ContextPreservingAbstractRunnable(AbstractRunnable).run() line: 26	
	TaskExecutionTimeTrackingEsThreadPoolExecutor(ThreadPoolExecutor).runWorker(ThreadPoolExecutor$Worker) line: 1144	
	ThreadPoolExecutor$Worker.run() line: 642	
	Thread.runWith(Object, Runnable) line: 1596	
	Thread.run() line: 1583

Stacktrace in ES 8.6.2

Thread [elasticsearch[singlenode][search][T#7]] (Suspended (breakpoint at line 74 in TermScorer))	
	TermScorer.score() line: 74	
	ScoreCachingWrappingScorer.score() line: 61	
	MinimumScoreCollector.collect(int) line: 45	
	Weight$DefaultBulkScorer.scoreRange(LeafCollector, DocIdSetIterator, TwoPhaseIterator, Bits, int, int) line: 274	
	Weight$DefaultBulkScorer.score(LeafCollector, Bits, int, int) line: 254	
	CancellableBulkScorer.score(LeafCollector, Bits, int, int) line: 45	
	CancellableBulkScorer(BulkScorer).score(LeafCollector, Bits) line: 38	
	ContextIndexSearcher.searchLeaf(LeafReaderContext, Weight, Collector) line: 196	
	ContextIndexSearcher.search(List<LeafReaderContext>, Weight, Collector) line: 170	
	ContextIndexSearcher(IndexSearcher).search(Query, Collector) line: 519	
	QueryPhase.searchWithCollector(SearchContext, ContextIndexSearcher, Query, LinkedList<QueryCollectorContext>, boolean, boolean) line: 224	
	QueryPhase.executeInternal(SearchContext) line: 179	
	QueryPhase.execute(SearchContext) line: 80	
	SearchService.loadOrExecuteQueryPhase(ShardSearchRequest, SearchContext) line: 470	
	SearchService.executeQueryPhase(ShardSearchRequest, SearchShardTask) line: 635	
	SearchService.lambda$executeQueryPhase$2(ShardSearchRequest, SearchShardTask) line: 495	
	0x0000000100c45530.get() line: not available	
	0x0000000100c45740.get() line: not available	
	ActionRunnable$2.accept(ActionListener<T>) line: 50	
	ActionRunnable$2.accept(Object) line: 47	
	ActionRunnable$3.doRun() line: 72	
	ActionRunnable$3(AbstractRunnable).run() line: 26	
	TimedRunnable.doRun() line: 33	
	ThreadContext$ContextPreservingAbstractRunnable.doRun() line: 917	
	ThreadContext$ContextPreservingAbstractRunnable(AbstractRunnable).run() line: 26	
	EWMATrackingEsThreadPoolExecutor(ThreadPoolExecutor).runWorker(ThreadPoolExecutor$Worker) line: 1144	
	ThreadPoolExecutor$Worker.run() line: 642	
	Thread.run() line: 1589
1 Like

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.