Why node query cache works only for filters


(Vadim Gindin) #1

Hi, I have a custom query with an ML scoring algorithm. I'd like to be able to cache search results before different aggregations/sorting are applied, i.e. request_cache is not suitable. The node query cache would be great, but it only works for queries in a filter context. Why?

Could you advise a caching solution for my custom query? How is it possible to implement a custom cache in an Elasticsearch plugin?

Regards,
Vadim Gindin


(Adrien Grand) #2

Hi Vadim,

Aggregations/sorting don't happen after results are computed but at the same time, via a MultiCollector that has multiple sub-collectors: one that computes top hits, another that computes aggregations. What kind of data are you willing to cache exactly?
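
To sketch the idea (plain Java; `Collector` here is a stand-in for Lucene's LeafCollector, not the real interface):

```java
import java.util.List;

// Sketch of the MultiCollector idea: a single pass over the matching docs
// feeds every sub-collector at once, so "top hits" and "aggregations" are
// computed together rather than one after the other.
public class MultiCollectorSketch {
    /** Stand-in for Lucene's LeafCollector. */
    public interface Collector {
        void collect(int doc, float score);
    }

    /** Returns a collector that forwards each (doc, score) to all sub-collectors. */
    public static Collector wrap(List<Collector> subs) {
        return (doc, score) -> {
            for (Collector c : subs) {
                c.collect(doc, score);
            }
        };
    }
}
```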

The query cache only caches non-scoring queries (i.e. filters) because caching scores would add a lot of memory overhead. We typically need between 1 bit and 2 bytes per cached doc ID, while caching scores would require at least 4 additional bytes.
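
To put rough numbers on that (a back-of-envelope sketch only; the real query cache uses denser encodings such as roaring bitmaps for sparse result sets):

```java
// Illustrates the memory trade-off: a non-scoring filter fits in a bitset
// (1 bit per doc), while cached scores add at least a 4-byte float per doc.
public class CacheCost {
    // A non-scoring filter can be cached as a bitset: 1 bit per doc.
    static long bitsetBytes(long maxDoc) {
        return (maxDoc + 7) / 8;
    }

    // Caching scores adds at least one 4-byte float per matching doc on top.
    static long scoreOverheadBytes(long matchingDocs) {
        return matchingDocs * Float.BYTES;
    }

    public static void main(String[] args) {
        long maxDoc = 10_000_000L;
        System.out.println("bitset: " + bitsetBytes(maxDoc) + " bytes");        // ~1.25 MB
        System.out.println("scores: " + scoreOverheadBytes(maxDoc) + " bytes"); // ~40 MB if all docs match
    }
}
```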


(Vadim Gindin) #3

Hi Adrien!

In the current query implementation, the score calculation (using an ML algorithm) is more expensive than matching and match processing in the scorer. So I'd like to cache the computed document scores by doc ID.

I'm probably ready to accept the additional memory consumption. Is there a possibility to use the node query cache for that somehow?

--

My current implementation uses a BulkScorer to score all matched docs at once. But I've found that bulkScorer() is not called in some cases: when the query is wrapped in a BooleanQuery and there is more than one required clause (see BooleanWeight.booleanScorer()). So it doesn't look like a universal way to use an ML scoring algorithm. Is there a way to get around this limit?

Thanks!


(Adrien Grand) #4

Hi Vadim,

If the bulk scorer helps because computing multiple scores at once is easier, then you could consider using something like Lucene's BulkScorerWrapperScorer: https://github.com/apache/lucene-solr/blob/910a0231f6fc668426056e31d43e293248ff5ce1/lucene/test-framework/src/java/org/apache/lucene/search/BulkScorerWrapperScorer.java. It was initially designed for testing, but it might be helpful in your case as well.

If you need a more general caching mechanism, I don't think reusing an existing cache is an option. I guess the easier way to do it would be to fold it directly into your query (since you already seem to have a custom plugin for a query). Make sure to register closed listeners so that the JVM can reclaim memory for segments that have been merged away.
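
A minimal sketch of that pattern (plain Java; in Lucene you would key entries by `LeafReader#getCoreCacheHelper().getKey()` and register the eviction via `CacheHelper#addClosedListener` — `SegmentHandle` and `ScoreCache` below are hypothetical stand-ins, not real Lucene types):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Consumer;
import java.util.function.Supplier;

// Per-segment score cache that frees its entries when a segment is closed
// (e.g. merged away), which is the point of registering closed listeners.
public class ScoreCache {
    // One scores array per segment, keyed by the segment's cache key.
    private final Map<Object, float[]> scoresBySegment = new ConcurrentHashMap<>();

    /** Called from Weight#scorer for a given leaf: reuse cached scores or compute them. */
    public float[] getOrCompute(SegmentHandle segment, Supplier<float[]> compute) {
        return scoresBySegment.computeIfAbsent(segment.cacheKey(), key -> {
            // Evict this entry when the segment goes away, so the JVM can
            // reclaim the memory.
            segment.addClosedListener(k -> scoresBySegment.remove(k));
            return compute.get();
        });
    }

    public int size() {
        return scoresBySegment.size();
    }

    /** Stand-in for LeafReader + IndexReader.CacheHelper. */
    public interface SegmentHandle {
        Object cacheKey();                               // analog of CacheHelper#getKey
        void addClosedListener(Consumer<Object> onClose); // analog of CacheHelper#addClosedListener
    }
}
```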


(Vadim Gindin) #5

Hi Adrien!

Could you advise me (or share a link) on how to implement closed listeners correctly? From what I've found, I should register my own AbstractLifecycleComponent and add a LifecycleListener there that clears the cache in its beforeStop() method. Is that correct?

P.S. I have the other difficulty with BulkScorer:

  1. If my index has nested fields, then org.elasticsearch.search.DefaultSearchContext wraps my query in a BooleanQuery and adds an Occur.FILTER clause to it (which in fact becomes DocValuesFieldExistsQuery [field=_primary_term]). Further, when executing, the BooleanWeight.booleanScorer() method prevents my bulkScorer from being executed, because the final query has 2 required clauses (have a look there).
  2. I'd also like to be able to wrap my query in a BooleanQuery myself.

So, in these cases (when my query is wrapped in a BooleanQuery) my bulkScorer is not called at all. Therefore my search does not work at all (not only the caching).

Is there a way to overcome this limitation somehow?

Thanks,
Vadim Gindin


(Adrien Grand) #6

Hi Vadim,

Actually no, you could do it at the level of your query directly. I guess the closest example we have to that is the query cache; see e.g. the calls to addClosedListener at https://github.com/apache/lucene-solr/blob/b4449c73e4c1ed34bc155ae5a818ac1a870ea7f8/lucene/core/src/java/org/apache/lucene/search/LRUQueryCache.java.

Indeed, Elasticsearch adds filters implicitly. BulkScorer is supposed to be an optimization rather than the regular API for queries to implement. That said, if this API is more convenient for you to implement, you could make the Weight#scorer method return something like the BulkScorerWrapperScorer wrapper class I linked above, which wraps a BulkScorer and implements the Scorer API.
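
The adapter idea looks roughly like this (plain Java, with `BulkPass`, `DocScoreConsumer` and `BufferedScorer` as stand-ins for the real Lucene types; the real BulkScorerWrapperScorer buffers in fixed-size windows rather than all at once):

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the adapter: score a whole batch of documents in one bulk pass
// (where an ML model can batch its work), buffer the (doc, score) pairs,
// then replay them through a doc-at-a-time nextDoc()/score() API like the
// one Lucene's Scorer exposes.
public class BufferedScorer {
    public static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    /** Stand-in for BulkScorer#score: pushes (doc, score) pairs to a callback. */
    public interface BulkPass {
        void score(DocScoreConsumer out);
    }

    public interface DocScoreConsumer {
        void collect(int doc, float score);
    }

    private final List<Integer> docs = new ArrayList<>();
    private final List<Float> scores = new ArrayList<>();
    private int upto = -1;

    public BufferedScorer(BulkPass bulk) {
        // Run the bulk pass once up front and buffer everything it emits.
        bulk.score((doc, score) -> {
            docs.add(doc);
            scores.add(score);
        });
    }

    public int docID() {
        if (upto < 0) return -1;
        return upto >= docs.size() ? NO_MORE_DOCS : docs.get(upto);
    }

    public int nextDoc() {
        upto++;
        return docID();
    }

    public float score() {
        return scores.get(upto);
    }
}
```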