We tested a more_like_this query inside a filtered query with some filters. But the filtered query is deprecated, so we transformed it into a bool query: we put more_like_this inside the must clause and all the filters inside filter. The result counts are the same, but the new query is ~20% slower than the old one. How is that possible if the filtered query is supposed to be transformed into a bool query internally?
When I put the filters inside another bool query's must clause, nested in the outer filter:
{
  "bool": {
    "filter": {
      "bool": {
        "must": [{filter1}, {filter2}]
      }
    },
    "must": {mlt}
  }
}
it performs the same as the filtered query. Can anyone explain why I should have to do this?
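The filtered-to-bool rewrite described at the start of this post (more_like_this into must, every filter into filter) is mechanical, so it can be sketched as a small function over the query body. This is a minimal illustration, not part of any Elasticsearch client: the name `filtered_to_bool` is hypothetical, and it assumes the deprecated `and` filter appears either as a plain list or as the older `{"and": {"filters": [...]}}` form. The ids below are made-up stand-ins for the elided values in the thread.

```python
from typing import Any, Dict


def filtered_to_bool(filtered_query: Dict[str, Any]) -> Dict[str, Any]:
    """Rewrite a deprecated {"filtered": {...}} query as a bool query.

    The scoring part ("query") goes into bool.must and the non-scoring part
    ("filter") goes into bool.filter. A deprecated {"and": ...} filter is
    flattened into a plain list of filter clauses.
    """
    body = filtered_query["filtered"]
    bool_query: Dict[str, Any] = {}
    if "query" in body:
        bool_query["must"] = body["query"]
    flt = body.get("filter")
    if flt is not None:
        if isinstance(flt, dict) and set(flt) == {"and"}:
            clauses = flt["and"]
            # Accept both {"and": [...]} and {"and": {"filters": [...]}}.
            if isinstance(clauses, dict):
                clauses = clauses.get("filters", [])
            bool_query["filter"] = list(clauses)
        else:
            # A single filter clause is kept as-is.
            bool_query["filter"] = flt
    return {"bool": bool_query}


# Hypothetical stand-ins for the elided ids ([1, 2] and "some-id").
old = {"filtered": {
    "filter": {"and": [
        {"terms": {"not_available": ["false"]}},
        {"bool": {"should": [{"exists": {"field": "link"}},
                             {"exists": {"field": "ISBN"}}]}},
        {"terms": {"book_id": [1, 2]}},
    ]},
    "query": {"more_like_this": {
        "fields": ["text_field"],
        "like": [{"_type": "chapters", "_id": "some-id"}],
    }},
}}
new = filtered_to_bool(old)
```

Note that this produces the flat layout (all filters directly under bool.filter), which is the layout reported as slower here.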
Can you also provide us with the filtered query that you ran, as well as the slow bool query?
{"query": {"bool": {"filter": [{"terms": {"book_id": [some ids here]}}, {"term": {"not_available": false}}, {"bool": {"should": [{"exists": {"field": "link"}}, {"exists": {"field": "ISBN"}}]}}], "must": {"more_like_this": {"fields": ["text_field"], "like": [{"_type": "chapters", "_id": id}]}}}}}
{"query": {"filtered": {"filter": {"and": [{"terms": {"not_available": ["false"]}}, {"bool": {"should": [{"exists": {"field": "link"}}, {"exists": {"field": "ISBN"}}]}}, {"terms": {"book_id": [some ids here]}}]}, "query": {"more_like_this": {"fields": ["text_field"], "like": [{"_type": "chapters", "_id": id}]}}}}}
I am also confused about why you are seeing different response times. Could you pass these queries to the _validate/query API (https://www.elastic.co/guide/en/elasticsearch/reference/current/search-validate.html) with ?rewrite=true and share the output?
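A minimal sketch of how such a validate request could be prepared from Python with only the standard library. The host `http://localhost:9200` is an assumption, `index1` is taken from the output posted below, and `build_validate_request` is a hypothetical helper; the function only builds the request and does not send it.

```python
import json
import urllib.request
from typing import Any, Dict


def build_validate_request(host: str, index: str,
                           search_body: Dict[str, Any]) -> urllib.request.Request:
    """Build (but do not send) POST <host>/<index>/_validate/query?rewrite=true.

    search_body is the full request body, e.g. {"query": {...}}, exactly as
    it would be sent to _search.
    """
    url = f"{host.rstrip('/')}/{index}/_validate/query?rewrite=true"
    data = json.dumps(search_body).encode("utf-8")
    return urllib.request.Request(
        url,
        data=data,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Assumed local cluster; the query is a trivial placeholder.
req = build_validate_request("http://localhost:9200", "index1",
                             {"query": {"match_all": {}}})
# To actually send it against a running cluster:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp)["explanations"])
```

Running both the old and the new body through the same request makes the rewritten Lucene queries directly comparable.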
new
"explanations":[{"index":"index1","valid":true,"explanation":"+(+(((text_field:level text_field:aa text_field:c18 text_field:linol text_field:tabl text_field:c20 text_field:c22 text_field:australian text_field:linolen text_field:dha text_field:powder text_field:oil text_field:liquid text_field:181 text_field:lcp text_field:pufa text_field:la text_field:infant text_field:2c text_field:isom text_field:acid text_field:fatti text_field:lna text_field:tran text_field:formula)~7) -ConstantScore(_uid:chapter#7ab02ae1-063e-4e8d-bd13-19cae103e8b5)) #ConstantScore(book_id: \u0001\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0001 book_id: \u0001\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0002) #not_available:F #((ConstantScore(_field_names:link) ConstantScore(_field_names:ISBN))~1)) #ConstantScore(_type:chapter)"}]}
old
"explanations":[{"index":"index1","valid":true,"explanation":"+(+(((text_field:level text_field:aa text_field:c18 text_field:linol text_field:tabl text_field:c20 text_field:c22 text_field:australian text_field:linolen text_field:dha text_field:powder text_field:oil text_field:liquid text_field:181 text_field:lcp text_field:pufa text_field:la text_field:infant text_field:2c text_field:isom text_field:acid text_field:fatti text_field:lna text_field:tran text_field:formula)~7) -ConstantScore(_uid:chapter#7ab02ae1-063e-4e8d-bd13-19cae103e8b5)) #(+ConstantScore(not_available:F) +((ConstantScore(_field_names:link) ConstantScore(_field_names:ISBN))~1) +ConstantScore(book_id: \u0001\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0001 book_id: \u0001\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0002))) #ConstantScore(_type:chapter)"}]}
This still looks very similar. The main difference is that the new query puts the filters at the top level, while the old one puts them in a nested filter clause. Something odd is that the new query uses the notavailable field while the old one uses not_available. Could it be a copy-paste issue?
I'm not sure there is anything we can fix here. I suspect that for some reason the old query is more friendly to the JVM.
Yes, it is a copy-paste issue. And as I wrote before, when I wrap all the filter clauses in another bool query's must, it runs faster. Should I try to construct the new query to be as similar to the old one as possible, or should I use the old one?
If this works consistently better for you, then you can wrap filter clauses in a nested boolean query like the old one did.
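A minimal sketch of that workaround applied to a search body like the bool query posted above: the top-level bool.filter clauses are moved into one nested bool, which mirrors the `#(+A +B +C)` shape the old query rewrote to. The helper name `wrap_filters_in_nested_bool` is hypothetical, and using `must` inside the nested bool simply copies the layout from the earlier post; since the nested bool sits in filter context, its clauses do not contribute to scoring either way.

```python
import copy
from typing import Any, Dict


def wrap_filters_in_nested_bool(search_body: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of search_body whose top-level bool.filter clauses are
    moved into a single nested bool query, leaving bool.must untouched."""
    body = copy.deepcopy(search_body)  # never mutate the caller's query
    outer = body["query"]["bool"]
    filters = outer.get("filter")
    if filters is None:
        return body  # nothing to wrap
    if isinstance(filters, dict):
        filters = [filters]  # a single clause becomes a one-element list
    outer["filter"] = {"bool": {"must": filters}}
    return body


# Placeholder clauses standing in for the filters from the thread.
flat = {"query": {"bool": {
    "filter": [{"term": {"not_available": False}},
               {"terms": {"book_id": [1, 2]}}],
    "must": {"match_all": {}},
}}}
nested = wrap_filters_in_nested_bool(flat)
```

Whether the nested form stays faster is something to re-measure on your own data and version; the thread only reports it for this case.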