I ran into behavior I didn't expect given my current understanding of how Elasticsearch works. Obviously I did not understand as much as I thought.
The two queries shown below differ only slightly, and both ultimately boil down to a match query. However, they return different numbers of documents: the first returns more than the second.
I'm not really sure why the results are different, but when you use a filter without specifying a query, Elasticsearch uses a match_all query, whereas the second one is a query with no filter. You can also use ?explain to see what's happening; the details show up in _explanation for each hit.
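For anyone who wants to try the explain suggestion, here is a minimal sketch of a search body with the flag set. The field name `title` and the search text are placeholders I made up, since the original queries are not reproduced here. Setting `explain` to true in the request body asks for an `_explanation` on each hit.

```python
import json

def with_explain(query: dict) -> dict:
    """Wrap a query in a search body that requests a per-hit explanation."""
    return {"explain": True, "query": query}

# "title" and the search text are hypothetical placeholders.
body = with_explain({"match": {"title": "quick brown fox"}})

# This JSON would be sent as the body of a _search request.
print(json.dumps(body, indent=2))
```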
Thanks for your suggestions. The Explain API did not tell me much that I could understand. For it to be useful, one would probably need deep insight into the mechanics of Lucene.
I just realized that I reduced my actual queries too much when posting them here and thereby left out what I have now identified as the problematic part. Next try, see below: I removed the bool parts and kept the cutoff_frequency instead, which is responsible for the different counts. If I remove the cutoff_frequency, both queries return the same results. Leaving the property in leads to different results. However, my question remains the same:
"they don’t have to calculate the relevance _score for each document — the answer is just a boolean “Yes, the document matches the filter” or “No, the document does not match the filter”."
cutoff_frequency isn't about scoring in the first place, as far as I know. It's about the frequency of terms appearing in documents, right?
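To make that reading of cutoff_frequency concrete, here is a toy model in plain Python, not Elasticsearch code. It assumes what the documentation describes: a value below 1 is relative to the total number of documents, a value of 1 or more is an absolute document count, and terms above the threshold are treated as high-frequency. The function name, the sample numbers, and the exact boundary handling are my own assumptions.

```python
def split_terms(doc_freqs, total_docs, cutoff_frequency):
    """Split terms into (low_frequency, high_frequency) groups.

    Toy model only: the exact boundary handling is an assumption.
    """
    if cutoff_frequency < 1:
        # Relative cutoff: a fraction of all documents.
        threshold = cutoff_frequency * total_docs
    else:
        # Absolute cutoff: a document count.
        threshold = cutoff_frequency
    low = [t for t, df in doc_freqs.items() if df <= threshold]
    high = [t for t, df in doc_freqs.items() if df > threshold]
    return low, high

# Hypothetical document frequencies in an index of 1000 documents.
doc_freqs = {"the": 900, "quick": 40, "fox": 12}
low, high = split_terms(doc_freqs, total_docs=1000, cutoff_frequency=0.1)
print(low, high)
```

In this model the grouping depends only on how many documents contain each term, not on any score.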
Furthermore, since the cutoff_frequency is embedded in a query, the calculation should take place there anyway, right? Only when the wrapping filter comes into play are the calculated scores discarded... at least that is how I think it works. This hypothesis is supported by the fact that removing the cutoff_frequency increases the number of documents found again.
Further ideas? I really don't get it. I played around a bit and still don't understand the mechanism here.
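For reference, this is roughly the shape of the two requests being compared, written as Python dicts. It is a reconstruction under assumptions: I am guessing at the legacy `filtered` query with a `query` filter as the wrapper, and the field name, search text, and cutoff value are made up.

```python
import json

FIELD = "title"         # hypothetical field name
TEXT = "the quick fox"  # hypothetical search text
CUTOFF = 0.001          # hypothetical cutoff_frequency value

def match_clause():
    """The match query with cutoff_frequency that both requests share."""
    return {"match": {FIELD: {"query": TEXT, "cutoff_frequency": CUTOFF}}}

# Variant 1: the match query wrapped in a filter (assumed legacy syntax).
filtered_search = {"query": {"filtered": {"filter": {"query": match_clause()}}}}

# Variant 2: the same match query used directly, with no filter.
plain_search = {"query": match_clause()}

print(json.dumps(filtered_search, indent=2))
print(json.dumps(plain_search, indent=2))
```

The inner match clause is identical in both variants; only the wrapper differs.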