Cutoff_frequency yields different results using filtered filter query or filtered query


(Stefan) #1

Hello everybody

I encountered behavior I didn't expect, given my current understanding of how Elasticsearch works. Obviously I didn't understand as much as I thought :)

The two queries below differ only slightly, and both ultimately boil down to a match query. However, they return a different number of documents: the first returns more than the second.

First query:

{
  "size": 0,
  "query": {
    "filtered": {
      "filter": {
        "bool": {
          "must": {
            "query": {
              "match": {
                "mydata.freetext": {
                  "query": "Elastic"
                }
              }
            }
          }
        }
      }
    }
  }
}

Second:

{
  "size": 0,
  "query": {
    "filtered": {
      "query": {
        "bool": {
          "must": {
            "match": {
              "mydata.freetext": {
                "query": "Elastic"
              }
            }
          }
        }
      }
    }
  }
}

Can anyone explain to me why they return different results?

Thanks in advance!

Stefan


(Mike Simos) #2

I'm not really sure why the results are different, but when you use only a filter without specifying a query, the filtered query falls back to a match_all query. The second one, by contrast, is a query with no filter. You can also add ?explain to the request and look at the _explanation of each hit to see what's happening.

https://www.elastic.co/guide/en/elasticsearch/reference/current/search-explain.html
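Besides the ?explain URL parameter, the search request body also accepts "explain": true. One caveat: the queries posted above use "size": 0, which returns no hits at all, so there would be no per-hit _explanation to inspect. The minimal Python sketch below builds an explain-enabled copy of the second query with a non-zero size; the helper with_explain is hypothetical (not part of any Elasticsearch client), and the resulting body would be sent to the _search endpoint with whatever HTTP client you use.

```python
import json


# Hypothetical helper (not part of any Elasticsearch client library):
# returns a copy of a search body with "explain": true set. "size": 0
# returns no hits and therefore no _explanation, so a non-zero size is set too.
def with_explain(body, size=10):
    explained = dict(body)  # shallow copy; the caller's dict is not modified
    explained["explain"] = True
    explained["size"] = size
    return explained


# Second query from the first post
second_query = {
    "size": 0,
    "query": {
        "filtered": {
            "query": {
                "bool": {
                    "must": {"match": {"mydata.freetext": {"query": "Elastic"}}}
                }
            }
        }
    },
}

explained = with_explain(second_query)
# Request body to send to the _search endpoint
print(json.dumps(explained, indent=2))
```

Each returned hit should then carry an _explanation tree; comparing the trees of a document that matches only one of the two variants is probably the most direct way to see where they diverge.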


(Mike Simos) #3

Also, this may help you understand the difference between queries and filters, and why the results differ.

https://www.elastic.co/guide/en/elasticsearch/guide/current/_queries_and_filters.html


(Stefan) #4

Hello Mike,

thanks for your suggestions. The Explain API didn't tell me much that I could understand; to make use of it, one would probably need deep insight into the mechanics of Lucene.

I just realized that I simplified my actual queries too much for this post and thereby removed the part that turned out to be problematic. Here is a second try: I removed the bool parts and kept the cutoff_frequency, which is responsible for the different counts. If I remove cutoff_frequency, both queries return the same results; leaving it in leads to different results. My question remains the same:

First query v2:

{
  "size": 0,
  "query": {
    "filtered": {
      "filter": {
        "query": {
          "match": {
            "mydata.freetext": {
              "query": "Elastic",
              "cutoff_frequency": 0.001
            }
          }
        }
      }
    }
  }
}

Second query v2:

{
  "size": 0,
  "query": {
    "filtered": {
      "query": {
        "match": {
          "mydata.freetext": {
            "query": "Elastic",
            "cutoff_frequency": 0.001
          }
        }
      }
    }
  }
}

Any ideas why this is the case?

Best regards, Stefan


(Mike Simos) #5

There isn't any scoring for filters, so I don't believe cutoff_frequency would do anything. See the following:

https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-filtered-query.html

"they don’t have to calculate the relevance _score for each document — the answer is just a boolean “Yes, the document matches the filter” or “No, the document does not match the filter”."


(Stefan) #6

cutoff_frequency isn't about scoring in the first place, as far as I know. It's about the frequency of terms appearing in documents, right?

Furthermore, since cutoff_frequency is embedded in a query, the calculation should take place there anyway, right? Only when the wrapping filter comes into play are the calculated scores discarded... at least that's how I think it works. This theory is supported by the fact that removing cutoff_frequency increases the number of documents found again.
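To make the term-frequency side of this concrete, here is a simplified Python model of what the Elasticsearch documentation describes for cutoff_frequency (the same idea as the common terms query): terms whose document frequency exceeds the cutoff are treated as high-frequency and only add to the score of documents that already match a low-frequency term, and if every query term is high-frequency, all of them become required. This is an illustration under those assumptions, not Lucene's CommonTermsQuery; split_terms, matches and plain_or are hypothetical names, and the toy documents are invented. It only shows that a cutoff can change which documents match, not just how they are scored.

```python
from collections import Counter


def split_terms(terms, doc_freq, num_docs, cutoff):
    """Split query terms into (low, high) frequency lists.

    A cutoff below 1.0 is treated as a fraction of num_docs, a cutoff of
    1.0 or more as an absolute document count. Lucene's real comparison
    differs in details such as rounding.
    """
    threshold = cutoff * num_docs if cutoff < 1.0 else cutoff
    low = [t for t in terms if doc_freq.get(t, 0) <= threshold]
    high = [t for t in terms if doc_freq.get(t, 0) > threshold]
    return low, high


def matches(doc_terms, low, high):
    """Simplified matching rule for the default "or" operator."""
    if low:
        # At least one low-frequency term must match; high-frequency terms
        # would only add to the score of documents that already match.
        return any(t in doc_terms for t in low)
    # Only high-frequency terms in the query: all of them become required.
    return all(t in doc_terms for t in high)


def plain_or(doc_terms, terms):
    """The same query without cutoff_frequency: any term is enough."""
    return any(t in doc_terms for t in terms)


# Invented toy corpus: each document is just its set of analyzed terms.
docs = [
    {"elastic", "search"},
    {"elastic"},
    {"search"},
    {"elastic", "search", "engine"},
]
doc_freq = Counter(t for d in docs for t in d)  # documents containing each term

query_terms = ["elastic", "search"]
low, high = split_terms(query_terms, doc_freq, len(docs), cutoff=0.5)
with_cutoff = sum(matches(d, low, high) for d in docs)
without_cutoff = sum(plain_or(d, query_terms) for d in docs)
print(low, high, with_cutoff, without_cutoff)  # [] ['elastic', 'search'] 2 4
```

In a real index the document frequencies come from index statistics that, by default, are gathered per shard, so the high/low split may differ from one shard to another.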

Any further ideas? I've played around with it a bit, but I still don't understand the mechanism here.

