How does cutoff_frequency work in the match query?

Given the following Sense session:

DELETE /my-index

PUT my-index
{
  "settings": {
    "number_of_replicas": 0,
    "number_of_shards": 1
  }
}

PUT /my-index/search_item/1
{
  "title" : " the car",
  "type": "article"
}

PUT /my-index/search_item/2
{
  "title" : "the the building",
  "type": "calculator"
}

PUT /my-index/search_item/3?refresh
{
  "title" : "the kibana rises, the world rejoices",
  "type": "calculator"
}

GET /my-index/search_item/2/_termvectors?fields=title

POST /my-index/_search
{
  "query" : {
    "match" : {
      "title" : {
        "query" : "the kibana "
        , "cutoff_frequency": 2
        , "operator": "or"
      }
    }
  },
  "explain": true
}

A cutoff_frequency of 2 returns 1 document, but changing cutoff_frequency to 3 returns all 3 documents.
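To see why the two cutoffs behave differently, it helps to know how many documents contain each query term. The sketch below computes that locally for the three titles. It assumes the standard analyzer roughly amounts to lowercasing and splitting on non-word characters, which is a simplification, and `analyze` and `doc_freqs` are hypothetical helpers. On the real index, adding `term_statistics=true` to the `_termvectors` request should report `doc_freq` per term. Document frequencies are computed per shard, so the session's single shard keeps these numbers predictable.

```python
import re
from collections import Counter

# Titles indexed in the session above, keyed by document id.
DOCS = {
    1: " the car",
    2: "the the building",
    3: "the kibana rises, the world rejoices",
}


def analyze(text):
    """Rough stand-in for the standard analyzer (assumption): lowercase, split on non-word chars."""
    return [tok for tok in re.split(r"\W+", text.lower()) if tok]


def doc_freqs(docs):
    """Count how many documents contain each term; repeats inside one document count once."""
    df = Counter()
    for text in docs.values():
        df.update(set(analyze(text)))
    return df


if __name__ == "__main__":
    df = doc_freqs(DOCS)
    for term in analyze("the kibana "):
        print(term, df[term])  # prints "the 3" then "kibana 1"
```

So "the" occurs in every document (3) and "kibana" in one.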

I expected it to return 0 documents in the latter case, which makes me suspect that I don't fully understand how this feature dynamically ignores high-frequency terms.

As a side note, explain: true does not return any information (maybe by design) that helps diagnose what's going on.

Can anyone with detailed knowledge of how it executes guide me through it?


Still no answer to this question?
I just can't get my head around how cutoff_frequency works.

I've read both documentation pages on this several times now, and I think I know the answer (although, as far as I can see, the documentation does not spell it out).

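I think cutoff_frequency: 3 means a term must appear in MORE than 3 documents to count as high-frequency. In your scenario "the" appears in exactly 3 documents, so it is no longer a high-frequency term; it is treated like "kibana", and with operator "or" any document containing either term matches.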
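A minimal sketch of that reading, modelled on how Lucene's CommonTermsQuery (which, as far as I know, is what cutoff_frequency builds on) splits terms. Assumptions: a cutoff of 1 or more is an absolute document count, a cutoff below 1 is a fraction of the document count (rounded up), and a term is high-frequency only when its document frequency is strictly greater than the threshold. With operator "or", at least one low-frequency term is required and high-frequency terms only add to the score; if every term is high-frequency, they are all required. This is a simplified local model, not Elasticsearch code: it ignores scoring, per-shard statistics, deleted documents and minimum_should_match, and `analyze`, `split_terms` and `match_or` are hypothetical names.

```python
import math
import re

# Titles indexed in the session in the question, keyed by document id.
DOCS = {
    1: " the car",
    2: "the the building",
    3: "the kibana rises, the world rejoices",
}


def analyze(text):
    """Rough stand-in for the standard analyzer (assumption): lowercase, split on non-word chars."""
    return [tok for tok in re.split(r"\W+", text.lower()) if tok]


def split_terms(terms, df, cutoff, max_doc):
    """Split query terms into (low, high) frequency groups.

    cutoff >= 1 is an absolute document count; cutoff < 1 is a fraction of
    max_doc, rounded up. High-frequency means doc freq STRICTLY above it.
    """
    threshold = cutoff if cutoff >= 1 else math.ceil(cutoff * max_doc)
    low = [t for t in terms if df.get(t, 0) <= threshold]
    high = [t for t in terms if df.get(t, 0) > threshold]
    return low, high


def match_or(query, cutoff):
    """Return ids of documents a match query with operator "or" would hit in this model."""
    index = {doc_id: set(analyze(text)) for doc_id, text in DOCS.items()}
    df = {}
    for tokens in index.values():
        for t in tokens:
            df[t] = df.get(t, 0) + 1
    low, high = split_terms(analyze(query), df, cutoff, len(DOCS))
    if not low and not high:
        return []
    hits = []
    for doc_id, tokens in index.items():
        if low:
            # At least one low-frequency term must match; high-frequency
            # terms are optional and would only influence the score.
            matched = any(t in tokens for t in low)
        else:
            # Only high-frequency terms: they are combined as a conjunction.
            matched = all(t in tokens for t in high)
        if matched:
            hits.append(doc_id)
    return hits


if __name__ == "__main__":
    print(match_or("the kibana ", 2))    # [3]
    print(match_or("the kibana ", 3))    # [1, 2, 3]
    print(match_or("the kibana ", 0.5))  # [3]  (ceil(0.5 * 3) = 2)
```

With cutoff 2, "the" (3 documents) is above the threshold, so only "kibana" is required and document 3 is the single hit. With cutoff 3, "the" is not above the threshold, so it joins "kibana" in the required group and every document containing either term matches, which is why all three come back rather than zero.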

