# How does Elasticsearch calculate term frequency when using CutoffFrequency in CommonTermsQuery?

I'm trying to understand how CutoffFrequency works, and I found that the frequency Elasticsearch calculates is not what I assumed.

Here's my environment:

• I built my index with `1 shard, 0 replicas`

• My query, written in Go: `elastic.NewCommonTermsQuery("title", "batman official trailers").Analyzer("synonym").CutoffFrequency(0.006).LowFreqMinimumShouldMatch(4)`

• Synonym: `batman` is expanded into `bruce`

• Target doc: `{"title": "Batman Batman Batman Batman Batman bruce Batman official trailers"}`

• Statistics from requesting `/_termvectors`:

```json
{
  "_index": "posts",
  "_type": "posts",
  "_id": "227824665",
  "_version": 6,
  "found": true,
  "took": 38,
  "term_vectors": {
    "title": {
      "field_statistics": {
        "sum_doc_freq": 635,
        "doc_count": 155,
        "sum_ttf": 641
      },
      "terms": {
        "batman": {
          "doc_freq": 2,
          "ttf": 12,
          "term_freq": 6,
          "tokens": [...]
        },
        "official": {
          "doc_freq": 2,
          "ttf": 2,
          "term_freq": 1,
          "tokens": [...]
        },
        "bruce": {
          "doc_freq": 2,
          "ttf": 2,
          "term_freq": 1,
          "tokens": [...]
        },
        "trailers": {
          "doc_freq": 2,
          "ttf": 2,
          "term_freq": 1,
          "tokens": [...]
        }
      }
    }
  }
}
```
• The frequency I assumed (`ttf / sum_ttf`):

• batman: 12/641 ≈ 0.0187
• official: 2/641 ≈ 0.0031
• bruce: 2/641 ≈ 0.0031
• trailers: 2/641 ≈ 0.0031
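For reference, the assumed ratios can be reproduced with a small Go sketch. The numbers are copied from the `_termvectors` response; the helper name `assumedFreq` is made up for illustration and is not part of any Elasticsearch client.

```go
package main

import "fmt"

// assumedFreq is the ratio initially assumed in this post:
// a term's total term frequency (ttf) divided by the field's sum_ttf.
// Hypothetical helper, not an Elasticsearch API.
func assumedFreq(ttf, sumTTF int) float64 {
	return float64(ttf) / float64(sumTTF)
}

func main() {
	const sumTTF = 641 // field_statistics.sum_ttf from the response

	terms := []struct {
		name string
		ttf  int
	}{
		{"batman", 12},
		{"official", 2},
		{"bruce", 2},
		{"trailers", 2},
	}

	// Print each term with its assumed ratio, rounded to four decimals.
	for _, t := range terms {
		fmt.Printf("%-8s %.4f\n", t.name, assumedFreq(t.ttf, sumTTF))
	}
}
```

Under this assumption only `batman` would sit above a cutoff of 0.006 or 0.007, so the ratio cannot explain why the result changes between those two values.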

But it turns out that `CutoffFrequency(0.006)` does not retrieve the target doc, while `CutoffFrequency(0.007)` does.
Does anyone know how Elasticsearch calculates the term frequency here?

OK, I figured it out myself.

The frequency here is the document frequency of a term (`doc_freq`) relative to the number of documents (`doc_count`), not `ttf / sum_ttf`.

e.g.:

• batman: 2/155
• official: 2/155
• bruce: 2/155
• trailers: 2/155
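To see why the result flips between 0.006 and 0.007, here is a minimal Go sketch of the comparison. It assumes, based on my reading of Lucene's `CommonTermsQuery`, that a relative cutoff is multiplied by the document count and rounded up, and that a term counts as high-frequency when its `doc_freq` exceeds that value. It also assumes the document count used is 155 (Lucene uses the index's max doc, which may differ from the field's `doc_count`). `isHighFreq` is a hypothetical helper, not a real API.

```go
package main

import (
	"fmt"
	"math"
)

// isHighFreq reports whether a term would be classified as high-frequency.
// Assumption: a cutoff >= 1 is an absolute document count, while a cutoff
// in [0, 1) is relative and is scaled by maxDoc and rounded up.
func isHighFreq(docFreq, maxDoc int, cutoff float64) bool {
	if cutoff >= 1 {
		return float64(docFreq) > cutoff
	}
	threshold := int(math.Ceil(cutoff * float64(maxDoc)))
	return docFreq > threshold
}

func main() {
	const (
		docFreq = 2   // doc_freq of every query term in the response
		maxDoc  = 155 // assumed equal to field_statistics.doc_count
	)

	for _, cutoff := range []float64{0.006, 0.007} {
		threshold := int(math.Ceil(cutoff * float64(maxDoc)))
		fmt.Printf("cutoff=%.3f threshold=%d highFreq=%v\n",
			cutoff, threshold, isHighFreq(docFreq, maxDoc, cutoff))
	}
}
```

Under these assumptions, 0.006 × 155 = 0.93 rounds up to 1, so all four terms (doc_freq 2) are high-frequency; 0.007 × 155 = 1.085 rounds up to 2, so all four become low-frequency and `LowFreqMinimumShouldMatch(4)` applies to them. That would be consistent with the behavior changing between the two cutoffs.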
