I am using the explain API to inspect the details of the BM25 calculation. In the tf calculation details, the dl (field length) value does not look correct. The following JSON is what I got from the explain API:
{
  "value": 0.4435187,
  "description": "tf, computed as freq / (freq + k1 * (1 - b + b * dl / avgdl)) from:",
  "details": [
    {
      "value": 1.0,
      "description": "freq, occurrences of term within document",
      "details": []
    },
    {
      "value": 1.2,
      "description": "k1, term saturation parameter",
      "details": []
    },
    {
      "value": 0.75,
      "description": "b, length normalization parameter",
      "details": []
    },
    {
      "value": 128.0,
      "description": "dl, length of field (approximate)",
      "details": []
    },
    {
      "value": 120.666664,
      "description": "avgdl, average length of field",
      "details": []
    }
  ]
}
As we can see, the description reads "dl, length of field (approximate)". I want to know why dl is approximate.
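
For reference, the reported tf value can be reproduced from the inputs listed in the explain output. This is only a minimal Python sketch, assuming the formula in the "description" string is applied exactly as written; the function name bm25_tf is my own and is not part of Elasticsearch or Lucene.

```python
def bm25_tf(freq, k1, b, dl, avgdl):
    """tf component as given in the explain description:
    freq / (freq + k1 * (1 - b + b * dl / avgdl))
    """
    return freq / (freq + k1 * (1 - b + b * dl / avgdl))


# Values copied from the explain output above.
tf = bm25_tf(freq=1.0, k1=1.2, b=0.75, dl=128.0, avgdl=120.666664)

# Should agree with the reported 0.4435187 up to floating-point rounding.
print(tf)
```

So the tf score is consistent with dl = 128.0, meaning the approximate value is what actually goes into the score, not just what is displayed.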