Some other pointers that may help with your problem:
- You can add the explain=true parameter to your query to get more detail on how scores were calculated. Among these details is the average length of this field across the documents in the index (avgFieldLength). For example:
GET my_index/_search?explain=true
{
  "query": {
    "term": {"my_field": "fox"}
  }
}
produces a response that includes:
...
"description": "tfNorm, computed as (freq * (k1 + 1)) / (freq + k1 * (1 - b + b * fieldLength / avgFieldLength)) from:",
...
{
  "value": 6.0,
  "description": "avgFieldLength",
  "details": []
},
- Another way is to add a sub-field of the special type token_count, which counts the tokens in the specified field for every document. To get the total number of tokens across all documents, you can then run a sum aggregation on this token_count field; for the average, use an avg aggregation.
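A minimal sketch of the token_count approach, reusing my_index and my_field from the example above. The sub-field name length and the aggregation names total_tokens and avg_tokens are made up for illustration, and the mapping assumes Elasticsearch 7.x or later (on older versions the properties block sits under a mapping type name). The count is computed at index time, so documents indexed before the sub-field existed would need to be reindexed.

```
PUT my_index
{
  "mappings": {
    "properties": {
      "my_field": {
        "type": "text",
        "fields": {
          "length": {
            "type": "token_count",
            "analyzer": "standard"
          }
        }
      }
    }
  }
}

GET my_index/_search
{
  "size": 0,
  "aggs": {
    "total_tokens": { "sum": { "field": "my_field.length" } },
    "avg_tokens":   { "avg": { "field": "my_field.length" } }
  }
}
```

Setting "size": 0 skips the hits and returns only the two aggregation values. The analyzer on the token_count sub-field should normally match the one used for my_field, otherwise the counts may not line up with what scoring sees.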