I created an index:

PUT my-index-000002
{
  "mappings": {
    "properties": {
      "content": { "type": "keyword", "index_options": "freqs" },
      "id": { "type": "integer" }
    }
  }
}
Then I added a document:

POST my-index-000002/_doc/1
{
  "content": ["apple", "apple"]
}
Then I searched for it with explain enabled:

GET my-index-000002/_search
{
  "query": {
    "term": {
      "content": { "value": "apple" }
    }
  },
  "explain": true
}
dl is the field length for the document, that is, the number of terms it contains; we call it the norm of the field.
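The following minimal Python sketch shows how dl and avgdl relate. It assumes that avgdl comes from index statistics as Lucene's sumTotalTermFreq / docCount. Those statistic names are Lucene's, and the helper functions are hypothetical.

Each element of a keyword array is indexed as its own occurrence of the term. So ["apple", "apple"] adds two occurrences of "apple" to the field, even though it holds only one distinct value:

```python
# Hypothetical model: each document is the list of terms indexed for one field.
# For a keyword field, every array element becomes one term occurrence.


def field_length(terms):
    """dl: number of term occurrences the field holds in one document."""
    return len(terms)


def average_field_length(docs):
    """avgdl, assuming Lucene-style sumTotalTermFreq / docCount.

    sumTotalTermFreq: total term occurrences across documents with the field.
    docCount: number of documents that have the field at all.
    """
    docs_with_field = [terms for terms in docs if terms]
    sum_total_term_freq = sum(field_length(terms) for terms in docs_with_field)
    return sum_total_term_freq / len(docs_with_field)


docs = [["apple", "apple"]]  # the single document from the question
print(field_length(docs[0]))       # 2
print(average_field_length(docs))  # 2.0
```

sumTotalTermFreq is read from the postings, which keep frequencies because of "index_options": "freqs". It therefore does not depend on norms. That would explain avgdl being 2 even when dl is reported as 1.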
So you've defined content as a keyword. You've passed in an array, but it's going to treat that array as a single value because of that mapping, and that's what it's reporting back to you.
If the keyword array is treated as a single value, I think avgdl should always be 1. Could you tell me why avgdl is 2 in the explanation?
I think it's by design. Among the keyword field parameters is norms, which controls "whether field-length should be taken into account when scoring queries." The default is false. If I set it to "true" in the mapping definition, I see the expected result. Here's the mapping:
Note that the formula in the explanation is tf, computed as freq / (freq + k1 * (1 - b + b * dl / avgdl)). Scoring dl as a constant 1.0 means field length no longer differs between documents in the formula. I would expect dl to show as 1.0 in the explanation of any keyword field score when norms are disabled.
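Plugging this thread's numbers into that formula makes the effect concrete. The sketch below is a minimal Python version, and it rests on a few assumptions:

- It uses Elasticsearch's default BM25 parameters, k1 = 1.2 and b = 0.75.
- It sets freq = 2 for the two "apple" values and avgdl = 2.
- It computes only the tf factor. The full score also multiplies in idf and boost.

```python
# Elasticsearch's default BM25 parameters (assumed unchanged for this index).
K1 = 1.2
B = 0.75


def bm25_tf(freq, dl, avgdl, k1=K1, b=B):
    """tf factor as shown in the explanation:
    freq / (freq + k1 * (1 - b + b * dl / avgdl))
    """
    return freq / (freq + k1 * (1 - b + b * dl / avgdl))


freq, avgdl = 2, 2.0
# norms disabled: dl is reported as 1
print(round(bm25_tf(freq, dl=1, avgdl=avgdl), 4))  # 0.7273
# norms enabled: dl matches the two indexed values
print(round(bm25_tf(freq, dl=2, avgdl=avgdl), 4))  # 0.625
```

With dl = 1 against avgdl = 2, the document looks shorter than average, so its tf is slightly higher. With norms enabled, dl equals avgdl and the length factor (1 - b + b * dl / avgdl) reduces to 1.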
Let me know if this works for you. (You'll have to delete and recreate the index, or test on a new one.)
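When recreating the index, the change described above is adding "norms": true to the content field. Here is a minimal Python sketch that builds the DELETE and PUT requests with the standard library's urllib.request. It assumes the rest of the mapping stays as in the question. It also assumes a local, unsecured cluster at http://localhost:9200; that URL and the helper name are hypothetical. The sketch only constructs the requests and does not send them:

```python
import json
import urllib.request

BASE_URL = "http://localhost:9200"  # assumption: local cluster without security
INDEX = "my-index-000002"

# The question's mapping, with norms enabled on the keyword field.
MAPPING = {
    "mappings": {
        "properties": {
            "content": {"type": "keyword", "index_options": "freqs", "norms": True},
            "id": {"type": "integer"},
        }
    }
}


def recreate_index_requests(base_url=BASE_URL, index=INDEX, body=None):
    """Build (without sending) a DELETE for the old index and a PUT for the new one."""
    body = MAPPING if body is None else body
    url = f"{base_url}/{index}"
    delete = urllib.request.Request(url, method="DELETE")
    create = urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )
    return [delete, create]


for req in recreate_index_requests():
    print(req.get_method(), req.full_url)
```

Each request can be passed to urllib.request.urlopen to send it. After that, index the document again and rerun the search with "explain": true.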
The trade-off for norms, according to the docs, is that the index will require more disk space.