Why is the length of keyword array always 1?

I created an index with:

PUT my-index-000002
{
  "mappings": {
    "properties": {
      "content": {
        "type": "keyword",
        "index_options": "freqs"
      },
      "id": {
        "type": "integer"
      }
    }
  }
}

Then I added a document:

POST my-index-000002/_doc/1
{
  "content": ["apple", "apple"]
}

I searched for it with:

GET my-index-000002/_search
{
  "query": {
    "term": {
      "content": {
        "value": "apple"
      }
    }
  },
  "explain": true
}

But dl in the explanation is 1.

Is it a bug or by design? Can anyone help me?

Welcome to our community! :smiley: And Thanks heaps for providing a replication, it makes it heaps easier to see what you are doing!

From the GitHub issue "Wrong averageFieldLength (avgdl, average length of field) and fieldLength (dl, length of field) calculation" (elastic/elasticsearch #46855):

dl is the field length for the document, so the number of terms that it contains. We call it the norm of the field.
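To make that definition concrete, here is a minimal sketch that counts terms the way the quote describes. It assumes each value in a keyword array is indexed as one term, and that avgdl is the total number of terms across documents that have the field divided by the number of such documents (as far as I know, this is how Lucene's BM25 computes it from sumTotalTermFreq and docCount). The helper names are hypothetical and only for illustration; this is not Elasticsearch code.

```python
from collections import Counter


def term_freq(values, term):
    """How many times `term` occurs in one document's keyword array."""
    return Counter(values)[term]


def field_lengths(docs):
    """Number of terms per document, assuming one term per keyword value."""
    return {doc_id: len(values) for doc_id, values in docs.items()}


def average_field_length(docs):
    """Total terms across documents with the field, divided by that document count."""
    lengths = field_lengths(docs)
    return sum(lengths.values()) / len(lengths)


# The single document from the question.
docs = {"1": ["apple", "apple"]}

print(term_freq(docs["1"], "apple"))
print(field_lengths(docs))
print(average_field_length(docs))
```

Under these assumptions the document contains two terms and avgdl works out to 2.0. If the reported dl is fixed when norms are disabled while avgdl still comes from the term statistics, that would be consistent with seeing dl = 1 but avgdl = 2 in the explanation.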

So you've defined content as a keyword. You've passed in an array, but it's going to treat that array as a single value because of that mapping, and that's what it's reporting back to you.

Thanks a lot!

If the keyword array is treated as a single value, I'd expect avgdl to always be 1. Could you explain why avgdl is 2 in the explanation?

Any reply will be greatly appreciated!

Sorry, not sure I can help there; I struggled to find that info as it was.

Hopefully someone with more knowledge can step in!

Thanks for your help! :smiley:

Anyone else have any ideas?

Many Thanks!

I think it's by design. Among the keyword field parameters is norms, which controls "whether field-length should be taken into account when scoring queries." The default is false. If I set it to "true" in the mapping definition, I see the expected result. Here's the mapping:

PUT my-index-000002
{
  "mappings": {
    "properties": {
      "content": {
        "type": "keyword",
        "index_options": "freqs",
        "norms": true
      },
      "id": {
        "type": "integer"
      }
    }
  }
}

Note that the explanation computes tf as freq / (freq + k1 * (1 - b + b * dl / avgdl)). With norms disabled, dl is reported as a fixed 1.0 for every document, so field length no longer distinguishes one document from another. I would expect to see dl = 1.0 for any keyword field score when norms is disabled.
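To see what the tf formula does with these numbers, here is a small sketch that plugs them in. It assumes the Elasticsearch BM25 defaults k1 = 1.2 and b = 0.75, freq = 2 and avgdl = 2 for the document in this thread, and dl = 1.0 (norms disabled) versus dl = 2.0 (norms enabled, two terms). The function name bm25_tf is hypothetical; it only mirrors the formula printed in the explain output.

```python
K1 = 1.2  # assumed: Elasticsearch's default BM25 k1
B = 0.75  # assumed: Elasticsearch's default BM25 b


def bm25_tf(freq, dl, avgdl, k1=K1, b=B):
    """tf component as written in the explain output."""
    return freq / (freq + k1 * (1 - b + b * dl / avgdl))


freq, avgdl = 2, 2.0

# Norms disabled: every document gets the same dl, so length cannot
# separate short documents from long ones.
tf_no_norms = bm25_tf(freq, dl=1.0, avgdl=avgdl)    # ≈ 0.727

# Norms enabled: dl reflects the two indexed terms.
tf_with_norms = bm25_tf(freq, dl=2.0, avgdl=avgdl)  # ≈ 0.625

print(round(tf_no_norms, 3), round(tf_with_norms, 3))
```

With dl held at 1.0, two documents with the same freq get the same tf no matter how many values their arrays contain; enabling norms lets dl vary per document, which is what produces the "expected result" described above.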

Let me know if this works for you. (You'll have to delete and recreate the index, or test on a new one.)

The trade-off for norms, according to the docs, is that the index will require more disk space.


Thanks! It works for me. :smiley:

