Why is the length of keyword array always 1?

lyq2333 · March 21, 2023, 12:06pm

I create an index by
PUT my-index-000002
{
  "mappings": {
    "properties": {
      "content": {
        "type": "keyword",
        "index_options": "freqs"
      },
      "id": {
        "type": "integer"
      }
    }
  }
}

Then I add a doc by
POST my-index-000002/_doc/1
{
  "content": ["apple", "apple"]
}

I search the doc by
GET my-index-000002/_search
{
  "query": {
    "term": {
      "content": {
        "value": "apple"
      }
    }
  },
  "explain": true
}

But the dl in the explanation is 1.

Is this a bug or by design? Can anyone help me?

warkolm · March 22, 2023, 12:13am

Welcome to our community! And thanks heaps for providing a replication, it makes it heaps easier to see what you are doing!

From GitHub issue #46855 in elastic/elasticsearch, "Wrong averageFieldLength(avgdl, average length of field) and fieldLength (dl, length of field) calculation":

dl is the field length for the document so the number of terms that it contains, we call it the norm of the field

So you've defined content as a keyword. You've passed in an array, but it's going to treat that array as a single value because of that mapping, and that's what it's reporting back to you.

lyq2333 · March 22, 2023, 1:01am

Thanks a lot!

If the keyword array is treated as a single value, I think the avgdl should always be 1. Could you tell me why the avgdl is 2 in the explanation?

Any reply will be greatly appreciated!

warkolm · March 22, 2023, 3:57am

Not sure I can help there sorry, I struggled to find that info as it was.

Hopefully someone with more knowledge can step in!

lyq2333 · March 22, 2023, 5:38am

Thanks for your help!

Anyone else have any ideas?

Many thanks!

William_Brafford · March 22, 2023, 1:40pm

I think it's by design. Among the keyword field parameters is norms, which controls "whether field-length should be taken into account when scoring queries." The default is false. If I set it to "true" in the mapping definition, I see the expected result. Here's the mapping:

PUT my-index-000002
{
  "mappings": {
    "properties": {
      "content": {
        "type": "keyword",
        "index_options": "freqs",
        "norms": true
      },
      "id": {
        "type": "integer"
      }
    }
  }
}

Note that the formula in the explanation is tf, computed as freq / (freq + k1 * (1 - b + b * dl / avgdl)). Scoring dl as 1.0 makes it do nothing in the formula; I would expect to see it as 1.0 for any keyword field score when norms is disabled.
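To make the formula concrete, here is a small Python sketch that plugs numbers into it. The values are assumptions for illustration, not output copied from Elasticsearch: freq = 2 (two "apple" values with index_options set to freqs), avgdl = 2 as reported earlier in the thread, dl = 1 with norms disabled, dl = 2 with norms enabled, and the BM25 defaults k1 = 1.2 and b = 0.75. The function name bm25_tf is made up for this example.

```python
def bm25_tf(freq, dl, avgdl, k1=1.2, b=0.75):
    """tf component from the explanation: freq / (freq + k1 * (1 - b + b * dl / avgdl))."""
    return freq / (freq + k1 * (1 - b + b * dl / avgdl))


# Assumed values for the single document in the thread (see lead-in).
FREQ = 2    # "apple" appears twice in the array
AVGDL = 2   # avgdl reported in the explanation

# norms disabled (keyword default): dl is reported as 1
tf_norms_disabled = bm25_tf(FREQ, dl=1, avgdl=AVGDL)

# norms enabled: dl assumed to reflect both array values
tf_norms_enabled = bm25_tf(FREQ, dl=2, avgdl=AVGDL)

if __name__ == "__main__":
    print(f"tf with dl=1: {tf_norms_disabled:.4f}")  # about 0.7273
    print(f"tf with dl=2: {tf_norms_enabled:.4f}")   # about 0.6250
```

Under these assumptions the only thing that changes between the two runs is dl, so any difference in the tf value comes from the dl / avgdl ratio.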

Let me know if this works for you. (You'll have to delete and recreate the index, or test on a new one.)
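For anyone repeating the test, the delete-and-recreate sequence can be sketched as below. This is only a helper that prints the requests in Console form so they can be pasted into Dev Tools; it does not talk to a cluster, and the function names console_steps and render are invented for this example. The index name, mapping, document and query are the ones from this thread.

```python
import json

INDEX = "my-index-000002"  # index name used in this thread


def console_steps(index=INDEX):
    """Return (method, path, body) tuples for recreating the index with norms enabled."""
    mapping = {
        "mappings": {
            "properties": {
                "content": {
                    "type": "keyword",
                    "index_options": "freqs",
                    "norms": True,
                },
                "id": {"type": "integer"},
            }
        }
    }
    doc = {"content": ["apple", "apple"]}
    query = {
        "query": {"term": {"content": {"value": "apple"}}},
        "explain": True,
    }
    return [
        ("DELETE", index, None),           # drop the old index (norms cannot be re-enabled in place)
        ("PUT", index, mapping),           # recreate it with "norms": true
        ("POST", f"{index}/_doc/1", doc),  # re-index the test document
        ("GET", f"{index}/_search", query),
    ]


def render(steps):
    """Format the steps as Console-style requests separated by blank lines."""
    lines = []
    for method, path, body in steps:
        lines.append(f"{method} {path}")
        if body is not None:
            lines.append(json.dumps(body, indent=2))
        lines.append("")
    return "\n".join(lines)


if __name__ == "__main__":
    print(render(console_steps()))
```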

The trade-off for norms, according to the docs, is that the index will require more disk space.

lyq2333 · March 23, 2023, 12:56am

Thanks! It works for me.

system · April 20, 2023, 12:56am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
