BM25 scoring: how to ignore idf, or set the scope of the total number of documents with the search field

wlc015f · December 29, 2021, 11:03am

Elasticsearch version: 7.16.2

Index settings:
    "similarity" : {
      "default" : {
        "type" : "BM25",
        "b" : "0.75",
        "k1" : "1.2"
      }
    }

Example document:

{
  "application" : 110,
  "type" : "page",
  "name" : "需要创建知识库",
  "pilot_id" : "61652c90365fc21e26cd48d0",
  "created_by" : "b03f9904e7e343dda4b79ab85a050ee1",
  "created_at" : 1637312289,
  "updated_at" : 1637312295,
  "updated_by" : "b03f9904e7e343dda4b79ab85a050ee1",
  "is_deleted" : 0,
  "is_archived" : 0,
  "addition" : {
    "published_by" : "b03f9904e7e343dda4b79ab85a050ee1",
    "published_at" : 1637312295,
    "content" : "测试",
    "participants" : [ ]
  }
}

Query DSL:

{
  "explain": true,
  "sort": [
    {
      "_score": {
        "order": "desc"
      }
    },
    {
      "updated_at": {
        "order": "desc",
        "unmapped_type": "long"
      }
    }
  ],
  "query": {
    "bool": {
      "filter": {
        "bool": {
          "must": [
            {
              "term": {
                "application": 110
              }
            },
            {
              "terms": {
                "type": [
                  "page"
                ]
              }
            },
            {
              "term": {
                "is_deleted": 0
              }
            }
          ]
        }
      },
      "should": [
        {
          "multi_match": {
            "query": "创建知识",
            "fields": [
              "name",
              "addition.content"
            ],
            "type": "best_fields",
            "tie_breaker": 0.3
          }
        }
      ]
    }
  }
}

A comparison of the search explain output is here: https://editor.mergely.com/PamGkwBm/

score = boost * idf * tf

  idf = log(1 + (N - n + 0.5) / (n + 0.5)) 
    n, number of documents containing term
    N, total number of documents with field

  tf = freq / (freq + k1 * (1 - b + b * dl / avgdl))
    freq, occurrences of term within document
    k1, term saturation parameter 
    b, length normalization parameter 
    dl, length of field
    avgdl, average length of field
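
For reference, the formula above can be written out as a small Python sketch. The function names are hypothetical, and the defaults for k1 and b are taken from the index settings in this post. It only mirrors the formula as printed by explain; it is not Lucene's actual implementation.

```python
import math


def bm25_idf(n: int, N: int) -> float:
    """idf = log(1 + (N - n + 0.5) / (n + 0.5))

    n: number of documents containing the term
    N: total number of documents with the field
    """
    return math.log(1 + (N - n + 0.5) / (n + 0.5))


def bm25_tf(freq: float, dl: float, avgdl: float,
            k1: float = 1.2, b: float = 0.75) -> float:
    """tf = freq / (freq + k1 * (1 - b + b * dl / avgdl))"""
    return freq / (freq + k1 * (1 - b + b * dl / avgdl))


def bm25_score(boost: float, n: int, N: int, freq: float,
               dl: float, avgdl: float,
               k1: float = 1.2, b: float = 0.75) -> float:
    """score = boost * idf * tf, for a single term in a single field."""
    return boost * bm25_idf(n, N) * bm25_tf(freq, dl, avgdl, k1, b)
```

Plugging the values from an explain response into `bm25_score` should make it easier to see which factor moves the score for a given hit.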

Data stats:
Documents with the `name` field: 437273
Documents with the `addition.content` field: 11397
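
To illustrate how much these two counts matter, here is a rough Python sketch. The idf expression simplifies to log((N + 1) / (n + 0.5)), so for the same n the idf gap between the two fields depends only on the two N values. Only the two N values come from the stats above; n = 100 is a made-up number for illustration, and in practice n will differ per field.

```python
import math


def bm25_idf(n: int, N: int) -> float:
    """idf = log(1 + (N - n + 0.5) / (n + 0.5))"""
    return math.log(1 + (N - n + 0.5) / (n + 0.5))


N_NAME = 437273     # documents with the `name` field (from the stats above)
N_CONTENT = 11397   # documents with the `addition.content` field (from the stats above)
n = 100             # hypothetical: documents containing the term, same for both fields

idf_name = bm25_idf(n, N_NAME)
idf_content = bm25_idf(n, N_CONTENT)

# For equal n, the gap reduces to log((N_NAME + 1) / (N_CONTENT + 1)),
# i.e. it does not depend on n at all.
gap = idf_name - idf_content

print(f"idf(name)             = {idf_name:.3f}")
print(f"idf(addition.content) = {idf_content:.3f}")
print(f"gap                   = {gap:.3f}")
```

Under this assumption, a term that matches in `name` gets a noticeably higher idf than the same term matching in `addition.content`, purely because of the difference in N.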

I need to search for one keyword in two fields, `name` and `addition.content`. In some documents `addition.content` is empty. Comparing the explain output against the score formula, N is the factor with the biggest influence on the score. How can I ignore the total number of documents, or is there a way to specify the scope of N?

Thanks.

system · January 26, 2022, 11:04am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
