Elasticsearch version: 7.16.2
Index settings:
"similarity": {
  "default": {
    "type": "BM25",
    "b": "0.75",
    "k1": "1.2"
  }
}
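Data example (the `name` value 需要创建知识库 means "need to create a knowledge base"; the `addition.content` value 测试 means "test"):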
{
  "application": 110,
  "type": "page",
  "name": "需要创建知识库",
  "pilot_id": "61652c90365fc21e26cd48d0",
  "created_by": "b03f9904e7e343dda4b79ab85a050ee1",
  "created_at": 1637312289,
  "updated_at": 1637312295,
  "updated_by": "b03f9904e7e343dda4b79ab85a050ee1",
  "is_deleted": 0,
  "is_archived": 0,
  "addition": {
    "published_by": "b03f9904e7e343dda4b79ab85a050ee1",
    "published_at": 1637312295,
    "content": "测试",
    "participants": []
  }
}
Search DSL (the query string 创建知识 means "create knowledge"):
{
  "explain": true,
  "sort": [
    {
      "_score": {
        "order": "desc"
      }
    },
    {
      "updated_at": {
        "order": "desc",
        "unmapped_type": "long"
      }
    }
  ],
  "query": {
    "bool": {
      "filter": {
        "bool": {
          "must": [
            {
              "term": {
                "application": 110
              }
            },
            {
              "terms": {
                "type": [
                  "page"
                ]
              }
            },
            {
              "term": {
                "is_deleted": 0
              }
            }
          ]
        }
      },
      "should": [
        {
          "multi_match": {
            "query": "创建知识",
            "fields": [
              "name",
              "addition.content"
            ],
            "type": "best_fields",
            "tie_breaker": 0.3
          }
        }
      ]
    }
  }
}
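For context on how the `should` clause turns the two per-field scores into one: with `"type": "best_fields"` and `"tie_breaker": 0.3`, the document gets the highest single-field score plus 0.3 times the scores of the other matching fields (the filter clauses add nothing to the score). The Python sketch below reproduces only that combination step; the per-field scores passed in are hypothetical placeholders, not numbers taken from the explain output.

```python
def best_fields_score(field_scores, tie_breaker=0.3):
    """Combine per-field scores like a best_fields multi_match (dis_max):
    the best field's score plus tie_breaker times the sum of the others."""
    if not field_scores:
        return 0.0
    best = max(field_scores)
    others = sum(field_scores) - best
    return best + tie_breaker * others


# Hypothetical per-field scores for one document: name = 2.0, addition.content = 5.0
print(best_fields_score([2.0, 5.0]))  # 5.0 + 0.3 * 2.0
```

Because the best field wins almost outright, whichever field has the larger BM25 score (often the one with the higher idf) dominates the final ranking.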
A side-by-side comparison of the explain output is here: https://editor.mergely.com/PamGkwBm/
The scoring formula reported by the explain output:
score = boost * idf * tf
  idf = log(1 + (N - n + 0.5) / (n + 0.5))
    n: number of documents containing the term
    N: total number of documents that have the field
  tf = freq / (freq + k1 * (1 - b + b * dl / avgdl))
    freq: occurrences of the term within the document
    k1: term saturation parameter
    b: length normalization parameter
    dl: length of the field
    avgdl: average length of the field
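The formula above can be checked by hand with a small Python sketch. It assumes the natural log (which is what Lucene's BM25 implementation uses) and the k1 = 1.2, b = 0.75 values from the index settings; `boost` is simply whatever value the explain output reports as boost. Plugging in the `N`, `n`, `freq`, `dl` and `avgdl` values from one explain entry should reproduce that term's score.

```python
import math


def bm25_idf(N, n):
    """idf = log(1 + (N - n + 0.5) / (n + 0.5)), natural log."""
    return math.log(1 + (N - n + 0.5) / (n + 0.5))


def bm25_tf(freq, dl, avgdl, k1=1.2, b=0.75):
    """tf = freq / (freq + k1 * (1 - b + b * dl / avgdl))."""
    return freq / (freq + k1 * (1 - b + b * dl / avgdl))


def bm25_term_score(boost, N, n, freq, dl, avgdl, k1=1.2, b=0.75):
    """score = boost * idf * tf for a single term in a single field."""
    return boost * bm25_idf(N, n) * bm25_tf(freq, dl, avgdl, k1, b)


# Hypothetical inputs: replace them with the values from an explain entry
print(bm25_term_score(boost=1.0, N=11397, n=500, freq=1, dl=2, avgdl=3.0))
```

Note that `N` only enters through `idf`, so two fields with very different document counts get different idf values even for the same `n`.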
Data stats:
Documents with a `name` field: 437273
Documents with an `addition.content` field: 11397
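To make the effect of `N` concrete, the Python sketch below computes the idf for the two fields using the document counts above. The number of documents containing the term (`n = 500` in both fields) is a hypothetical value chosen only for illustration; the real `n` for each term is in the explain output.

```python
import math


def bm25_idf(N, n):
    # BM25 idf as shown in the explain output, natural log
    return math.log(1 + (N - n + 0.5) / (n + 0.5))


# Per-field document counts from the data stats above
N_NAME = 437273      # documents with a `name` field
N_CONTENT = 11397    # documents with an `addition.content` field

# Hypothetical: assume the same term occurs in 500 documents in each field
n = 500

idf_name = bm25_idf(N_NAME, n)
idf_content = bm25_idf(N_CONTENT, n)
print(f"idf(name)             = {idf_name:.4f}")
print(f"idf(addition.content) = {idf_content:.4f}")
print(f"ratio                 = {idf_name / idf_content:.2f}")
```

With these made-up counts the `name` idf comes out a little more than twice the `addition.content` idf, purely because `name` has a much larger `N`.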
I need to search for one keyword in two fields, `name` and `addition.content`. Some documents have an empty `addition.content`, so far fewer documents have that field than have `name`. Comparing the explain output against the scoring formula, N is the factor with the biggest influence on the score. How can I make the score ignore the total number of documents, or is there some way to specify the scope of N?

Thanks