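This is a basic question, but in the idf calculation of a Percolator Query, docFreq and docCount always both come out as 1. Is it possible to change these to values computed over all documents in the index? The reason I ask is that I want the ranking by Percolator Query score to match the ranking by Search API score. I tried this on Elasticsearch 5.6 and 6.6, with 5 shards and a cluster size of 3.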
Create the index
PUT /my-index
{
  "mappings": {
    "article": {
      "properties": {
        "message": {
          "type": "text"
        },
        "query": {
          "type": "percolator"
        }
      }
    }
  }
}
Register the percolator query
PUT /my-index/article/1?refresh
{
  "query" : {
    "match" : {
      "message" : "fox"
    }
  }
}
Index the documents
PUT /my-index/article/2
{
  "message" : "quick brown fox"
}
PUT /my-index/article/3
{
  "message" : "quick fox"
}
PUT /my-index/article/4
{
  "message" : "brown"
}
Percolator Query
GET /my-index/_search
{
  "explain": true,
  "query" : {
    "percolate" : {
      "field" : "query",
      "index" : "my-index",
      "type" : "article",
      "id" : "2"
    }
  }
}
Percolator Query response
{
"took" : 30,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : 1,
"max_score" : 0.2876821,
"hits" : [
{
"_shard" : "[my-index][1]",
"_node" : "poVG6FmxSXGB792tjGkoLA",
"_index" : "my-index",
"_type" : "article",
"_id" : "5",
"_score" : 0.2876821,
"_source" : {
"query" : {
"match" : {
"message" : "fox"
}
}
},
"fields" : {
"_percolator_document_slot" : [
0
]
},
"_explanation" : {
"value" : 0.2876821,
"description" : "PercolateQuery",
"details" : [
{
"value" : 0.2876821,
"description" : "weight(message:fox in 0) [BM25Similarity], result of:",
"details" : [
{
"value" : 0.2876821,
"description" : "score(doc=0,freq=1.0 = termFreq=1.0\n), product of:",
"details" : [
{
"value" : 0.2876821,
"description" : "idf, computed as log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5)) from:",
"details" : [
{
"value" : 1.0,
"description" : "docFreq",
"details" : [ ]
},
{
"value" : 1.0,
"description" : "docCount",
"details" : [ ]
}
]
},
{
"value" : 1.0,
"description" : "tfNorm, computed as (freq * (k1 + 1)) / (freq + k1 * (1 - b + b * fieldLength / avgFieldLength)) from:",
"details" : [
{
"value" : 1.0,
"description" : "termFreq=1.0",
"details" : [ ]
},
{
"value" : 1.2,
"description" : "parameter k1",
"details" : [ ]
},
{
"value" : 0.75,
"description" : "parameter b",
"details" : [ ]
},
{
"value" : 3.0,
"description" : "avgFieldLength",
"details" : [ ]
},
{
"value" : 3.0,
"description" : "fieldLength",
"details" : [ ]
}
]
}
]
}
]
}
]
}
}
]
}
}
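The score in the explanation can be reproduced by hand from the two BM25 formulas it prints. Below is a minimal Python sketch that plugs in the values from the response (docFreq = docCount = 1, termFreq = 1, k1 = 1.2, b = 0.75, fieldLength = avgFieldLength = 3). The function names are illustrative only and are not Elasticsearch APIs. Elasticsearch itself computes in 32-bit floats, so only the first 7 digits are expected to match.

```python
import math


def bm25_idf(doc_freq: float, doc_count: float) -> float:
    """idf as printed in the explanation: log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5))."""
    return math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))


def bm25_tf_norm(freq: float, k1: float, b: float,
                 field_length: float, avg_field_length: float) -> float:
    """tfNorm as printed in the explanation."""
    return (freq * (k1 + 1)) / (freq + k1 * (1 - b + b * field_length / avg_field_length))


# Values copied from the _explanation of the percolate response
idf = bm25_idf(doc_freq=1.0, doc_count=1.0)  # log(1 + 0.5 / 1.5) = log(4/3)
tf_norm = bm25_tf_norm(freq=1.0, k1=1.2, b=0.75,
                       field_length=3.0, avg_field_length=3.0)
score = idf * tf_norm  # the explanation says "product of"

print(f"idf={idf:.7f} tfNorm={tf_norm:.7f} score={score:.7f}")
```

Because fieldLength equals avgFieldLength and termFreq is 1, tfNorm collapses to 1.0, so the whole score is just the idf of a one-document collection.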
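When I run the same query as the percolator query through the normal Search API, docFreq and docCount contain the correct values. I suspected the split across shards was the cause and applied search_type=dfs_query_then_fetch, but the result did not change.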
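For comparison, here is a hedged Python sketch of the idf the same formula would give if docFreq and docCount were taken over the whole index rather than over the single percolated document. Assumptions: of the documents indexed above, only 2, 3 and 4 have a message field (docCount = 3), "fox" occurs in documents 2 and 3 (docFreq = 2), and the statistics are collected across all shards, which is what dfs_query_then_fetch does for a normal search. This only illustrates the arithmetic; it does not show a way to make the percolate query use these statistics.

```python
import math


def bm25_idf(doc_freq: int, doc_count: int) -> float:
    """BM25 idf as printed in the Elasticsearch explanation."""
    return math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))


# Statistics reported by the percolate explanation (docFreq = docCount = 1)
percolate_idf = bm25_idf(doc_freq=1, doc_count=1)

# ASSUMED index-wide statistics for "fox" in the message field (docs 2, 3, 4)
index_idf = bm25_idf(doc_freq=2, doc_count=3)

print(f"percolate idf  = {percolate_idf:.4f}")
print(f"index-wide idf = {index_idf:.4f}")
```

With one-document statistics every matching term gets the same idf, log(4/3), regardless of how common it is in the index, which is why the percolate scores cannot reflect term rarity the way Search API scores do.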
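To restate: in the idf calculation of a Percolator Query, docFreq and docCount always both become 1. Is it possible to change these to values computed over all documents in the index? Or am I fundamentally misunderstanding how Percolator Query works? Thank you in advance.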