Understanding the Explain Query


(Peter Schröder) #1

i am trying to debug a scoring issue, but i did not find a proper guide to
how one would read the explain results.

there is a query that has custom field boosts:

curl -X GET \
  "http://localhost:9200/test-headings/heading/_search?from=0&load=false&page=1&per_page=75&raw_hits=true&size=75&pretty=true" \
  -d '{"from":0,"size":75,"query":{"query_string":{"query":"drehteile","fields":["hk_name_de^10","hk_name_split_de^4","hk_recombined_de","hk_manual_de","hk_manual_final_de^0.1","hk_adjectiv_de","hk_singplur_de","hk_synonym_de","hk_replaced_de","hk_adjectiv_enh_de","hk_sea_enh_de","hk_click_keywords_de^4.5"],"use_dis_max":false}},"filter":{"numeric_range":{"company_counts.DE.DE":{"gt":0}}}}'

i thought that the results would be sorted according to the ^X modifiers,
like:

  1. hk_name_de => 10
  2. hk_click_keywords_de => 4.5
  3. hk_name_split_de => 4
  4. rest_de
  5. hk_manual_final_de => 0.1

unfortunately the results are not ordered that way. i tried playing around
with dismax settings, omit_norms etc., which had no effect on what i was
doing. my test data has just one keyword, so term count and field length
should not matter anyway.

the explain output that i get contains the following part:

{
  "_shard" : 3,
  "_node" : "dHiSD_vaR46-UaWAjI71JQ",
  "_index" : "test-headings",
  "_type" : "heading",
  "_id" : "95",
  "_score" : 0.049284987, "_source" : {"company_counts":{"DE":{"DE":100,
"AT":10,"CH":1},"AT":{"DE":2,"AT":50,"CH":10},"CH":{"DE":1,"AT":5,"CH":30}},
"hk_name_de":[],"hk_name_split_de":[],"hk_recombined_de":[],"hk_manual_de":[
"drehteile"],"hk_manual_final_de":[],"hk_adjectiv_de":[],"hk_singplur_de"
:[],"hk_synonym_de":[],"hk_replaced_de":[],"hk_adjectiv_enh_de":[],
"hk_sea_enh_de":[],"hk_name_en":[],"hk_name_split_en":[],"hk_recombined_en"
:[],"hk_manual_en":[],"hk_manual_final_en":[],"hk_adjectiv_en":[],
"hk_singplur_en":[],"hk_synonym_en":[],"hk_replaced_en":[],
"hk_adjectiv_enh_en":[],"hk_sea_enh_en":[],"hk_name_fr":[],
"hk_name_split_fr":[],"hk_recombined_fr":[],"hk_manual_fr":[],
"hk_manual_final_fr":[],"hk_adjectiv_fr":[],"hk_singplur_fr":[],
"hk_synonym_fr":[],"hk_replaced_fr":[],"hk_adjectiv_enh_fr":[],
"hk_sea_enh_fr":[],"hk_name_nl":[],"hk_name_split_nl":[],"hk_recombined_nl"
:[],"hk_manual_nl":[],"hk_manual_final_nl":[],"hk_adjectiv_nl":[],
"hk_singplur_nl":[],"hk_synonym_nl":[],"hk_replaced_nl":[],
"hk_adjectiv_enh_nl":[],"hk_sea_enh_nl":[],"name_de":"only_manual_drehteile"
,"id":"95"},
"highlight" : {
"hk_manual_de" : [ "drehteile" ]
},
"_explanation" : {
  "value" : 0.049284987,
  "description" : "sum of:",
  "details" : [ {
    "value" : 0.049284987,
    "description" : "weight(hk_manual_de:drehteile in 1), product of:",
    "details" : [ {
      "value" : 0.049284987,
      "description" : "queryWeight(hk_manual_de:drehteile), product of:",
      "details" : [ {
        "value" : 1.0,
        "description" : "idf(docFreq=1, maxDocs=2)"
      }, {
        "value" : 0.049284987,
        "description" : "queryNorm"
      } ]
    }, {
      "value" : 1.0,
      "description" : "fieldWeight(hk_manual_de:drehteile in 1), product of:",
      "details" : [ {
        "value" : 1.0,
        "description" : "tf(termFreq(hk_manual_de:drehteile)=1)"
      }, {
        "value" : 1.0,
        "description" : "idf(docFreq=1, maxDocs=2)"
      }, {
        "value" : 1.0,
        "description" : "fieldNorm(field=hk_manual_de, doc=1)"
      } ]
    } ]
  } ]
}
}, {
"_shard" : 2,
"_node" : "dHiSD_vaR46-UaWAjI71JQ",
"_index" : "test-headings",
"_type" : "heading",
"_id" : "94",
"_score" : 0.03306274, "_source" : {"company_counts":{"DE":{"DE":100,
"AT":10,"CH":1},"AT":{"DE":2,"AT":50,"CH":10},"CH":{"DE":1,"AT":5,"CH":30}},
"hk_name_de":[],"hk_name_split_de":["drehteile"],"hk_recombined_de":[],
"hk_manual_de":[],"hk_manual_final_de":[],"hk_adjectiv_de":[],
"hk_singplur_de":[],"hk_synonym_de":[],"hk_replaced_de":[],
"hk_adjectiv_enh_de":[],"hk_sea_enh_de":[],"hk_name_en":[],
"hk_name_split_en":[],"hk_recombined_en":[],"hk_manual_en":[],
"hk_manual_final_en":[],"hk_adjectiv_en":[],"hk_singplur_en":[],
"hk_synonym_en":[],"hk_replaced_en":[],"hk_adjectiv_enh_en":[],
"hk_sea_enh_en":[],"hk_name_fr":[],"hk_name_split_fr":[],"hk_recombined_fr"
:[],"hk_manual_fr":[],"hk_manual_final_fr":[],"hk_adjectiv_fr":[],
"hk_singplur_fr":[],"hk_synonym_fr":[],"hk_replaced_fr":[],
"hk_adjectiv_enh_fr":[],"hk_sea_enh_fr":[],"hk_name_nl":[],
"hk_name_split_nl":[],"hk_recombined_nl":[],"hk_manual_nl":[],
"hk_manual_final_nl":[],"hk_adjectiv_nl":[],"hk_singplur_nl":[],
"hk_synonym_nl":[],"hk_replaced_nl":[],"hk_adjectiv_enh_nl":[],
"hk_sea_enh_nl":[],"name_de":"only_split_drehteile","id":"94"},
"highlight" : {
"hk_name_split_de" : [ "drehteile" ]
},
"_explanation" : {
  "value" : 0.03306274,
  "description" : "sum of:",
  "details" : [ {
    "value" : 0.03306274,
    "description" : "weight(hk_name_split_de:drehteile^4.0 in 0), product of:",
    "details" : [ {
      "value" : 0.10774788,
      "description" : "queryWeight(hk_name_split_de:drehteile^4.0), product of:",
      "details" : [ {
        "value" : 4.0,
        "description" : "boost"
      }, {
        "value" : 0.30685282,
        "description" : "idf(docFreq=1, maxDocs=1)"
      }, {
        "value" : 0.08778466,
        "description" : "queryNorm"
      } ]
    }, {
      "value" : 0.30685282,
      "description" : "fieldWeight(hk_name_split_de:drehteile in 0), product of:",
      "details" : [ {
        "value" : 1.0,
        "description" : "tf(termFreq(hk_name_split_de:drehteile)=1)"
      }, {
        "value" : 0.30685282,
        "description" : "idf(docFreq=1, maxDocs=1)"
      }, {
        "value" : 1.0,
        "description" : "fieldNorm(field=hk_name_split_de, doc=0)"
      } ]
    } ]
  } ]
}
}

i was expecting hk_name_split_de to be scored higher than hk_manual_de:

weight(hk_manual_de:drehteile in 1) vs. weight(hk_name_split_de:drehteile^4.0 in 0)

any pointers why this is not working the way i was expecting?


(Robert Moszczynski) #2

Hi Peter,

I don't think this is a bug. Look at the inverse document frequency (idf) factor in your explanation.

An index is not a database, and boosting is not the only factor that determines the score of a document for a query. The ^X syntax sets a boost on a field at query time. In addition, you can use term boosting and document boosting. Boost values defined in the mapping are multiplied with the other boosts, including the query-time boost.

Lucene computes the score with a scoring formula, which is a function of the query you specified and an indexed document. You can read about the default similarity here:

http://lucene.apache.org/core/3_6_0/api/all/org/apache/lucene/search/Similarity.html

Let us analyze the structure of your explanation. The score is the sum of the weights of all query terms, multiplied by the coordination factor (omitted here because it is 1/1 = 1). You searched for a single term, 'drehteile', so there is only one summand. The weight of a term is the product of a queryWeight and a fieldWeight for that term.

Apart from the boost, there are two main factors in the queryWeight. The inverse document frequency describes how rare the term is in the index corpus: rare terms score higher, and the dependence is logarithmic. The second factor is the queryNorm, a normalizing factor computed at query time. It is derived from the boosts and idf values of all clauses in the query, so it also reflects the set of fields you query.

In the fieldWeight factor you can see the heart of Lucene scoring, the tf-idf component. The term frequency multiplied by the inverse document frequency describes the main scoring behaviour. The term frequency is based on the number of times the term appears in the field (the default similarity takes its square root).
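The numbers in the explanation can be reproduced by hand. The following is a minimal Python sketch, assuming the Lucene 3.x default similarity formula idf = 1 + ln(maxDocs / (docFreq + 1)). The function names are made up for this example, and the queryNorm values are copied from the explain output rather than recomputed.

```python
import math

def idf(doc_freq, max_docs):
    """idf as defined by the Lucene 3.x default similarity (assumed here)."""
    return 1.0 + math.log(max_docs / (doc_freq + 1.0))

def term_score(boost, idf_value, query_norm, tf=1.0, field_norm=1.0):
    """weight = queryWeight * fieldWeight, as laid out in the explain tree."""
    query_weight = boost * idf_value * query_norm
    field_weight = tf * idf_value * field_norm
    return query_weight * field_weight

# Document 95: hk_manual_de, no boost, on a shard with maxDocs=2.
score_95 = term_score(1.0, idf(1, 2), 0.049284987)

# Document 94: hk_name_split_de^4, on a shard with maxDocs=1.
score_94 = term_score(4.0, idf(1, 1), 0.08778466)

print("idf on 2-doc shard:", idf(1, 2))
print("idf on 1-doc shard:", idf(1, 1))
print("score of doc 95:", score_95)
print("score of doc 94:", score_94)
```

Note that idf enters the weight twice, once in the queryWeight and once in the fieldWeight. On the one-document shard idf is only about 0.307, so its square outweighs the boost of 4 even though that shard has the larger queryNorm.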

Of course this is not a complete guide to Lucene's Similarity, but it shows you the idea behind the similarity formula.

You can use dismax when you need to run several queries and control how their scores are combined into one value. With tie_breaker you control how much the scores of the queries other than the best-matching one contribute to the result.
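As a rough illustration of what tie_breaker does, here is a minimal sketch of the dis_max combination rule: the best score plus tie_breaker times the sum of the remaining scores. The function name and the sample scores are hypothetical and only serve the example.

```python
def dis_max_score(scores, tie_breaker=0.0):
    """Combine per-query scores: best score plus a share of the others."""
    if not scores:
        return 0.0
    best = max(scores)
    return best + tie_breaker * (sum(scores) - best)

# Hypothetical scores of three sub-queries for one document.
field_scores = [0.8, 0.3, 0.1]

print(dis_max_score(field_scores))       # only the best sub-query counts
print(dis_max_score(field_scores, 0.5))  # the others contribute half
print(dis_max_score(field_scores, 1.0))  # equivalent to summing all scores
```

With tie_breaker at 0 a document matching in many fields gains nothing over one matching only in its best field; with tie_breaker at 1 the result is the plain sum of all sub-query scores.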

With omit_norms (an index-time setting that belongs in your mapping) you can turn off, for example, the field-length normalization. By default Lucene uses the field length (the number of terms in the field) to calculate an additional relevance factor.
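To make the field-length factor concrete, here is a minimal sketch assuming the default length norm of 1 / sqrt(number of terms) and an index-time boost of 1. Real Lucene also stores the norm in a single byte, so the stored values are coarser than the ones computed here.

```python
import math

def length_norm(num_terms, boost=1.0):
    """Field norm under the assumed default: shorter fields score higher."""
    return boost / math.sqrt(num_terms)

for n in (1, 4, 16):
    print(n, "terms ->", length_norm(n))
```

A field holding a single term gets a norm of 1.0, which matches the fieldNorm values in the explain output of the original post, so field length is not what separates the two documents there.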

Each of the settings can be helpful or harmful depending on how you have modeled your relevance.

Robert


(Woody Peterson) #3

I recently ran into what looks like a similar issue while testing. As
Robert pointed out, your inverse document frequency numbers are skewing
your results. More specifically, the cause is your low document count
interacting with what is probably the default of 5 shards in your setup.
Each shard is its own Lucene instance, so if you have, say, 6 documents in
Elasticsearch, one shard will hold 2 documents while the other shards hold
1 each. When the 2-document shard gets a hit on 1 document for a term, it
treats the term as rarer than a 1-document shard does, where the term
appears in 100% of the documents (1 out of 1).

The way to eliminate huge idf discrepancies at low document counts is to
set number_of_shards to 1 (possibly number_of_replicas to 1 as well; I am
unsure how replicas come into play here).
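The effect can be sketched in a few lines of Python. This assumes the idf formula of Lucene's default similarity, 1 + ln(maxDocs / (docFreq + 1)), and a hypothetical layout of 6 documents over 5 shards. The settings body at the end is the kind of body an index creation request with a single shard would carry; it is only printed here, not sent.

```python
import json
import math

def idf(doc_freq, max_docs):
    """idf under the assumed default similarity formula."""
    return 1.0 + math.log(max_docs / (doc_freq + 1.0))

# Hypothetical layout: 6 documents spread over 5 shards.
docs_per_shard = [2, 1, 1, 1, 1]

# The term occurs in exactly one document per shard, so the idf a shard
# computes depends only on how many documents that shard holds.
per_shard_idf = [idf(1, n) for n in docs_per_shard]
for shard, value in enumerate(per_shard_idf):
    print("shard", shard, "idf", value)

# With one shard, all documents are scored against the same statistics.
single_shard_settings = {"settings": {"index": {"number_of_shards": 1}}}
print(json.dumps(single_shard_settings))
```

The number of shards is fixed when the index is created, so a test index would have to be recreated with these settings and the documents indexed again.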


(BillyEm) #4

ouch!


(Lukáš Vlček) #5

Have you tried different search types?
http://www.elasticsearch.org/guide/reference/api/search/search-type.html
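
For example, the dfs_query_then_fetch search type first collects term statistics from all shards, so that idf is computed over the whole index rather than per shard. Below is a minimal sketch of building such a request URL; the host and index are taken from the original post, and nothing is actually sent.

```python
from urllib.parse import urlencode

# Host, index and type as used in the original post.
base = "http://localhost:9200/test-headings/heading/_search"

# search_type is passed as a URL parameter of the search request.
params = {"search_type": "dfs_query_then_fetch", "pretty": "true"}
url = base + "?" + urlencode(params)
print(url)
```

The extra round trip for collecting the statistics makes this search type somewhat slower, so it is mostly useful for small indices or for checking whether per-shard statistics explain a ranking.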

Regards,
Lukas

