How is scoring computed in wildcard and prefix queries

cyrilforce · April 11, 2014, 3:45am

Hi,

I have a question about how the score is computed for the following query:

{
  "from": 0,
  "size": 60,
  "explain": true,
  "track_scores": true,
  "query": {
    "bool": {
      "should": [
        { "prefix": { "DISPLAY_NAME": { "value": "hap", "rewrite": "top_terms_10", "boost": "3.0" } } },
        { "prefix": { "PERFORMER": { "value": "hap" } } }
      ]
    }
  }
}

and it produces this result:

"DISPLAY_NAME": "Happier?",
"_explanation": {
  "value": 2.7100196,
  "description": "product of:",
  "details": [
    {
      "value": 5.420039,
      "description": "sum of:",
      "details": [
        {
          "value": 5.420039,
          "description": "sum of:",
          "details": [
            {
              "value": 5.420039,
              "description": "weight(DISPLAY_NAME:happier^3.0 in 32661) [PerFieldSimilarity], result of:",
              "details": [
                {
                  "value": 5.420039,
                  "description": "score(doc=32661,freq=1.0 = termFreq=1.0\n), product of:",
                  "details": [
                    {
                      "value": 0.34746242,
                      "description": "queryWeight, product of:",
                      "details": [
                        { "value": 3, "description": "boost" },
                        { "value": 15.598923, "description": "idf(docFreq=2, maxDocs=6566786)" },
                        { "value": 0.0074249236, "description": "queryNorm" }
                      ]
                    },
                    {
                      "value": 15.598923,
                      "description": "fieldWeight in 32661, product of:",
                      "details": [
                        {
                          "value": 1,
                          "description": "tf(freq=1.0), with freq of:",
                          "details": [
                            { "value": 1, "description": "termFreq=1.0" }
                          ]
                        },
                        { "value": 15.598923, "description": "idf(docFreq=2, maxDocs=6566786)" },
                        { "value": 1, "description": "fieldNorm(doc=32661)" }
                      ]
                    }
                  ]
                }
              ]
            }
          ]
        }
      ]
    },
    { "value": 0.5, "description": "coord(1/2)" }
  ]
}

"DISPLAY_NAME": "Happenings",
"_explanation": {
  "value": 2.5354335,
  "description": "product of:",
  "details": [
    {
      "value": 5.070867,
      "description": "sum of:",
      "details": [
        {
          "value": 5.070867,
          "description": "sum of:",
          "details": [
            {
              "value": 5.070867,
              "description": "weight(DISPLAY_NAME:happenings^3.0 in 23093) [PerFieldSimilarity], result of:",
              "details": [
                {
                  "value": 5.070867,
                  "description": "score(doc=23093,freq=1.0 = termFreq=1.0\n), product of:",
                  "details": [
                    {
                      "value": 0.33608392,
                      "description": "queryWeight, product of:",
                      "details": [
                        { "value": 3, "description": "boost" },
                        { "value": 15.088098, "description": "idf(docFreq=4, maxDocs=6566786)" },
                        { "value": 0.0074249236, "description": "queryNorm" }
                      ]
                    },
                    {
                      "value": 15.088098,
                      "description": "fieldWeight in 23093, product of:",
                      "details": [
                        {
                          "value": 1,
                          "description": "tf(freq=1.0), with freq of:",
                          "details": [
                            { "value": 1, "description": "termFreq=1.0" }
                          ]
                        },
                        { "value": 15.088098, "description": "idf(docFreq=4, maxDocs=6566786)" },
                        { "value": 1, "description": "fieldNorm(doc=23093)" }
                      ]
                    }
                  ]
                }
              ]
            }
          ]
        }
      ]
    },
    { "value": 0.5, "description": "coord(1/2)" }
  ]
}

As both display names match "Hap", I expected them to get the same score,
but they are scored differently, as shown above. Looking further into the
explanation, I found that the difference is in the queryWeight->idf and
fieldWeight->idf values:

1) "value": 15.598923,
   "description": "idf(docFreq=2, maxDocs=6566786)"

2) "value": 15.088098,
   "description": "idf(docFreq=4, maxDocs=6566786)"

I would like to know why the values differ, how they are computed, and
what docFreq is. I would also like to know what queryWeight is: it only
shows up in the score computation when I use a wildcard or prefix query;
otherwise only fieldWeight is used.
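
For reference, the factors in the explain output above can be multiplied back together as a sanity check. This is only a sketch that copies the reported numbers; the function and constant names are made up for illustration, and small differences in the last digits are expected because Lucene works in 32-bit floats.

```python
# Values copied from the explain output for "Happier?" and "Happenings".
QUERY_NORM = 0.0074249236
BOOST = 3.0
COORD = 0.5  # coord(1/2): one of the two should clauses matched


def explain_score(idf, tf=1.0, field_norm=1.0):
    """Rebuild the final score from the factors listed in the explain tree."""
    query_weight = BOOST * idf * QUERY_NORM  # "queryWeight, product of:"
    field_weight = tf * idf * field_norm     # "fieldWeight in <doc>, product of:"
    return query_weight * field_weight * COORD


happier = explain_score(15.598923)     # idf(docFreq=2, maxDocs=6566786)
happenings = explain_score(15.088098)  # idf(docFreq=4, maxDocs=6566786)
print(happier, happenings)
```

Both results agree with the reported 2.7100196 and 2.5354335 to within rounding, and idf is the only input that differs between the two hits. Since idf enters both queryWeight and fieldWeight, the gap between the two scores grows with the square of the idf ratio.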

I am using &search_type=dfs_query_then_fetch&preference=_primary in the
query.

And here is the gist with the full result:

https://gist.github.com/cheehoo/10439849

prefix_and_wildcard_scores.json (truncated preview):

{
    "took": 28,
    "timed_out": false,
    "_shards": {
        "total": 5,
        "successful": 5,
        "failed": 0
    },
    "hits": {
        "total": 244,

Thanks.

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/69cb2ec6-720a-42cf-9de4-36bd81d7ad69%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

dantuff · April 11, 2014, 5:25am

The score is computed using Lucene's scoring formula:

https://lucene.apache.org/core/3_6_2/api/all/org/apache/lucene/search/Similarity.html

The idf is the inverse document frequency, which gives a higher score to
rarer terms in the index. The term 'Happenings' appears four times in
your index and the term 'Happier' appears twice, so 'Happier' gets the
higher idf.
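
As a rough check, and assuming the index uses Lucene's default TF-IDF similarity (where idf is 1 + ln(maxDocs / (docFreq + 1))), the two idf values from the explain output can be reproduced like this. The function name is just for illustration.

```python
import math


def classic_idf(doc_freq, max_docs):
    """idf as defined by Lucene's default TF-IDF similarity (assumed here)."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


MAX_DOCS = 6566786  # maxDocs from the explain output

idf_happier = classic_idf(2, MAX_DOCS)     # docFreq=2
idf_happenings = classic_idf(4, MAX_DOCS)  # docFreq=4
print(idf_happier, idf_happenings)
```

These come out at about 15.5989 and 15.0881, matching the explain output, so the smaller docFreq is what gives 'Happier' the larger idf.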

Dan


cyrilforce · April 11, 2014, 5:52am

Thanks Dan.

What about queryWeight? It is only computed when using a wildcard or
prefix query. Also, when you say "The term 'Happenings' appears four times
in your index", that is regardless of which field it appears in, right?

Thanks


--
Regards,

Chee Hoo


cyrilforce · April 11, 2014, 6:09am

Sorry, one more thing I would like to know: is there any way to stop the
rarity of a term (docFreq) from yielding higher scores?
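
One thing that might be worth trying, though it is not confirmed anywhere in this thread: the multi-term `rewrite` parameter also accepts a `top_terms_boost_N` mode, which is documented as scoring each expanded term by its boost only rather than by tf/idf. The sketch below just builds the request body from the original post with that mode swapped in. The helper name is hypothetical, and whether this mode behaves as wanted on the version in use is an assumption to verify with explain.

```python
import json


def prefix_clause(field, value, rewrite=None, boost=None):
    """Build one prefix clause like those in the original query (hypothetical helper)."""
    params = {"value": value}
    if rewrite is not None:
        params["rewrite"] = rewrite
    if boost is not None:
        params["boost"] = boost
    return {"prefix": {field: params}}


body = {
    "from": 0,
    "size": 60,
    "explain": True,
    "query": {
        "bool": {
            "should": [
                # top_terms_boost_10 replaces top_terms_10 from the original post
                prefix_clause("DISPLAY_NAME", "hap", rewrite="top_terms_boost_10", boost=3.0),
                prefix_clause("PERFORMER", "hap"),
            ]
        }
    },
}
print(json.dumps(body, indent=2))
```

Running the query with explain enabled should show whether the idf factors have dropped out of the DISPLAY_NAME clause.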


--
Regards,

Chee Hoo
