Help me understand how ES calculates the score for a match query

Youxu · March 10, 2015, 8:15am

I have two documents as follows:

{
"title":"xbox"
}

{
"title":"xbox xbox xbox"
}

Then I search the documents with the following query:
{
"query":{"match":{"title":"xbox"}}
}

ES returns the following result:
{"took":133,"timed_out":false,"_shards":{"total":5,"successful":5,"failed":0},"hits":{"total":2,"max_score":0.30685282,
"hits":[
{"_index":"storetest1","_type":"type","_id":"1","_score":0.30685282,"_source":{"title":"xbox","keywords":["xbox"]}},
{"_index":"storetest1","_type":"type","_id":"2","_score":0.26574233,"_source":{"title":"xbox xbox xbox","keywords":["xbox"]}}]}}

My question is: why did #1 get a higher score than #2? I expected #2 to score
higher than #1, since "xbox" appears more often in the title of #2.

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/03afe5b3-0255-4d0d-ba15-0e9c2afbb96e%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Nhật Quang Phan · March 10, 2015, 9:39am

You can enable explain for your query and see how Elasticsearch calculates
the score:

{
"explain": true,
"query": {
"match": {
"title": "xbox"
}
}
}
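For reference, here is a rough sketch (not from the thread) of what explain reveals for this query, assuming Lucene's classic TF/IDF similarity, which Elasticsearch 1.x used by default. For a single-term query with no boosts, the query norm cancels the query-side idf, so each document scores tf × idf × fieldNorm, where tf = sqrt(freq) and fieldNorm = 1/sqrt(number of terms in the field), stored in a single byte. The function names are illustrative, `encode_norm` only approximates Lucene's one-byte norm encoding, and the sketch assumes each document sits alone on its own shard (consistent with the idf(docFreq=1, maxDocs=1) that appears later in the thread).

```python
import math

# Sketch of Lucene's classic TF/IDF score for a single-term query.
# Assumptions: no boosts, coord = 1, query norm cancels the query-side idf,
# so score = tf * idf * fieldNorm.


def idf(doc_freq: int, max_docs: int) -> float:
    """Classic Lucene idf: 1 + ln(maxDocs / (docFreq + 1))."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def tf(freq: int) -> float:
    """Term frequency is dampened with a square root."""
    return math.sqrt(freq)


def encode_norm(value: float) -> float:
    """Approximation of Lucene's one-byte norm: keep only three
    significant bits of the value, rounding down."""
    mantissa, exponent = math.frexp(value)  # value = mantissa * 2**exponent, mantissa in [0.5, 1)
    return math.floor(mantissa * 8) / 8 * 2.0 ** exponent


def field_norm(num_terms: int) -> float:
    """Longer fields get a smaller norm, so each match counts for less."""
    return encode_norm(1.0 / math.sqrt(num_terms))


def score(freq: int, num_terms: int, doc_freq: int, max_docs: int) -> float:
    return tf(freq) * idf(doc_freq, max_docs) * field_norm(num_terms)


# Assumed: each document is alone on its shard, so docFreq = maxDocs = 1.
doc1 = score(freq=1, num_terms=1, doc_freq=1, max_docs=1)  # title: "xbox"
doc2 = score(freq=3, num_terms=3, doc_freq=1, max_docs=1)  # title: "xbox xbox xbox"
print(f"doc 1: {doc1:.8f}")
print(f"doc 2: {doc2:.8f}")
```

Run as-is, it prints scores that agree with the `_score` values in the response above to about seven decimal places. Tripling the term frequency only multiplies tf by sqrt(3) ≈ 1.73, while going from one to three terms cuts fieldNorm from 1.0 to 0.5, so the shorter title wins.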

Youxu · March 11, 2015, 1:46am

Thanks!
I tried explain and now better understand where the score comes from. But I
still have a question about the IDF score. The IDF in the explain output of my query is:
{
"value": 0.30685282,
"description": "idf(docFreq=1, maxDocs=1)"
}

What do docFreq and maxDocs above mean? Per the IDF definition, the
score should be affected by the total number of documents in the index, but
the value always seems to be 0.30685282 no matter how many documents I insert
into the index.
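The number itself can be reproduced with the classic Lucene idf formula, 1 + ln(maxDocs / (docFreq + 1)). A minimal Python sketch, assuming the classic TF/IDF similarity (the Elasticsearch 1.x default); the function name is hypothetical:

```python
import math


def classic_idf(doc_freq: int, max_docs: int) -> float:
    """Classic Lucene idf as shown by explain: 1 + ln(maxDocs / (docFreq + 1))."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


# The value from the explain output above.
print(f"{classic_idf(doc_freq=1, max_docs=1):.8f}")  # prints 0.30685282

# idf only changes when docFreq or maxDocs change.
for doc_freq, max_docs in [(1, 1), (1, 10), (2, 10), (10, 10), (1, 1000)]:
    print(f"docFreq={doc_freq:>2} maxDocs={max_docs:>4} idf={classic_idf(doc_freq, max_docs):.4f}")
```

With docFreq=1 and maxDocs=1 the formula gives 1 + ln(1/2) ≈ 0.30685282, and it stays at that value as long as the counts that explain reports stay at 1 and 1.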

softwaredoug · March 11, 2015, 1:54am

A couple of things are going on here.

First, read "Relevance Is Broken!" in Elasticsearch: The Definitive Guide. Your IDF
might not be changing due to sharding.

Second
docFreq reflects this term's actual document frequency (how many documents
the term occurs in)
maxDocs reflects the total number of documents on this shard

Third
maxDocs (and docFreq) do not reflect deletions.
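To make the per-shard point concrete, here is a small sketch, assuming the classic idf formula and a hypothetical placement of the two documents on different shards of the 5-shard index (real placement comes from routing on the document ID, which is not modelled here). Each shard computes docFreq and maxDocs from its own documents only; the helper names are illustrative.

```python
import math
from collections import defaultdict


def classic_idf(doc_freq: int, max_docs: int) -> float:
    """Classic Lucene idf: 1 + ln(maxDocs / (docFreq + 1))."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def per_shard_idf(placement: dict, docs: dict, term: str) -> dict:
    """Return {shard: idf} for every shard that contains `term`.

    `placement` maps doc ID -> shard number (hypothetical here) and
    `docs` maps doc ID -> field text. Each shard counts only its own docs.
    """
    max_docs = defaultdict(int)  # documents per shard (maxDocs)
    doc_freq = defaultdict(int)  # documents containing the term, per shard (docFreq)
    for doc_id, text in docs.items():
        shard = placement[doc_id]
        max_docs[shard] += 1
        if term in text.split():  # naive whitespace tokenization
            doc_freq[shard] += 1
    return {
        shard: classic_idf(doc_freq[shard], max_docs[shard])
        for shard in max_docs
        if doc_freq[shard] > 0
    }


docs = {"1": "xbox", "2": "xbox xbox xbox"}

# Hypothetical: the two documents ended up on different shards.
split = per_shard_idf({"1": 0, "2": 3}, docs, "xbox")
# For comparison: both documents on the same shard.
together = per_shard_idf({"1": 0, "2": 0}, docs, "xbox")

print("different shards:", split)
print("same shard:      ", together)
```

On separate shards, both documents get idf(docFreq=1, maxDocs=1) ≈ 0.3069, exactly as in the explain output; on a single shard the statistics would be docFreq=2, maxDocs=2, giving about 0.5945. Per the third point, deleted documents would also still be counted, which this sketch ignores.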

Lastly,
I presume you can find the documents you think you're adding in the index?

Hope that helps
-Doug

--
Doug Turnbull
Search Relevance Lead
OpenSource Connections http://o19s.com

Youxu · March 11, 2015, 5:54am

Thanks a lot!
I now understand how IDF works in ES much better. As you said, it was caused
by sharding. After adding enough documents, I do see the IDF value, as well as
docFreq and maxDocs, change in the output.