Help me understand how ES calculates the score for a match query

I have two documents as follows:

{
"title":"xbox"
}

{
"title":"xbox xbox xbox"
}

Then I search the documents with the following query:
{
"query":{"match":{"title":"xbox"}}
}

ES returns the following result:
{"took":133,"timed_out":false,"_shards":{"total":5,"successful":5,"failed":0},"hits":{"total":2,"max_score":0.30685282,
"hits":[
{"_index":"storetest1","_type":"type","_id":"1","_score":0.30685282,"_source":{"title":"xbox","keywords":["xbox"]}},
{"_index":"storetest1","_type":"type","_id":"2","_score":0.26574233,"_source":{"title":"xbox xbox xbox","keywords":["xbox"]}}]}}

My question is: why did #1 get a higher score than #2? I expected #2 to
score higher than #1, since "xbox" appears more times in the title of #2.

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/03afe5b3-0255-4d0d-ba15-0e9c2afbb96e%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

You can enable explain for your query to see how Elasticsearch calculates
the score:

{
  "explain": true,
  "query": {
    "match": {
      "title": "xbox"
    }
  }
}
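
For reference, the two scores in the question can be reproduced by hand. The sketch below is only an illustration under stated assumptions: that the index uses Lucene's classic TF-IDF similarity, that each document sits alone on its own shard (so idf is computed with docFreq=1, maxDocs=1, as the explain output later in this thread shows), and that the field-length norm of the three-term title is stored as 0.5 after Lucene's lossy one-byte encoding of 1/sqrt(3). The function names are made up for this example and are not part of any API.

```python
import math

def idf(doc_freq, max_docs):
    # Classic Lucene TF-IDF: 1 + ln(maxDocs / (docFreq + 1))
    return 1.0 + math.log(max_docs / (doc_freq + 1))

def tf(term_freq):
    # Term frequency is dampened with a square root
    return math.sqrt(term_freq)

def score(term_freq, field_norm, doc_freq=1, max_docs=1):
    # For a single-term query the query weight normalises to 1,
    # so the score reduces to tf * idf * fieldNorm.
    return tf(term_freq) * idf(doc_freq, max_docs) * field_norm

# Document 1: "xbox" -> one occurrence, one term, norm 1/sqrt(1) = 1.0
score_doc1 = score(term_freq=1, field_norm=1.0)

# Document 2: "xbox xbox xbox" -> three occurrences, three terms.
# ASSUMPTION: 1/sqrt(3) ~ 0.577 is stored as 0.5 by the one-byte norm encoding.
score_doc2 = score(term_freq=3, field_norm=0.5)

print(round(score_doc1, 6), round(score_doc2, 6))
```

Under these assumptions the repeated term raises tf by sqrt(3) (about 1.73), but the field-length norm drops from 1.0 to 0.5, so the net factor is about 0.87 and document 2 ends up slightly below document 1.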

Thanks!
I tried explain and now better understand how the score is computed. But I
still have a question about the IDF score. The IDF in the explain output of my query is:
{
"value": 0.30685282,
"description": "idf(docFreq=1, maxDocs=1)"
}

What do docFreq and maxDocs mean here? Per the IDF definition, the
score should be affected by the total number of documents in the index, but
the value seems to always be 0.30685282 no matter how many docs I insert
into the index.
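
When the explain output gets long, it can help to pull out just the idf nodes. The following is a minimal sketch, assuming the usual explanation shape of nested objects with "value", "description" and an optional "details" list; `find_idf_nodes` and the sample tree are hypothetical, and only the idf node itself is copied from the output above.

```python
import re

IDF_PATTERN = re.compile(r"idf\(docFreq=(\d+), maxDocs=(\d+)\)")

def find_idf_nodes(explanation):
    """Return (docFreq, maxDocs, value) for every idf node in an explanation tree."""
    found = []
    match = IDF_PATTERN.search(explanation.get("description", ""))
    if match:
        found.append((int(match.group(1)), int(match.group(2)), explanation["value"]))
    # Leaf nodes may omit "details" entirely, so fall back to an empty list
    for child in explanation.get("details") or []:
        found.extend(find_idf_nodes(child))
    return found

# Hypothetical, abbreviated tree; the parent node is a placeholder.
sample = {
    "value": 0.30685282,
    "description": "parent node (placeholder)",
    "details": [
        {"value": 0.30685282, "description": "idf(docFreq=1, maxDocs=1)"},
    ],
}

idf_nodes = find_idf_nodes(sample)
print(idf_nodes)
```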

A couple of things are going on here.

First, read "Why is Relevance Broken". Your IDF might not be changing due
to sharding.
https://www.elastic.co/guide/en/elasticsearch/guide/current/relevance-is-broken.html

Second,
docFreq reflects this term's actual document frequency (how many documents
the term occurs in).
maxDocs reflects the total number of documents on this shard.

Third,
maxDocs (and docFreq) do not reflect deletions: deleted documents are still
counted until their segments are merged.

Lastly,
I presume you can find the documents you think you're adding in the index?
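
To make the shard effect concrete, here is a small sketch of the IDF formula from Lucene's classic TF-IDF similarity, assuming that is the similarity in use (the idf value in the explain output is consistent with it). The numbers for the larger shard are invented purely for illustration.

```python
import math

def idf(doc_freq, max_docs):
    # Classic Lucene TF-IDF inverse document frequency
    return 1.0 + math.log(max_docs / (doc_freq + 1))

# Shard holding a single matching document: the case in the explain output
single_doc_shard = idf(doc_freq=1, max_docs=1)

# Hypothetical shard with 100 documents, 10 of which contain the term
larger_shard = idf(doc_freq=10, max_docs=100)

print(single_doc_shard, larger_shard)
```

Because both inputs are per-shard statistics, a small index spread over 5 shards can keep producing docFreq=1, maxDocs=1 on the shard that answers, and the idf only moves once the shards hold more documents. The linked article also discusses `search_type=dfs_query_then_fetch`, which collects term statistics across shards before scoring.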

Hope that helps
-Doug

--
Doug Turnbull
Search Relevance Lead
OpenSource Connections http://o19s.com

Thanks a lot!
I now better understand how IDF works in ES. As you said, it was caused by
sharding. After I added enough documents, I do see the IDF value change,
along with docFreq and maxDocs in the output.
