Then I search the documents with the following query:
{
"query":{"match":{"title":"xbox"}}
}
ES returns the following result:
{"took":133,"timed_out":false,"_shards":{"total":5,"successful":5,"failed":0},
 "hits":{"total":2,"max_score":0.30685282,
  "hits":[
   {"_index":"storetest1","_type":"type","_id":"1","_score":0.30685282,"_source":{"title":"xbox","keywords":["xbox"]}},
   {"_index":"storetest1","_type":"type","_id":"2","_score":0.26574233,"_source":{"title":"xbox xbox xbox","keywords":["xbox"]}}
  ]}}
My question is: why does #1 get a higher score than #2? I expected #2 to score
higher than #1, since "xbox" appears more often in the title of #2.
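Assuming an Elasticsearch release from that period (before 5.0), whose default similarity was, as far as I know, Lucene's classic TF-IDF, both scores can be reproduced by hand. Under that model tf = sqrt(termFreq), idf = 1 + ln(maxDocs / (docFreq + 1)), and fieldNorm = 1 / sqrt(number of terms in the field), which Lucene stores lossily in a single byte. For a single-term query the query-side factors cancel (queryNorm = 1 / idf), leaving score = tf × idf × fieldNorm. The sketch below is my own transcription of that arithmetic; `encode_norm`, `decode_norm` and `single_term_score` are hypothetical helpers, and the byte encoding is modeled on my understanding of Lucene's `SmallFloat.floatToByte315`.

```python
import math
import struct

# Hypothetical helper modeled on Lucene's SmallFloat.floatToByte315:
# squeezes a float32 into one byte, discarding most of the mantissa.
def encode_norm(f: float) -> int:
    bits = struct.unpack(">i", struct.pack(">f", f))[0]
    small = bits >> 21              # keep sign, exponent and top 2 mantissa bits
    fzero = (63 - 15) << 3
    if small <= fzero:
        return 0 if bits <= 0 else 1
    if small >= fzero + 0x100:
        return 0xFF                 # clamp to the largest representable value
    return small - fzero

# Hypothetical inverse of encode_norm (modeled on byte315ToFloat).
def decode_norm(b: int) -> float:
    if b == 0:
        return 0.0
    bits = ((b & 0xFF) << 21) + ((63 - 15) << 24)
    return struct.unpack(">f", struct.pack(">i", bits))[0]

def classic_idf(doc_freq: int, max_docs: int) -> float:
    return 1.0 + math.log(max_docs / (doc_freq + 1))

def single_term_score(term_freq: int, field_len: int, doc_freq: int, max_docs: int) -> float:
    tf = math.sqrt(term_freq)
    idf = classic_idf(doc_freq, max_docs)
    # The length norm goes through the lossy one-byte round trip.
    norm = decode_norm(encode_norm(1.0 / math.sqrt(field_len)))
    return tf * idf * norm

# docFreq=1, maxDocs=1 as in the explain output (each doc alone on its shard).
doc1 = single_term_score(term_freq=1, field_len=1, doc_freq=1, max_docs=1)
doc2 = single_term_score(term_freq=3, field_len=3, doc_freq=1, max_docs=1)
print(f"doc1={doc1:.8f} doc2={doc2:.8f}")
```

Under these assumptions the norm for doc #2, 1/sqrt(3) ≈ 0.577, comes back from the byte round trip as 0.5, so tf × fieldNorm is sqrt(3) × 0.5 ≈ 0.866 instead of 1. Without that rounding the two documents would tie; with it, the three-term title scores lower than the one-term title.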
Thanks!
I tried explain and now understand better how the score is computed. But I
still have a question about the IDF score. The IDF in the explain output of my query is:
{
"value": 0.30685282,
"description": "idf(docFreq=1, maxDocs=1)"
}
What do docFreq and maxDocs mean here? Per the IDF definition, the
score should depend on the total number of documents in the index, but the
value seems to stay at 0.30685282 no matter how many documents I insert into
the index.
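Assuming the classic Lucene TF-IDF similarity (the default in Elasticsearch releases of that time, as far as I know), the idf line can be checked by hand. docFreq counts the documents that contain the term and maxDocs counts all documents, and both are statistics of a single shard. The formula, the value from the explain output, and a second illustrative value for a hypothetical shard holding 10 documents:

```latex
\begin{aligned}
\mathrm{idf} &= 1 + \ln\frac{\mathrm{maxDocs}}{\mathrm{docFreq} + 1} \\
\mathrm{idf}(\mathrm{docFreq}=1,\ \mathrm{maxDocs}=1) &= 1 + \ln\tfrac{1}{2} = 1 - \ln 2 \approx 0.306853 \\
\mathrm{idf}(\mathrm{docFreq}=1,\ \mathrm{maxDocs}=10) &= 1 + \ln 5 \approx 2.609438
\end{aligned}
```

The first line matches the 0.30685282 reported above, so under this model the value will not move until the shard holding the matching document receives more documents.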
On Tuesday, March 10, 2015 at 5:39:56 PM UTC+8, Nhật Quang Phan wrote:
You can enable explain for your query and see how Elasticsearch calculates
the score:
On Tuesday, March 10, 2015 at 3:15:50 PM UTC+7, Xudong You wrote:
I have two documents as follows:
{
"title":"xbox"
}
{
"title":"xbox xbox xbox"
}
Thanks a lot!
I now understand better how IDF works in ES. As you said, it is caused by
sharding. After adding enough documents, I do see the IDF value, as well as
docFreq and maxDocs, change in the output.
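The sharding effect can be illustrated with a small simulation. This is a hedged sketch, not Elasticsearch code: it assumes the classic Lucene idf formula, splits titles on whitespace instead of running a real analyzer, and uses a hypothetical `route()` function (document id modulo 5) in place of Elasticsearch's actual hash-based routing. Each shard computes idf only from its own docFreq and maxDocs, so with the two documents on two different shards both see docFreq=1 and maxDocs=1.

```python
import math
from collections import defaultdict

NUM_SHARDS = 5  # matches "_shards":{"total":5} in the search response

def classic_idf(doc_freq: int, max_docs: int) -> float:
    # Lucene classic TF-IDF idf, computed from one shard's statistics.
    return 1.0 + math.log(max_docs / (doc_freq + 1))

def route(doc_id: int) -> int:
    # Hypothetical routing for illustration only: real Elasticsearch hashes
    # the _id (or routing value); plain modulo keeps the demo readable.
    return doc_id % NUM_SHARDS

def per_shard_idf(docs: dict[int, str], term: str) -> dict[int, float]:
    """idf of `term` as computed by each shard that contains it."""
    max_docs: dict[int, int] = defaultdict(int)
    doc_freq: dict[int, int] = defaultdict(int)
    for doc_id, title in docs.items():
        shard = route(doc_id)
        max_docs[shard] += 1
        if term in title.split():  # whitespace tokenization (assumption)
            doc_freq[shard] += 1
    return {s: classic_idf(doc_freq[s], max_docs[s]) for s in sorted(doc_freq)}

def index_wide_idf(docs: dict[int, str], term: str) -> float:
    """idf if statistics were gathered over the whole index instead."""
    doc_freq = sum(term in title.split() for title in docs.values())
    return classic_idf(doc_freq, len(docs))

docs = {1: "xbox", 2: "xbox xbox xbox"}
print(per_shard_idf(docs, "xbox"))   # shards 1 and 2 each see docFreq=1, maxDocs=1
print(index_wide_idf(docs, "xbox"))

# Add unrelated documents: per-shard maxDocs grows, so the idf values change.
for doc_id in range(3, 23):
    docs[doc_id] = "playstation"
print(per_shard_idf(docs, "xbox"))
print(index_wide_idf(docs, "xbox"))
```

With the 20 extra documents, each of the two shards holding an "xbox" document ends up with maxDocs=5 (idf = 1 + ln(5/2) ≈ 1.916), whereas index-wide statistics (docFreq=2, maxDocs=22) would give 1 + ln(22/3) ≈ 2.992.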
On Wednesday, March 11, 2015 at 9:54:13 AM UTC+8, Doug Turnbull wrote:
A couple of things are going on here.
First, read "Why is Relevance Broken". Your IDF might not be changing due
to sharding.
Second:
docFreq reflects this term's actual document frequency (how many documents
the term occurs in).
maxDocs reflects the total number of documents on this shard.
Third:
maxDocs (and docFreq) do not reflect deletions.
Lastly:
I presume you can find the documents you think you're adding in the index?
Hope that helps
-Doug
On Tue, Mar 10, 2015 at 9:46 PM, Xudong You <xudon...@gmail.com> wrote: