Need some help with Custom Score Query


(dark_shadow) #1

Hi,

I'm using a custom score query to fetch my results from Elasticsearch. I use
a popularity field, "po", in the _script parameter to modify the score, but it
doesn't seem to work. I used the explain parameter to see how the score is
calculated, and the problem is that po doesn't seem to be taken into account.
I have printed the score calculation for two documents below: both have the
same score, but their po values are different. Can anyone tell me where I'm
going wrong?

I'm using the following custom score query in my Java code:

    import org.elasticsearch.index.query.QueryBuilder;
    import org.elasticsearch.index.query.QueryBuilders;
    import org.elasticsearch.index.query.QueryStringQueryBuilder.Operator;

    String script = "_score * (doc['po'].empty ? 1 : doc['po'].value == 0.0 ? 1 : doc['po'].value)";
    QueryBuilder queryBuilder = QueryBuilders.customScoreQuery(
            QueryBuilders.queryString(query)
                    .field("text", 30)
                    .field("ad")
                    .field("st")
                    .field("cn")
                    .field("co")
                    .defaultOperator(Operator.AND))
            .script(script);
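For reference, here is what the script is intended to compute, written as plain Java rather than MVEL. This is a minimal sketch, not the Elasticsearch scripting API: the class `PopularityScore` and the method `boostedScore` are hypothetical names, and a missing po is modelled as an empty `OptionalDouble`. The sub-query score is the "sum of" value that appears in both explain outputs below.

```java
import java.util.OptionalDouble;

public class PopularityScore {

    // Plain-Java mirror of the MVEL script
    //   _score * (doc['po'].empty ? 1 : doc['po'].value == 0.0 ? 1 : doc['po'].value)
    // A missing (empty) or zero popularity leaves the query score unchanged.
    public static double boostedScore(double queryScore, OptionalDouble po) {
        double factor = 1.0;
        if (po.isPresent() && po.getAsDouble() != 0.0) {
            factor = po.getAsDouble();
        }
        return queryScore * factor;
    }

    public static void main(String[] args) {
        // Sub-query ("sum of") score reported for both documents in the explain output.
        double queryScore = 239.14136;
        System.out.println(boostedScore(queryScore, OptionalDouble.of(8.8)));               // "hotels in ranchi"
        System.out.println(boostedScore(queryScore, OptionalDouble.of(9.228571428571428))); // "hotels in kerala"
        System.out.println(boostedScore(queryScore, OptionalDouble.empty()));               // po missing
    }
}
```

If the script saw the stored po values, the two documents would end up at roughly 2104.4 and 2206.9 instead of sharing one score; only a missing or zero po should leave them tied.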

{category=Hotel, text=hotels in ranchi, count=45.0, _id=525472d7d4a769f431649936, location={lon=85.33333, lat=23.35}, po=8.8}
1195.7068 = custom score, product of:
  1195.7068 = script score function: composed of:
    239.14136 = sum of:
      215.63971 = max of:
        215.63971 = sum of:
          18.701233 = weight(text:ho in 170585) [PerFieldSimilarity], result of:
            18.701233 = score(doc=170585,freq=1.0 = termFreq=1.0), product of:
              0.27964544 = queryWeight, product of:
                2.7864501 = idf(docFreq=960416, maxDocs=5731988)
                0.10035904 = queryNorm
              66.8748 = fieldWeight in 170585, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.7864501 = idf(docFreq=960416, maxDocs=5731988)
                24.0 = fieldNorm(doc=170585)
          22.496681 = weight(text:hot in 170585) [PerFieldSimilarity], result of:
            22.496681 = score(doc=170585,freq=1.0 = termFreq=1.0), product of:
              0.30671278 = queryWeight, product of:
                3.056155 = idf(docFreq=733378, maxDocs=5731988)
                0.10035904 = queryNorm
              73.34772 = fieldWeight in 170585, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.056155 = idf(docFreq=733378, maxDocs=5731988)
                24.0 = fieldNorm(doc=170585)
          22.589373 = weight(text:hote in 170585) [PerFieldSimilarity], result of:
            22.589373 = score(doc=170585,freq=1.0 = termFreq=1.0), product of:
              0.307344 = queryWeight, product of:
                3.0624444 = idf(docFreq=728780, maxDocs=5731988)
                0.10035904 = queryNorm
              73.498665 = fieldWeight in 170585, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.0624444 = idf(docFreq=728780, maxDocs=5731988)
                24.0 = fieldNorm(doc=170585)
          22.600088 = weight(text:hotel in 170585) [PerFieldSimilarity], result of:
            22.600088 = score(doc=170585,freq=1.0 = termFreq=1.0), product of:
              0.30741686 = queryWeight, product of:
                3.0631707 = idf(docFreq=728251, maxDocs=5731988)
                0.10035904 = queryNorm
              73.5161 = fieldWeight in 170585, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.0631707 = idf(docFreq=728251, maxDocs=5731988)
                24.0 = fieldNorm(doc=170585)
          129.25233 = weight(text:hotels in 170585) [PerFieldSimilarity], result of:
            129.25233 = score(doc=170585,freq=1.0 = termFreq=1.0), product of:
              0.73517686 = queryWeight, product of:
                7.3254676 = idf(docFreq=10260, maxDocs=5731988)
                0.10035904 = queryNorm
              175.81122 = fieldWeight in 170585, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.3254676 = idf(docFreq=10260, maxDocs=5731988)
                24.0 = fieldNorm(doc=170585)
      23.50164 = max of:
        23.50164 = weight(text:in^30.0 in 170585) [PerFieldSimilarity], result of:
          23.50164 = score(doc=170585,freq=1.0 = termFreq=1.0), product of:
            0.31348857 = queryWeight, product of:
              30.0 = boost
              3.1236706 = idf(docFreq=685498, maxDocs=5731988)
              0.0033453014 = queryNorm
            74.968094 = fieldWeight in 170585, product of:
              1.0 = tf(freq=1.0), with freq of:
                1.0 = termFreq=1.0
              3.1236706 = idf(docFreq=685498, maxDocs=5731988)
              24.0 = fieldNorm(doc=170585)
  1.0 = queryBoost
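Each leaf in the explain tree above is the product of a query-side and a document-side weight, exactly as the nested lines spell out: queryWeight = boost * idf * queryNorm and fieldWeight = tf * idf * fieldNorm. The Java sketch below recomputes two of those leaves as a sanity check; the class name `ExplainCheck` is made up, and every number is copied from the explain output.

```java
public class ExplainCheck {

    // queryWeight = boost * idf * queryNorm (boost is 1.0 when the explain line omits it)
    public static double queryWeight(double boost, double idf, double queryNorm) {
        return boost * idf * queryNorm;
    }

    // fieldWeight = tf * idf * fieldNorm
    public static double fieldWeight(double tf, double idf, double fieldNorm) {
        return tf * idf * fieldNorm;
    }

    // Leaf score = queryWeight * fieldWeight
    public static double termWeight(double boost, double tf, double idf,
                                    double queryNorm, double fieldNorm) {
        return queryWeight(boost, idf, queryNorm) * fieldWeight(tf, idf, fieldNorm);
    }

    public static void main(String[] args) {
        // text:ho, explain reports 18.701233
        System.out.println(termWeight(1.0, 1.0, 2.7864501, 0.10035904, 24.0));
        // text:in^30.0, explain reports 23.50164
        System.out.println(termWeight(30.0, 1.0, 3.1236706, 0.0033453014, 24.0));
        // "sum of" the two top-level clauses, explain reports 239.14136
        System.out.println(215.63971 + 23.50164);
    }
}
```

The in^30.0 clause reports a queryNorm of 0.0033453014, which is 0.10035904 / 30, so its 30x boost leaves its queryWeight on the same scale as the unboosted terms.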

{category=Hotel, text=hotels in kerala, count=5.0, _id=525472d7d4a769f4316499ca, location={lon=93.95, lat=24.81667}, po=9.228571428571428}
1195.7068 = custom score, product of:
  1195.7068 = script score function: composed of:
    239.14136 = sum of:
      215.63971 = max of:
        215.63971 = sum of:
          18.701233 = weight(text:ho in 170733) [PerFieldSimilarity], result of:
            18.701233 = score(doc=170733,freq=1.0 = termFreq=1.0), product of:
              0.27964544 = queryWeight, product of:
                2.7864501 = idf(docFreq=960416, maxDocs=5731988)
                0.10035904 = queryNorm
              66.8748 = fieldWeight in 170733, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.7864501 = idf(docFreq=960416, maxDocs=5731988)
                24.0 = fieldNorm(doc=170733)
          22.496681 = weight(text:hot in 170733) [PerFieldSimilarity], result of:
            22.496681 = score(doc=170733,freq=1.0 = termFreq=1.0), product of:
              0.30671278 = queryWeight, product of:
                3.056155 = idf(docFreq=733378, maxDocs=5731988)
                0.10035904 = queryNorm
              73.34772 = fieldWeight in 170733, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.056155 = idf(docFreq=733378, maxDocs=5731988)
                24.0 = fieldNorm(doc=170733)
          22.589373 = weight(text:hote in 170733) [PerFieldSimilarity], result of:
            22.589373 = score(doc=170733,freq=1.0 = termFreq=1.0), product of:
              0.307344 = queryWeight, product of:
                3.0624444 = idf(docFreq=728780, maxDocs=5731988)
                0.10035904 = queryNorm
              73.498665 = fieldWeight in 170733, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.0624444 = idf(docFreq=728780, maxDocs=5731988)
                24.0 = fieldNorm(doc=170733)
          22.600088 = weight(text:hotel in 170733) [PerFieldSimilarity], result of:
            22.600088 = score(doc=170733,freq=1.0 = termFreq=1.0), product of:
              0.30741686 = queryWeight, product of:
                3.0631707 = idf(docFreq=728251, maxDocs=5731988)
                0.10035904 = queryNorm
              73.5161 = fieldWeight in 170733, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.0631707 = idf(docFreq=728251, maxDocs=5731988)
                24.0 = fieldNorm(doc=170733)
          129.25233 = weight(text:hotels in 170733) [PerFieldSimilarity], result of:
            129.25233 = score(doc=170733,freq=1.0 = termFreq=1.0), product of:
              0.73517686 = queryWeight, product of:
                7.3254676 = idf(docFreq=10260, maxDocs=5731988)
                0.10035904 = queryNorm
              175.81122 = fieldWeight in 170733, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.3254676 = idf(docFreq=10260, maxDocs=5731988)
                24.0 = fieldNorm(doc=170733)
      23.50164 = max of:
        23.50164 = weight(text:in^30.0 in 170733) [PerFieldSimilarity], result of:
          23.50164 = score(doc=170733,freq=1.0 = termFreq=1.0), product of:
            0.31348857 = queryWeight, product of:
              30.0 = boost
              3.1236706 = idf(docFreq=685498, maxDocs=5731988)
              0.0033453014 = queryNorm
            74.968094 = fieldWeight in 170733, product of:
              1.0 = tf(freq=1.0), with freq of:
                1.0 = termFreq=1.0
              3.1236706 = idf(docFreq=685498, maxDocs=5731988)
              24.0 = fieldNorm(doc=170733)
  1.0 = queryBoost
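Comparing the two explain outputs, every line except the document numbers is identical. Assuming the script value has the form sub-query score times some factor (which is what the script above is written to produce), dividing the custom score by the sub-query score and the queryBoost recovers the factor the script actually applied. A short sketch with a hypothetical class name, using values copied from the explain output:

```java
public class ScriptFactor {

    // Explain reports: custom score = script value * queryBoost.
    // Assuming script value = subQueryScore * factor, the applied factor is
    // customScore / (subQueryScore * queryBoost).
    public static double impliedFactor(double customScore, double subQueryScore, double queryBoost) {
        return customScore / (subQueryScore * queryBoost);
    }

    public static void main(String[] args) {
        double subQueryScore = 239.14136; // "sum of" line, same for doc 170585 and doc 170733
        double customScore = 1195.7068;   // "custom score" line, same for both documents
        double queryBoost = 1.0;          // "queryBoost" line
        System.out.printf("implied script factor: %.4f%n",
                impliedFactor(customScore, subQueryScore, queryBoost));
    }
}
```

This gives a factor of 5.0 for both documents, which matches neither po value (8.8 and 9.228571428571428) and suggests the script was not multiplying by the stored po values when these explains were produced.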

Thanks

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/22d737da-5951-406a-863a-24eaf95e1382%40googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.


(Binh Ly) #2

Coder,

Your query is probably working properly. The explain output does not show
your custom score script logic, but the script should still be applied as
expected. You can easily verify this by changing the script. For example, try:

String script = "doc['po'].empty ? 1000 : doc['po'].value";

You'll see that the actual score of each document is the value of its po
field (or 1000 if the field doesn't exist).



(dark_shadow) #3

But why am I getting exactly the same score for two documents even though
their po fields have different values? If I'm multiplying the score by this
value, the results should differ between the two, so why am I getting the
same scores?

Also, here is what my mapping looks like:

curl -XPUT 'http://localhost:9200/auto_index/' -d '{
  "settings" : {
    "index" : {
      "number_of_shards" : 1,
      "number_of_replicas" : 1,
      "analysis" : {
        "analyzer" : {
          "str_search_analyzer" : {
            "tokenizer" : "standard",
            "filter" : ["lowercase","asciifolding","suggestions_shingle","edgengram"]
          },
          "str_index_analyzer" : {
            "tokenizer" : "standard",
            "filter" : ["lowercase","asciifolding","suggestions_shingle","edgengram"]
          }
        },
        "filter" : {
          "suggestions_shingle": {
            "type": "shingle",
            "min_shingle_size": 2,
            "max_shingle_size": 5
          },
          "edgengram" : {
            "type" : "edgeNGram",
            "min_gram" : 2,
            "max_gram" : 30,
            "side" : "front"
          },
          "mynGram" : {
            "type" : "nGram",
            "min_gram" : 2,
            "max_gram" : 30
          }
        }
      },
      "similarity" : {
        "index": {
          "type": "org.elasticsearch.index.similarity.CustomSimilarityProvider"
        },
        "search": {
          "type": "org.elasticsearch.index.similarity.CustomSimilarityProvider"
        }
      }
    }
  }
}'

curl -XPUT 'localhost:9200/auto_index/autocomplete/_mapping' -d '{
  "autocomplete":{
    "_boost" : {
      "name" : "po",
      "null_value" : 4.0
    },
    "properties": {
      "ad": {
        "type": "string",
        "search_analyzer" : "str_search_analyzer",
        "index_analyzer" : "str_index_analyzer",
        "omit_norms": "true",
        "similarity": "index"
      },
      "category": {
        "type": "string",
        "include_in_all" : false
      },
      "cn": {
        "type": "string",
        "search_analyzer" : "str_search_analyzer",
        "index_analyzer" : "str_index_analyzer",
        "omit_norms": "true",
        "similarity": "index"
      },
      "ctype": {
        "type": "string",
        "search_analyzer" : "keyword",
        "index_analyzer" : "keyword",
        "omit_norms": "true",
        "similarity": "index"
      },
      "eid": {
        "type": "string",
        "include_in_all" : false
      },
      "st": {
        "type": "string",
        "search_analyzer" : "str_search_analyzer",
        "index_analyzer" : "str_index_analyzer",
        "omit_norms": "true",
        "similarity": "index"
      },
      "co": {
        "type": "string",
        "include_in_all" : false
      },
      "st": {
        "type": "string",
        "search_analyzer" : "str_search_analyzer",
        "index_analyzer" : "str_index_analyzer",
        "omit_norms": "true",
        "similarity": "index"
      },
      "co": {
        "type": "string",
        "search_analyzer" : "str_search_analyzer",
        "index_analyzer" : "str_index_analyzer",
        "omit_norms": "true",
        "similarity": "index"
      },
      "po": {
        "type": "double",
        "boost": 4.0
      },
      "en":{
        "type": "boolean"
      },
      "_oid":{
        "type": "long"
      },
      "text": {
        "type": "string",
        "search_analyzer" : "str_search_analyzer",
        "index_analyzer" : "str_index_analyzer",
        "omit_norms": "true",
        "similarity": "index"
      },
      "url": {
        "type": "string"
      }
    }
  }
}'

I hope I'm not doing anything wrong in my mapping for the po field.

(In reply to post #2 by Binh Ly, Wed, Jan 29, 2014 at 9:40 PM)


(dark_shadow) #4

Just an update: I'm only using the default similarity, so please ignore the
CustomSimilarityProvider lines.

(In reply to post #3 by Mukul Gupta, Wed, Jan 29, 2014 at 9:47 PM)


(Binh Ly) #5

Just to clarify, can you post the scores of the two documents in question with each of these scripts:

  1. script = "_score"

  2. script = "doc['po'].value"

Just curious to see the actual values.



(dark_shadow) #6

Binh,

I'm not able to access the po field. It gives me this error:

Caused by: [Error: could not access: value; in class: org.elasticsearch.index.fielddata.ScriptDocValues$Empty]
[Near : {... doc['po'].value ....}]

I think that is why my score is always the same: every time, the field is
treated as empty, even though all my documents contain it. I have also added
it to my mapping. Can you please tell me why it behaves like this?
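The error also shows why the original script never failed: `doc['po'].empty ? 1 : ...` catches the empty case and falls back to 1, so documents with identical text scores keep identical final scores, whereas reading `doc['po'].value` without the guard surfaces the problem. A small Java model of that behaviour follows; `GuardedAccess` and its nested `FieldValues` are hypothetical stand-ins, not the Elasticsearch `ScriptDocValues` API, and the exception type is an assumption chosen for illustration.

```java
import java.util.List;

public class GuardedAccess {

    // Hypothetical stand-in for a per-document field-value list such as doc['po'].
    public static final class FieldValues {
        private final List<Double> values;

        public FieldValues(List<Double> values) {
            this.values = values;
        }

        public boolean isEmpty() {
            return values.isEmpty();
        }

        // Like doc['po'].value on an empty field: there is no value to return.
        public double getValue() {
            if (values.isEmpty()) {
                throw new IllegalStateException("could not access: value (field is empty)");
            }
            return values.get(0);
        }
    }

    // Same logic as the original script: the empty check silently substitutes 1.
    public static double guardedScore(double score, FieldValues po) {
        double factor = po.isEmpty() ? 1.0 : (po.getValue() == 0.0 ? 1.0 : po.getValue());
        return score * factor;
    }

    // Binh's diagnostic: the raw value, or a 1000 sentinel when the field is empty.
    public static double diagnosticScore(FieldValues po) {
        return po.isEmpty() ? 1000.0 : po.getValue();
    }

    // Unguarded access, as in script = "doc['po'].value": fails on an empty field.
    public static double unguardedScore(FieldValues po) {
        return po.getValue();
    }

    public static void main(String[] args) {
        FieldValues empty = new FieldValues(List.of());
        FieldValues present = new FieldValues(List.of(8.8));

        System.out.println(guardedScore(239.14136, empty)); // score unchanged
        System.out.println(diagnosticScore(empty));         // sentinel
        System.out.println(diagnosticScore(present));       // raw po value
        try {
            unguardedScore(empty);
        } catch (IllegalStateException e) {
            System.out.println("error: " + e.getMessage());
        }
    }
}
```

In this model the guarded script and the sentinel script both hide an empty field behind a valid-looking number; only the unguarded read turns it into an error, which matches what the thread observed.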

(In reply to post #5 by Binh Ly, Wed, Jan 29, 2014 at 10:34 PM)


(dark_shadow) #7

Binh,

Can you take a look at this as well?

Thanks

(In reply to post #6 by Mukul Gupta, Thu, Jan 30, 2014 at 1:47 PM)

