Basic Query regarding custom_score


(deepu) #1

Hi,
I am pretty new to Elasticsearch. I have been trying it out and it looks
awesome so far. I was trying to understand how to use custom_score,
but could not work it out from the documentation.

"custom_score" : {
    "query" : {
        ....
    },
    "script" : "score * doc['my_numeric_field'].value"
}

What goes inside that "query" element? Can someone give me a basic
example using the custom_score query DSL?

I would also like to know how to solve this problem:

Say you have a forum with lots of threads, and each post in a thread
has a number of thanks (similar to the "like" functionality on
Facebook). When I search that forum index I want it to return
posts, ordered by the number of likes each post has.

The problem is: how do I index likes?

If I keep likes as a field in the post document, I will run into
consistency issues when two people thank a post at the same time.
If I store each like as a separate document there won't be any
consistency issues, but then how can I sort posts in the search results
by the number of thanks they have received?

Cheers,
Deepu.


(Clinton Gormley) #2

Hiya

"query" is whatever query you are using to search for documents, which
could be (eg):

Match all documents:

"query": { "match_all": {} }

Search all fields for the keywords "foo" and "bar":

"query": { "query_string": { "query": "foo bar" } }

By default, the results are ordered by relevance score (with match_all,
all documents have the same score).

So "custom_score" gives you a way of customising the relevance of each
document dynamically. For instance, you could make recent documents more
relevant than older documents. Or in your case, posts with more "likes"
would be more relevant than posts with fewer "likes".
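
A minimal sketch of what a complete request body could look like, written as a Python dict and printed as JSON. The field name likes is an assumption (the thread only uses my_numeric_field as a placeholder), the helper name custom_score_body is made up for illustration, and the inner query is simply the query_string example from above.

```python
import json

# Hypothetical numeric field on each post document holding its like count.
LIKES_FIELD = "likes"

def custom_score_body(keywords, field=LIKES_FIELD):
    """Build a search body that wraps a query_string query in custom_score."""
    return {
        "query": {
            "custom_score": {
                # The inner query selects and scores the documents as usual.
                "query": {"query_string": {"query": keywords}},
                # The script then rescales that score by the field value.
                "script": "score * doc['%s'].value" % field,
            }
        }
    }

body = custom_score_body("foo bar")
print(json.dumps(body, indent=2))
```

The printed JSON is what would be sent as the search request body; swapping the inner query for match_all would rank purely by the field value, since every document then starts from the same score.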

Note, however, that this is a difficult calculation to get right. For
instance, what happens if you have one post which matches 1 keyword but
has 1000 likes, and another which matches 10 keywords, but has no likes?

Getting the right balance probably requires a lot of experimentation,
and should be revisited once you have more live data.
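
To make the trade-off concrete, here is a small sketch of the arithmetic for the two posts described above. The relevance scores (1.0 and 10.0) are made-up stand-ins for "matches 1 keyword" and "matches 10 keywords", and the damped formula is only one possible alternative chosen for illustration, not something ES prescribes.

```python
import math

# (name, assumed relevance score, likes); the scores are illustrative only.
posts = [("one_keyword_1000_likes", 1.0, 1000),
         ("ten_keywords_no_likes", 10.0, 0)]

def plain(score, likes):
    # Mirrors "score * doc['likes'].value": zero likes wipes out the score.
    return score * likes

def damped(score, likes):
    # Assumed alternative: likes give a logarithmic boost on top of the score.
    return score * (1 + math.log(1 + likes))

def ranking(formula):
    """Return post names ordered from highest to lowest adjusted score."""
    ranked = sorted(posts, key=lambda p: formula(p[1], p[2]), reverse=True)
    return [name for name, _, _ in ranked]

plain_order = ranking(plain)
damped_order = ranking(damped)
print(plain_order)
print(damped_order)
```

With plain multiplication the post with no likes drops to a score of zero regardless of how well it matches, whereas the damped variant lets the ten-keyword post rank first. Which behaviour is right depends on the data, which is why the experimentation matters.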

You can't "join" documents like you can in a relational database. So if
you want to use the number of likes to influence the score, then you
need to store this value in the post document.

So you need to solve the consistency issue, which obviously depends on
how your application is set up.

Are you storing this data in a database as well? If so, then you could
just do a COUNT of all the likes that each post has and store that
number in the post document.

If you're not, well you could still store a "likes" document in ES, and
have it reference the ID of the post document, and do a count using ES.
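
One way to sidestep the lost-update problem is to never increment the stored value, and instead recompute it from the individual like records and overwrite the field on the post. A minimal sketch, with an in-memory list standing in for the database table or the ES "likes" documents; the record layout, the helper names, and the likes field name are all assumptions.

```python
from collections import Counter

# Each like is its own record, so two simultaneous thanks never conflict.
# Hypothetical (post_id, user_id) pairs.
likes = [("post-1", "alice"), ("post-1", "bob"), ("post-2", "alice"),
         ("post-1", "bob")]  # a duplicate, e.g. a retried request

def like_counts(records):
    """Count distinct users per post, the equivalent of a COUNT per post."""
    return Counter(post_id for post_id, _user in set(records))

def with_like_count(post, counts):
    """Return a copy of the post document with its 'likes' field refreshed."""
    updated = dict(post)
    updated["likes"] = counts.get(post["id"], 0)
    return updated

counts = like_counts(likes)
post = with_like_count({"id": "post-1", "body": "..."}, counts)
print(post["likes"])
```

Because the count is derived rather than incremented, running it again after a race simply writes the correct value again; the refreshed post document would then be re-indexed so the script can read the field.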

hth

clint


(deepu) #3

Thanks for the detailed explanation.

One more question.

In "script" : "score * doc['my_numeric_field'].value", does score mean
the default relevance score calculated by ES?

Cheers,
Deepu.


(Clinton Gormley) #4

On Thu, 2010-08-12 at 05:54 -0700, deepu wrote:

Thanks for the detailed explanation.

one more question.

In "script" : "score * doc['my_numeric_field'].value" - score means
default score calculated by ES using relevancy et al ???

Correct, but I think that this has changed to _score in master. I'm not
sure, so try it out.

clint


(Lukáš Vlček) #5

I think it did not change, see:
http://github.com/elasticsearch/elasticsearch/blob/master/modules/elasticsearch/src/test/java/org/elasticsearch/index/query/xcontent/custom_score1.json


(ajgamer) #6

I think score has been changed to _score; see this discussion:
http://groups.google.com/a/elasticsearch.com/group/users/msg/58b74f9527a06099?pli=1


(Shay Banon) #7

It changed to _score when sorting by it, but as a variable in the custom
score query script it is still score. Hmm, confusing. I will also alias it
to _score, so you can use both.
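
A small sketch of the distinction described here: _score is the name used in a sort clause, while score is the variable inside the custom_score script. The likes field is hypothetical, and the exact sort syntax may differ between versions, so treat this as an illustration rather than a reference.

```python
import json

# Sorting: the relevance score is referred to as "_score".
sort_by_likes_then_score = {
    "query": {"query_string": {"query": "foo bar"}},
    "sort": [{"likes": {"order": "desc"}}, "_score"],
}

# Scripting: inside custom_score the same value is the variable "score".
scripted = {
    "query": {
        "custom_score": {
            "query": {"query_string": {"query": "foo bar"}},
            "script": "score * doc['likes'].value",
        }
    }
}

for body in (sort_by_likes_then_score, scripted):
    print(json.dumps(body, indent=2))
```

The first body orders posts strictly by like count and only uses relevance to break ties; the second blends the two into a single score.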

-shay.banon


(Shay Banon) #8

Done: http://github.com/elasticsearch/elasticsearch/issues/issue/316.

