Index using the boosted value


(ylbaba) #1

I have the following sample data:
{
  "user" : {
    "uid" : "user1",
    "city" : "ca",
    "gender" : "M",
    "favorites" : ["sports", "news", "online_game"]
  }
}

The problem is that the values of the "favorites" field already have their own
weights, for example W("sports")=2.0, W("news")=1.3, W("online_game")=0.6,
which means user1 has three favorites and likes sports the most.

How can I implement this in ES when indexing the user data, so that the weights
affect the result score? P.S. The set of possible favorite values is fixed and
small.

I'm considering mapping every favorite as a separate boolean field.

Thanks.


(Shay Banon) #2

Just want to understand better: when do you want these weights to come into
play? When the user explicitly searches for "sports", or when he searches
for anything and you want to boost the results of any query based on the
favorites he has? In any case, here are some thoughts:

  1. If you want to control it at indexing time, and the favorites control the
    "relevancy" of this doc for all types of queries, then you can use the _boost
    field feature:
    http://www.elasticsearch.com/docs/elasticsearch/mapping/boost_field/.
    Basically, you compute the boost you want to give the doc (based on
    the favorites you are going to index), and set it as the _boost field in the
    indexed document. This solution gives the fastest execution possible,
    though the boost is fixed at indexing time.

  2. You can try the custom_score query
    (http://www.elasticsearch.com/docs/elasticsearch/rest_api/query_dsl/custom_score_query/):
    pass the weights in the params as key (the favorite, a string) / value (its
    weight) pairs, and within the script, simply multiply the sub query score by
    the weight. Note that this might be too slow for you, since the script has to
    be evaluated for each hit that matches the query in order to compute the
    score, so you need to test it. Also, I haven't tested it, so there may be a
    syntax problem ;) :

"custom_score" : {
  "query" : {
    ....
  },
  "params" : {
    "weights" : {
      "sports" : 2,
      "news" : 1.3,
      "online_game" : 0.6
    }
  },
  "script" : "cscore = score; foreach (fav : doc['favorites'].values()) { cscore = cscore * weights[fav]; }; cscore"
}

Of course, for the above to work, make sure the favorites field is mapped with
index set to not_analyzed.
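
A rough way to check the intended arithmetic before trying it on a cluster: the Python sketch below mimics, outside Elasticsearch, what the script above is meant to compute. It is only an illustration. The function name `custom_score` is hypothetical, and unlike the script (where `weights[fav]` would fail for a favorite missing from `params`), it assumes a neutral weight of 1.0 for such favorites.

```python
from typing import Dict, List


def custom_score(sub_query_score: float, favorites: List[str],
                 weights: Dict[str, float]) -> float:
    """Start from the sub-query score and multiply it by the weight of
    every favorite stored on the document, like the proposed script."""
    cscore = sub_query_score
    for fav in favorites:
        # Assumption: a favorite absent from the weights map counts as 1.0
        # (the original script would raise an error instead).
        cscore *= weights.get(fav, 1.0)
    return cscore


weights = {"sports": 2.0, "news": 1.3, "online_game": 0.6}
print(custom_score(1.0, ["sports", "news", "online_game"], weights))
```

With all three favorites present, the sub-query score ends up multiplied by 2 × 1.3 × 0.6, roughly 1.56.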

-shay.banon


(ylbaba) #3

Thanks kimchy, this helps me a lot.

Actually, the weights come from each user's data: every user has a different
set of favorites with corresponding weights. For example:
{
  "uid" : "user1",
  "favorites" : {
    "sports" : 2,
    "news" : 1.3,
    "online_game" : 0.6
  }
}

{
  "uid" : "user2",
  "favorites" : {
    "online_game" : 3,
    "tech" : 1
  }
}

It seems the boost or custom score must be bound to a field, so should I map
every favorite type as an explicit field?
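
The explicit-field idea can be sketched as follows: each entry of the `favorites` map becomes its own numeric field on the user document. This is a minimal Python illustration of the data layout only, not an Elasticsearch API; the `fav_` prefix and the function name `flatten_favorites` are hypothetical choices.

```python
from typing import Any, Dict


def flatten_favorites(user: Dict[str, Any]) -> Dict[str, Any]:
    """Turn {"favorites": {name: weight}} into one field per favorite."""
    # Copy every field except the nested favorites map.
    doc = {k: v for k, v in user.items() if k != "favorites"}
    for name, weight in user.get("favorites", {}).items():
        # Hypothetical naming scheme: fav_<favorite name>.
        doc["fav_" + name] = float(weight)
    return doc


user2 = {"uid": "user2", "favorites": {"online_game": 3, "tech": 1}}
print(flatten_favorites(user2))
# {'uid': 'user2', 'fav_online_game': 3.0, 'fav_tech': 1.0}
```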


(Shay Banon) #4

Both solutions I suggested support this. In the first case, where you use
the boost field, when you index a specific user you aggregate (in one way
or another) the favorites that user has, and index the result as the _boost
field. In the second solution, with the custom_score query, the params part
is dynamic, so you can pass different values depending on the user (logged
in?) that queries the data.
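
A minimal Python sketch of that index-time aggregation, assuming the aggregation is a plain sum of the user's favorite weights (the thread leaves the choice open, so any other function could be used) and that a user without favorites gets a neutral boost of 1.0. The function name `with_index_time_boost` is hypothetical; the returned dict stands for the document that would be indexed with its _boost field.

```python
import json
from typing import Any, Dict


def with_index_time_boost(user: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of the user document with a _boost field added."""
    favorites = user.get("favorites", {})
    doc = dict(user)  # shallow copy, so the input dict is left untouched
    # Assumed aggregation: sum of the weights, or 1.0 (neutral) if empty.
    doc["_boost"] = float(sum(favorites.values())) if favorites else 1.0
    return doc


user1 = {"uid": "user1",
         "favorites": {"sports": 2, "news": 1.3, "online_game": 0.6}}
print(json.dumps(with_index_time_boost(user1)))
```

Because the boost is baked into the document, it applies to every query; changing the weights means reindexing the user.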

-shay.banon


(ylbaba) #5

I have both cases. :(
The result score should combine the boosts in the query with the weights of
the matching favorites stored for each user.

{
  "uid" : "user1",
  "favorites" : {"online_game" : 0.6, "tech" : 5, "sports" : 2, "news" : 1.3}
}

{
  "uid" : "user2",
  "favorites" : {"online_game" : 3, "tech" : 1}
}

For example, the search:
user.favorites:online_game^1 AND user.favorites:tech^3

should give these scores:
user1: 0.6 * 1 + 5 * 3 = 15.6
user2: 3 * 1 + 1 * 3 = 6
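
The expected numbers above are a sum, over the favorites named in the query, of the query boost times the user's stored weight. The Python sketch below only reproduces that arithmetic to make the target explicit; it is not an Elasticsearch query, and the function name `expected_score` is hypothetical. It simply skips favorites the user does not have, whereas the AND in the example query would keep such a user out of the results entirely.

```python
from typing import Dict


def expected_score(query_boosts: Dict[str, float],
                   favorites: Dict[str, float]) -> float:
    """Sum of query boost * stored weight over the favorites in the query."""
    return sum(boost * favorites[name]
               for name, boost in query_boosts.items()
               if name in favorites)  # missing favorites contribute nothing


query = {"online_game": 1, "tech": 3}
user1 = {"online_game": 0.6, "tech": 5, "sports": 2, "news": 1.3}
user2 = {"online_game": 3, "tech": 1}
print(expected_score(query, user1))  # 15.6
print(expected_score(query, user2))  # 6
```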


(nabble) #6

Hi pal --

I'm tackling a similar issue. Have you found a solution that worked well for you? If so, would you be willing to share it with the list?

Thanks!

