Best approach for weighting tags on a document

Hi All,

I've been thinking about how to index document tags - for example, a user
might add a tag to a document to indicate what the document is about.

I'm assuming that the best way to index these tags would be to have an
array of tags.

What I'd like to do though is have some way of weighting the tags - so if
multiple people add the same tag it becomes more relevant than a tag that
just a few people have added.

What's the best way of doing this?

Thanks
Chris

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.


Do you want to display a tag cloud?
You can use a terms facet to get the top 10 tags.

Is that what you are looking for?
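
For illustration, here is a minimal sketch of what such a terms facet request body could look like, built as a Python dict. The field name `tags` and the facet label `top_tags` are assumptions made for this example, and the helper `top_tags_request` is hypothetical; only the body is built here, nothing is sent to a cluster.

```python
import json


def top_tags_request(field="tags", size=10):
    """Build a search body asking for the `size` most frequent terms of `field`."""
    return {
        "query": {"match_all": {}},
        "facets": {
            # "top_tags" is an arbitrary label chosen for this sketch.
            "top_tags": {"terms": {"field": field, "size": size}}
        },
    }


if __name__ == "__main__":
    # Print the JSON that would be posted to the _search endpoint.
    print(json.dumps(top_tags_request(), indent=2))
```

Note that this counts how many documents carry each tag across the index; it does not weight tags within a single document.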

--
David ;-)
Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs

(In reply to Chris Greening's message of 16 June 2013, 12:22.)

I was thinking more along the lines of something like this:

doc1
tags: shoes^10, green^5, mens^2

doc2
tags: shoes^20, green^1, turquoise^6

So the tags would be given a higher weighting based on the number of people
who had tagged the item with the same tag value.

If I did a search for "green shoes" then doc1 would come before doc2 in the
search results.

Cheers
Chris.

(In reply to David Pilato's message of Sunday, June 16, 2013 12:55:38 PM UTC+1.)

I think you need a custom_score query for every term in the search string.

The custom_score query can then take a script that extracts the score from
the document. You can use the MVEL scripting language for this, although
I've never done that myself.

I'd store the values like this:

{
  "tags": [
    { "key": "shoes", "value": 20 },
    { "key": "green", "value": 5 }
  ]
}

The custom_score for 1 term could look like this:

{
  "custom_score": {
    "script": ????, // should result in the value that belongs to shoes
    "query": {
      "filtered": {
        "filter": {
          "term": { "tags.key": "shoes" }
        }
      }
    }
  }
}

This query uses a filter so that it can take advantage of filter caching.
If you have more than one term, combine the custom_score queries in a bool
query; that will also combine the scores of the individual custom_score
queries.
The MVEL script will probably be a bit verbose: you have to loop over the
tags and check whether the search term matches the key.

Note: MVEL scripts can be parameterized, which you should always do. But
since you'll have to write a foreach loop, it will be very slow. Be warned.
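
To make the idea concrete, here is a rough plain-Python model of the lookup described above. It is not MVEL and not an Elasticsearch API: `tag_value` and `score` are hypothetical helpers, and summing the per-term values is only an assumed stand-in for how a bool query would combine the individual custom_score queries. The two documents are the doc1/doc2 example from earlier in the thread, stored in the key/value format.

```python
def tag_value(doc, term):
    """Return the stored value for `term`, or 0 if the document lacks that tag."""
    for tag in doc["tags"]:  # the linear scan the script would have to perform
        if tag["key"] == term:
            return tag["value"]
    return 0


def score(doc, terms):
    """Sum the per-term values (an assumption about how scores are combined)."""
    return sum(tag_value(doc, term) for term in terms)


doc1 = {"tags": [
    {"key": "shoes", "value": 10},
    {"key": "green", "value": 5},
    {"key": "mens", "value": 2},
]}
doc2 = {"tags": [
    {"key": "shoes", "value": 20},
    {"key": "green", "value": 1},
    {"key": "turquoise", "value": 6},
]}

if __name__ == "__main__":
    for name, doc in (("doc1", doc1), ("doc2", doc2)):
        print(name, score(doc, ["green", "shoes"]))
```

Under this summing assumption doc1 totals 15 and doc2 totals 21 for "green shoes", so the final ordering depends on how the per-term values are combined.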

An optimization could be the following format to store the boosting:

{
  "tags": ["shoes", "green", "mens"],
  "tags_score": {
    "shoes": 20,
    "green": 5,
    "mens": 1
  }
}

That way you can filter on "tags" and fetch the scores with a map lookup:
http://mvel.codehaus.org/MVEL+2.0+Property+Navigation#MVEL20PropertyNavigation-MapAccess
Again, I haven't done anything with MVEL myself, so I'm not sure this
MapAccess is even supported...
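
As a sketch of how a document in this second format could be produced on the indexing side, the following Python counts how many distinct users applied each tag and emits both fields. The input shape (a list of `(user, tag)` pairs) and the helpers `build_tag_fields` and `tag_score` are assumptions made for illustration; the dictionary lookup only mimics the map access a script would do, it is not MVEL.

```python
from collections import Counter


def build_tag_fields(tag_events):
    """Turn (user, tag) pairs into the `tags` and `tags_score` fields."""
    # set() drops repeats so each user counts at most once per tag.
    counts = Counter(tag for _user, tag in set(tag_events))
    # Most popular tags first; ties broken alphabetically for stable output.
    ordered = sorted(counts, key=lambda t: (-counts[t], t))
    return {"tags": ordered, "tags_score": {t: counts[t] for t in ordered}}


def tag_score(doc, term):
    """Direct map lookup instead of looping over a list of key/value pairs."""
    return doc["tags_score"].get(term, 0)


if __name__ == "__main__":
    events = [("ann", "shoes"), ("bob", "shoes"), ("ann", "green"), ("ann", "shoes")]
    doc = build_tag_fields(events)
    print(doc)
    print(tag_score(doc, "shoes"), tag_score(doc, "mens"))
```

The trade-off is that the counts are stored twice (once as a tag list for filtering, once as a map for scoring), so both fields have to be rewritten whenever a user adds a tag.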

Good luck,

Jaap

(In reply to Chris Greening's message of Sunday, June 16, 2013 4:38:49 PM UTC+2.)