Tags with variable boost values


(Adam Zell) #1

I am trying to model a document that has a collection of tags, where each
tag's boost value depends on the total number of tags in that document. The
operating assumption is that the more tags a document has, the less weight
each matching tag should carry in a query. For this example, assume that a
document can have at most 5 tags. Concretely, document 1 has two tags:
"foo" and "bar". Calculating the boost value for each tag works as follows
in pseudo-Ruby:

  1. Calculate the total number of points for the overall boost in the
    document:

total, val = 0, 5   # val starts at the maximum number of tags

tags.each { |tag|
  total += val      # the first tag earns 5 points, the next 4, and so on
  val -= 1
}

For document 1, total = 5 + 4 = 9.

  2. Calculate each tag's boost from the tag's position and the total number
    of points:

val = 5

tags.each { |tag|
  # to_f forces float division; integer division (5 / 9) would give 0
  tag.boost = val.to_f / total
  val -= 1
}

"foo" has a boost of (5 / 9), while "bar" has (4 / 9).
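
The two steps above can be folded into one runnable method. This is only a sketch of the calculation as described: the method name tag_boosts and the MAX_TAGS constant are made up for illustration, and Rational is used so the fractions stay exact instead of being rounded to floats.

```ruby
# Hypothetical helper; the names here are illustrative, not from any library.
MAX_TAGS = 5

def tag_boosts(tags, max_tags = MAX_TAGS)
  raise ArgumentError, "at most #{max_tags} tags allowed" if tags.size > max_tags

  # Points by position: 5 for the first tag, 4 for the second, and so on.
  points = (0...tags.size).map { |i| max_tags - i }
  total  = points.sum

  # Each tag's boost is its share of the document's total points.
  tags.zip(points).map { |tag, pts| [tag, Rational(pts, total)] }.to_h
end

boosts = tag_boosts(%w[foo bar])
# boosts["foo"] == Rational(5, 9) and boosts["bar"] == Rational(4, 9)
```

The boosts of a document always sum to 1, so a document with fewer tags gives each of its tags a larger share.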

Any document with a different number of tags will have different boost
values per tag, so the boost depends on the document rather than on the
query. Because of this, I don't think query-time boosting is a good fit.

One option I have thought of is to create 5 new string fields in the
schema: tag_1 to tag_5. Each field would hold one tag, with its associated
boost value set at index time. A query would then match against tag_1 to
tag_5 instead of the tags collection. Is there a cleaner way to do this?
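
To make the tag_1 to tag_5 idea concrete, here is a rough sketch of how the positional fields could be built before indexing. The method name positional_tag_fields and the "value"/"boost" keys are placeholders I made up; they are not Elasticsearch syntax, and how the boost actually gets attached at index time is left open.

```ruby
# Hypothetical sketch: builds tag_1..tag_N entries for one document.
# The "value" and "boost" keys are placeholders, not Elasticsearch syntax.
def positional_tag_fields(tags, max_tags = 5)
  raise ArgumentError, "at most #{max_tags} tags allowed" if tags.size > max_tags

  # Points by position: 5, 4, 3, ... for as many tags as the document has.
  points = (0...tags.size).map { |i| max_tags - i }
  total  = points.sum

  fields = {}
  tags.each_with_index do |tag, i|
    fields["tag_#{i + 1}"] = { "value" => tag, "boost" => points[i].to_f / total }
  end
  fields
end

fields = positional_tag_fields(%w[foo bar])
# fields has the keys "tag_1" and "tag_2"; tag_3 to tag_5 are simply absent.
```

With this layout, a document with fewer than 5 tags just leaves the trailing fields out.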


