I am trying to model a document that has a collection of tags, where each
tag's boost value depends on the total number of tags in that document.
The operating assumption is that the more tags a document has, the less
boost each matching tag should contribute to a query. In this example,
let's assume that a document can have at most 5 tags. For a concrete
example, document 1 has two tags: "foo" and "bar". Calculating the boost
value for each tag works as follows in pseudo-Ruby:
- Calculate the total number of points for the overall boost in the
  document:
    total, val = 0, 5
    tags.each { |tag|
      total += val; val -= 1
    }
  For document 1, total = (5 + 4) = 9.
- Calculate each tag's boost from the tag's position and the total number
  of points:
    val = 5
    tags.each { |tag|
      # to_f avoids Ruby's integer division, which would yield 0 here
      tag.boost = val.to_f / total; val -= 1
    }
  "foo" has a boost of (5 / 9), while "bar" has (4 / 9).
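The two steps above can be combined into one small, runnable sketch. The method name `tag_boosts` and the `max_tags` parameter are my own placeholders, and `Rational` is used only to keep the fractions exact for display:

```ruby
# Hypothetical helper; max_tags = 5 follows the assumption stated above.
def tag_boosts(tags, max_tags = 5)
  # Points by position: 5 for the first tag, 4 for the second, and so on.
  points = tags.each_index.map { |i| max_tags - i }
  total = points.sum
  # Each tag's boost is its share of the document's total points.
  tags.zip(points).map { |tag, pts| [tag, Rational(pts, total)] }.to_h
end

boosts = tag_boosts(%w[foo bar])
boosts.each { |tag, boost| puts "#{tag}: #{boost} (#{boost.to_f.round(3)})" }
# prints:
#   foo: 5/9 (0.556)
#   bar: 4/9 (0.444)
```

The boosts within one document always sum to 1, which is what makes the per-tag value depend on how many tags the document has.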
A document with any other number of tags will have different boost values
per tag, so the boost for a given tag varies from document to document.
Because of this, I don't think query-time boosting is a good fit.
One option I have thought of is to create 5 new string fields in the
schema: tag_1 to tag_5. Each field would hold one tag, with its associated
boost value set at index time. A query would then match against tag_1 to
tag_5 instead of the tags collection. Is there a cleaner way to do this?