How to score by IoU (Intersection over Union)


Suppose I have the following documents:

PUT movies/movie/1
{
  "title": "Some Romance Movie",
  "description": ["romance", "comedy", "Sophie Starring", "James Actor", "90's", "New York"]
}

PUT movies/movie/2
{
  "title": "Some Action Movie",
  "description": ["action", "science-fiction", "comedy", "James Actor", "2050's", "Future City"]
}

  1. How do I query for a movie ranking based on some input tags, where the score is based on the number of matching tags in the description field? Documents with the same number of matching tags should then be ranked by their precision: if I have 3 query tags, I want a document with exactly those 3 tags to rank higher than a document with those 3 tags plus some other tags.
    Is this somehow possible? Or do I have to change my data structure for that?

  2. Let's make it more complicated by having (normalized) weights on the tags:

    PUT movies/movie/1
    {
      "weighted_description": [
        {"tag": "romance", "weight": 0.4},
        {"tag": "comedy", "weight": 0.1},
        {"tag": "Sophie Starring", "weight": 0.2},
        {"tag": "James Actor", "weight": 0.1},
        {"tag": "90's", "weight": 0.05},
        {"tag": "New York", "weight": 0.05}
      ]
    }

The scoring should now sum, over every matching tag, the minimum of the query tag's weight and the document's tag weight. A perfect match (same tags with the same weights) would have a score of 1. A query that omits a less important tag, or that contains a non-matching tag with low weight, would still rank fairly well, whereas a document with 3 low-weight (or differently weighted) tags would rank lower.
Can I do that with Elasticsearch? If not, can I build a custom analyzer that returns that kind of score?
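
To make the intended scores concrete, here is a minimal Python sketch of what I would like to compute. It is plain Python, not Elasticsearch code, and it says nothing about how Elasticsearch would implement this. The function names, the third document, and the query weights are hypothetical and only there for illustration.

```python
def match_rank_key(query_tags, doc_tags):
    """Sort key for question 1: number of matching tags, then precision.

    Precision here means the share of the document's tags that match the query,
    so a document with no extra tags wins a tie.
    """
    q, d = set(query_tags), set(doc_tags)
    matches = len(q & d)
    precision = matches / len(d) if d else 0.0
    return (matches, precision)


def iou(query_tags, doc_tags):
    """Intersection over Union (Jaccard index) of the two tag sets."""
    q, d = set(query_tags), set(doc_tags)
    union = q | d
    return len(q & d) / len(union) if union else 0.0


def weighted_overlap(query_weights, doc_weights):
    """Score for question 2: sum of min(query weight, doc weight) over shared tags."""
    return sum(
        min(weight, doc_weights[tag])
        for tag, weight in query_weights.items()
        if tag in doc_weights
    )


movies = {
    "1": ["romance", "comedy", "Sophie Starring", "James Actor", "90's", "New York"],
    "2": ["action", "science-fiction", "comedy", "James Actor", "2050's", "Future City"],
    # Hypothetical third document containing exactly the query tags.
    "3": ["romance", "comedy", "James Actor"],
}
query = ["romance", "comedy", "James Actor"]

# Highest key first: most matches, ties broken by precision.
ranking = sorted(movies, key=lambda m: match_rank_key(query, movies[m]), reverse=True)

movie_1_weights = {
    "romance": 0.4,
    "comedy": 0.1,
    "Sophie Starring": 0.2,
    "James Actor": 0.1,
    "90's": 0.05,
    "New York": 0.05,
}
# Hypothetical weighted query.
query_weights = {"romance": 0.5, "comedy": 0.5}
score = weighted_overlap(query_weights, movie_1_weights)
```

With this data, `ranking` puts the hypothetical document 3 ahead of movie 1 (both match all 3 tags, but document 3 has no extra tags), and movie 2 last with 2 matches. Note that the example weights I listed for movie 1 add up to 0.9, so matching that document against itself gives 0.9 with this formula; the weights would need to sum to 1 for a perfect match to score exactly 1.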
