Multi-value field scoring

Hey,

I'm trying to find a way in ES or in Lucene to change the scoring of a multi-value field.
Today it looks like the multi-value field is treated as a bag of words.
I want to be able to calculate the score for each value in the multi-value field and use the highest-scoring value in the document score at the end.

For example:

  1. id: 1, Title: "new Sony Playstation", "ps 3", "game console for TV"
  2. id: 2, Title: "Sony Playstation", "console game 3", "brand new Playstation 3"
  3. id: 3, Title: "Brand sony console", "Best console sony playstation", "new sony playstation 3"

Input keywords: "sony", "new", "playstation"
The order of the result should be: #1, #2, #3

ES will return the results as follows, because #2 and #3 have more keywords and #1 has other keywords in the title that aren't relevant to what I'm searching for:
id: 2, Title: Sony Playstation, console game 3, brand new Playstation 3
id: 3, Title: Brand sony console, Best console sony playstation, new sony playstation 3
id: 1, Title: new Sony Playstation, ps 3, game console for TV

Is there any specific query or customization I can use to change the scoring? The only way I have thought of is to break these titles into separate documents, which doesn't look like the best solution, but it seems there is no other option.

Thanks,
Itay

The problem here is that all three elements of the Title array are indexed
as a single Title field. From the search perspective, the first record
is equivalent to

id: 1, Title: "new Sony Playstation ps 3 game console for TV"
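
To make the difference concrete, here is a small toy sketch in Python. It does not use Lucene's real scoring; the overlap measure (shared terms divided by all distinct terms) and the function names are my own assumptions, chosen only to contrast "score the flattened field" with "score each value and take the max".

```python
import re

# Query terms and documents taken from the example in this thread.
QUERY = {"sony", "new", "playstation"}

DOCS = {
    1: ["new Sony Playstation", "ps 3", "game console for TV"],
    2: ["Sony Playstation", "console game 3", "brand new Playstation 3"],
    3: ["Brand sony console", "Best console sony playstation",
        "new sony playstation 3"],
}


def tokens(text):
    """Lowercase the text and return its set of distinct word tokens."""
    return set(re.findall(r"\w+", text.lower()))


def overlap(query, text):
    """Toy relevance (NOT Lucene's formula): shared terms / all distinct terms."""
    toks = tokens(text)
    return len(query & toks) / len(query | toks)


def flattened_score(query, values):
    """Score the values as if they were one concatenated field."""
    return overlap(query, " ".join(values))


def max_value_score(query, values):
    """Score every value separately and keep the best one."""
    return max(overlap(query, value) for value in values)


def rank(score_fn):
    """Return document ids ordered from highest to lowest score."""
    return sorted(DOCS, key=lambda doc_id: score_fn(QUERY, DOCS[doc_id]),
                  reverse=True)


flattened_order = rank(flattened_score)
per_value_order = rank(max_value_score)
```

With this toy measure, document 1 ranks last when the values are flattened and first when each value is scored on its own, because its extra values ("ps 3", "game console for TV") no longer dilute the exact match.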

The only control that you have is setting position_offset_gap, which
specifies the distance between instances of the same field. So, by setting it
to 10, for example, you can specify that the word "ps" will be 10 words
apart from the word "Playstation" in the first document, and then you can
use span_near queries wrapped into custom_filters_score queries to boost
the terms that occur near each other. However, you would have to create
a span_near query for every pair of terms, which might be cumbersome and slow.
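
As a rough sketch of what that would involve, the Python below builds the request bodies as plain dictionaries. It assumes the span_near query (the span query that takes a slop parameter) and the custom_filters_score query as I remember them from Elasticsearch of this era; the type name "product", the slop of 3 and the boost of 2.0 are made-up values for illustration, so check the exact syntax against the documentation for your version.

```python
from itertools import combinations

FIELD = "Title"

# Mapping sketch: a gap of 10 positions between the values of the field.
# "product" is a hypothetical type name.
mapping = {
    "product": {
        "properties": {
            FIELD: {"type": "string", "position_offset_gap": 10}
        }
    }
}


def pairwise_proximity_query(keywords, slop=3, boost=2.0):
    """Build one boosting span_near filter for every pair of keywords.

    The slop must stay below the position gap (10 here), otherwise a
    span could match across two different values of the field.
    """
    terms = [keyword.lower() for keyword in keywords]
    filters = []
    for first, second in combinations(terms, 2):
        span_near = {
            "span_near": {
                "clauses": [
                    {"span_term": {FIELD: first}},
                    {"span_term": {FIELD: second}},
                ],
                "slop": slop,
                "in_order": False,
            }
        }
        # Wrap the span query in a query filter so that it can act as a boost.
        filters.append({"filter": {"query": span_near}, "boost": boost})
    return {
        "custom_filters_score": {
            "query": {"match": {FIELD: " ".join(terms)}},
            "filters": filters,
            "score_mode": "total",
        }
    }


query = pairwise_proximity_query(["sony", "new", "playstation"])
```

Three keywords already produce three span_near filters, and the count grows quadratically with the number of keywords, which is why this gets cumbersome.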

A better solution would be to index titles as nested documents and use a
nested query with score_mode set to max.
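
A minimal sketch of the nested approach, again as Python dictionaries. The names "product", "titles" and "title" are hypothetical, and the "string" field type reflects Elasticsearch at the time of this thread (newer versions use "text"), so treat this as an outline rather than a tested setup.

```python
# Mapping sketch: each title becomes its own nested document.
nested_mapping = {
    "product": {
        "properties": {
            "titles": {
                "type": "nested",
                "properties": {"title": {"type": "string"}},
            }
        }
    }
}


def to_nested_doc(doc_id, titles):
    """Turn a list of title strings into the nested document shape."""
    return {"id": doc_id, "titles": [{"title": title} for title in titles]}


def max_title_query(keywords):
    """Score each nested title separately and keep the best score per document."""
    return {
        "nested": {
            "path": "titles",
            "score_mode": "max",
            "query": {"match": {"titles.title": " ".join(keywords)}},
        }
    }


doc = to_nested_doc(1, ["new Sony Playstation", "ps 3", "game console for TV"])
query = max_title_query(["sony", "new", "playstation"])
```

Because every title is matched and scored as a separate hidden document, unrelated titles on the same record should no longer dilute the score of the best-matching title.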

(In reply to the message posted by itay yahimovitz on Monday, December 10, 2012 6:39:06 PM UTC-5.)

--
View this message in context:
http://elasticsearch-users.115913.n3.nabble.com/Multi-value-field-scoring-tp4026784.html
Sent from the Elasticsearch Users mailing list archive at Nabble.com.

--