Multiple synonyms contribute to the score

I have a synonym file that includes the row:

sunitinib,su11248,su011248,sutent

When I search for 'sutent' and look at the _explain results, I see that
each hit contributes to the score, even though the source document only has
one mention of 'sutent' and none of its synonyms. The net result is that
words with more synonyms artificially get a boost in the results.

    0.8291347 = weight(brief_title:sunitinib in 27490), product of:
    0.8291347 = weight(brief_title:sunitinib in 27490), product of:
    0.5862867 = weight(brief_title:su11248 in 27490), product of:
    0.5862867 = weight(brief_title:su011248 in 27490), product of:
    0.5862867 = weight(brief_title:sutent in 27490), product of:
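A toy model of this effect, assuming the score is simply the sum of one fixed weight per matching query term. Real Lucene scoring also multiplies in tf, idf, field norms and a coord factor, which is why the weights above differ; the summing is the point here. The synonym list is the row from the question, while `analyze_expand` and `score` are hypothetical helpers:

```python
# Synonym group taken from the rule in the question.
SYNONYMS = ["sunitinib", "su11248", "su011248", "sutent"]


def analyze_expand(text):
    """Toy expand: true analyzer: a synonym is replaced by every variant."""
    terms = []
    for word in text.lower().split():
        terms.extend(SYNONYMS if word in SYNONYMS else [word])
    return terms


def score(query, doc, weight=1.0):
    """Toy scorer: one fixed weight per analyzed query term found in the doc.
    Real Lucene scoring (tf-idf, norms, coord) is deliberately left out."""
    doc_terms = set(analyze_expand(doc))
    return sum(weight for term in analyze_expand(query) if term in doc_terms)


# The document mentions 'sutent' once, yet all four expanded query terms match.
print(score("sutent", "sutent trial"))  # 4.0
# A word with no synonyms gets a single contribution.
print(score("trial", "sutent trial"))   # 1.0
```

Because both the indexed text and the query are expanded, every variant matches, so a word with four synonyms collects four contributions where an ordinary word collects one.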

Trying expand=true and expand=false in the mapping makes no difference. Is
there a setting I can change to avoid this behaviour?

I'll put together a gist if the solution is not immediately obvious to
someone.

Thanks in advance,

Kevin

BONUS QUESTION: is there an explanation somewhere of when I should use
expand=false? I read the explanation in the docs but I'm still not getting
it.

--

Hi Kevin

> When I search for 'sutent' and look at the _explain results, I see
> that each hit contributes to the score, even though the source
> document only has one mention of 'sutent' and none of its synonyms.
> The net result is that words with more synonyms artificially get a
> boost in the results.

There are various ways to approach this problem. Either you:

  * expand your synonym list at index time (i.e. you store all
    variations of the synonym in your index), but then you search on
    just one variation (by using a different analyzer at search time
    than at index time), or
  * contract your synonym list at index and search time: e.g. foo, bar
    or baz all get indexed as just 'foo', and a search for 'bar'
    becomes a search for 'foo'.
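The two options above can be sketched with toy analyzers, assuming each analyzer is just a function from a list of words to a list of terms (real Elasticsearch analyzers also tokenize, lowercase, track positions, etc.). `expand`, `plain`, `contract` and `matching_terms` are hypothetical names; the foo/bar/baz group is the example from the list:

```python
# The example group from the list above; "foo" is listed first.
SYNONYMS = ["foo", "bar", "baz"]


def expand(terms):
    """Option 1, index time: a synonym becomes every variant."""
    out = []
    for term in terms:
        out.extend(SYNONYMS if term in SYNONYMS else [term])
    return out


def plain(terms):
    """Option 1, search time: no synonym filter at all."""
    return list(terms)


def contract(terms):
    """Option 2, index and search time: a synonym becomes the first variant."""
    return [SYNONYMS[0] if term in SYNONYMS else term for term in terms]


def matching_terms(index_analyzer, search_analyzer, doc, query):
    """Analyzed query terms that also appear in the analyzed document."""
    doc_terms = set(index_analyzer(doc.split()))
    return [t for t in search_analyzer(query.split()) if t in doc_terms]


print(matching_terms(expand, plain, "bar", "baz"))       # ['baz']
print(matching_terms(contract, contract, "bar", "baz"))  # ['foo']
# Expanding on both sides is what causes the inflated score:
print(matching_terms(expand, expand, "bar", "baz"))      # ['foo', 'bar', 'baz']
```

Either option leaves exactly one query term matching, so a word's score no longer grows with the size of its synonym group.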

I have put together a gist demonstrating how this all works:
https://gist.github.com/4095280

The question remains: which should I prefer? expand: true or false?

I'm open to disagreement, but my vote would be for expand: false, i.e.
index just the first word in the synonym list, not all the words.
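A toy illustration of what the expand flag does to an equivalence rule like the one in the question. This assumes the Solr-style comma-separated rule format, where expand: true maps each entry to the whole group and expand: false maps each entry to the first one; `parse_rule` is a hypothetical helper, not the real parser:

```python
def parse_rule(line, expand):
    """Toy parser for a comma-separated equivalence rule such as
    "sunitinib,su11248,su011248,sutent". Returns {term: replacement terms}.
    expand=True: every term maps to the whole group.
    expand=False: every term maps to the first term only."""
    group = [t.strip() for t in line.split(",") if t.strip()]
    targets = group if expand else group[:1]
    return {term: list(targets) for term in group}


rule = "sunitinib,su11248,su011248,sutent"
print(parse_rule(rule, expand=True)["sutent"])
# ['sunitinib', 'su11248', 'su011248', 'sutent']
print(parse_rule(rule, expand=False)["sutent"])
# ['sunitinib']
```

With expand: false each occurrence is indexed as one term instead of four, and because the same rule is applied at search time, a query for any variant also becomes 'sunitinib'.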

My reasons are:

  1. fewer terms to index

  2. replacing synonyms with all variations or just one variation
    implies the same loss of original information (ie which synonym
    appeared in the original text).

  3. Synonyms can be of different lengths (e.g. "wi fi" vs "wifi"), which
    means that (with expand: true) the phrase "wifi router" would be
    indexed as:

    Pos:  1      2       3
          wifi   router
          wi     fi      router

    which can mess up e.g. phrase queries, which depend on token positions,
    and can also mess up snippet highlighting.
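A deliberately simplified model of the position problem in point 3 (positions start at 0 here). It assumes, hypothetically, that with expand: true every variant is stacked onto the original word positions and position lengths are ignored, so the extra "fi" lands in router's slot; it is not Lucene's actual synonym filter. With expand: false the rule contracts everything to its first entry, "wi fi", on both the index and the search side. `SYNONYM_GROUP`, `expand_flat`, `contract`, `positions` and `phrase_match` are illustrative names:

```python
from collections import defaultdict

# Hypothetical one-rule synonym table: "wi fi" and "wifi" are equivalent,
# with "wi fi" listed first.
SYNONYM_GROUP = [["wi", "fi"], ["wifi"]]


def match_at(tokens, i):
    """Return the synonym variant starting at tokens[i], or None."""
    for variant in SYNONYM_GROUP:
        if tokens[i:i + len(variant)] == variant:
            return variant
    return None


def expand_flat(tokens):
    """Toy expand: true. Every variant is stacked onto the original
    positions, so a multi-word variant spills into the next word's slot.
    Returns {position: set of terms}."""
    index = defaultdict(set)
    pos = i = 0
    while i < len(tokens):
        matched = match_at(tokens, i)
        if matched is None:
            index[pos].add(tokens[i])
            pos, i = pos + 1, i + 1
            continue
        for variant in SYNONYM_GROUP:
            for offset, term in enumerate(variant):
                index[pos + offset].add(term)
        pos, i = pos + len(matched), i + len(matched)
    return dict(index)


def contract(tokens):
    """Toy expand: false. Every variant is replaced by the first one."""
    out, i = [], 0
    while i < len(tokens):
        matched = match_at(tokens, i)
        if matched is None:
            out.append(tokens[i])
            i += 1
        else:
            out.extend(SYNONYM_GROUP[0])
            i += len(matched)
    return out


def positions(tokens):
    """One term per position, for a token list without stacked synonyms."""
    return {pos: {term} for pos, term in enumerate(tokens)}


def phrase_match(index, phrase):
    """True if the phrase terms occur at consecutive positions."""
    return any(
        all(term in index.get(start + k, set()) for k, term in enumerate(phrase))
        for start in index
    )


doc = expand_flat(["wifi", "router"])
print(phrase_match(doc, ["wifi", "router"]))      # True
print(phrase_match(doc, ["wi", "fi", "router"]))  # False: router is not after fi
print(phrase_match(doc, ["wifi", "fi"]))          # True: a false positive

contracted = positions(contract(["wifi", "router"]))
print(phrase_match(contracted, contract(["wi", "fi", "router"])))  # True
```

In this model the contracted index has one term per position, so phrase queries written with either variant line up with the indexed positions.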

hth

clint

--

Thank you Clint, that helps my understanding a lot. I will try expand=false.

And thanks for the gist, too.

Kevin

On Saturday, November 17, 2012 4:14:50 AM UTC-8, Clinton Gormley wrote:

> There are various ways to approach this problem. Either you:
>
>   * expand your synonym list at index time (ie you store all
>     variations of the synonym in your index), but then you search on
>     just one variation (by using a different analyzer at search or
>     index time),
>   * contract your synonym list at index and search time: eg foo, bar
>     or baz all get indexed as just 'foo'.  A search for 'bar'
>     becomes a search for 'foo'
>
> I have put together a gist demonstrating how this all works:
> https://gist.github.com/4095280

--

A quick note to confirm that this worked for me and to thank you for an
amazingly detailed answer. I learned a ton of stuff from reading your gist.

Thank you, Clint!

Kevin

On Saturday, November 17, 2012 4:14:50 AM UTC-8, Clinton Gormley wrote:

>   * contract your synonym list at index and search time: eg foo, bar
>     or baz all get indexed as just 'foo'.  A search for 'bar'
>     becomes a search for 'foo'
>
> I have put together a gist demonstrating how this all works:
> https://gist.github.com/4095280

--

On Mon, 2012-11-19 at 16:54 -0800, Kevin Lawrence wrote:

> A quick note to confirm that this worked for me and to thank you for
> an amazingly detailed answer. I learned a ton of stuff from reading
> your gist.

glad it helped :)

> Thank you, Clint!
>
> Kevin
>
> On Saturday, November 17, 2012 4:14:50 AM UTC-8, Clinton Gormley wrote:
>
>>   * contract your synonym list at index and search time: eg foo, bar
>>     or baz all get indexed as just 'foo'.  A search for 'bar'
>>     becomes a search for 'foo'
>>
>> I have put together a gist demonstrating how this all works:
>> https://gist.github.com/4095280

--