Multiple synonyms contribute to the score

Kevin_Lawrence · November 16, 2012, 7:22pm

I have a synonym file that includes the row:

sunitinib,su11248,su011248,sutent

When I search for 'sutent' and look at the _explain results, I see that
each synonym contributes to the score, even though the source document only
has one mention of 'sutent' and none of its synonyms. The net result is that
words with more synonyms artificially get a boost in the results.

- 0.8291347 = weight(brief_title:sunitinib in 27490), product of:
- 0.8291347 = weight(brief_title:sunitinib in 27490), product of:
- 0.5862867 = weight(brief_title:su11248 in 27490), product of:
- 0.5862867 = weight(brief_title:su011248 in 27490), product of:
- 0.5862867 = weight(brief_title:sutent in 27490), product of:

Trying expand=true and expand=false in the mapping makes no difference. Is
there a setting I can change to avoid this behaviour?

I'll put together a gist if the solution is not immediately obvious to
someone.

Thanks in advance,

Kevin

BONUS QUESTION: is there an explanation somewhere of when I should use
expand=false? I read the explanation in the doc but I'm still not getting
it.

--

Clinton_Gormley · November 17, 2012, 12:14pm

Hi Kevin

When I search for 'sutent' and look at the _explain results, I see
that each synonym contributes to the score, even though the source
document only has one mention of 'sutent' and none of its synonyms.
The net result is that words with more synonyms artificially get a
boost in the results.

There are various ways to approach this problem. Either you:

  * expand your synonym list at index time (ie you store all
    variations of the synonym in your index), but then you search on
    just one variation (by using a different analyzer at search or
    index time),
  * contract your synonym list at index and search time: eg foo, bar
    or baz all get indexed as just 'foo'.  A search for 'bar'
    becomes a search for 'foo'
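
As a rough illustration of the two approaches, here is a toy model in Python. It is only a sketch: the functions `expand`, `contract`, `plain`, `analyze` and `matching_clauses` are hypothetical helpers, not Elasticsearch or Lucene APIs, and counting matching terms is a stand-in for real scoring, which is more involved.

```python
# Toy model of synonym handling; NOT Lucene's analysis chain or scoring.
SYNONYMS = ["sunitinib", "su11248", "su011248", "sutent"]

def expand(token):
    """expand=true: a token in the group becomes every member of the group."""
    return list(SYNONYMS) if token in SYNONYMS else [token]

def contract(token):
    """expand=false: a token in the group becomes the first member only."""
    return [SYNONYMS[0]] if token in SYNONYMS else [token]

def plain(token):
    """No synonym filter at all."""
    return [token]

def analyze(text, token_fn):
    """Lowercase, split on whitespace, then apply the synonym function."""
    terms = []
    for token in text.lower().split():
        terms.extend(token_fn(token))
    return terms

def matching_clauses(doc_text, query_text, index_fn, search_fn):
    """Count the query terms that are found in the indexed document."""
    indexed = set(analyze(doc_text, index_fn))
    return sum(1 for term in analyze(query_text, search_fn) if term in indexed)

doc = "Sutent in renal cancer"
print(matching_clauses(doc, "sutent", expand, expand))      # 4
print(matching_clauses(doc, "sutent", expand, plain))       # 1
print(matching_clauses(doc, "sutent", contract, contract))  # 1
```

With expansion on both sides, every synonym in the query finds a match in the document, which is the boost described in the question. Either of the two approaches above brings that back to a single matching term.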

I have put together a gist demonstrating how this all works:

https://gist.github.com/clintongormley/4095280

We create an index with:

 * two filters: `synonyms_expand` and `synonyms_contract`
 * two analyzers: `synonyms_expand` and `synonyms_contract`
 * three text fields:
   * `text_1` uses the `synonyms_expand` analyzer at index and search time
   * `text_2` uses the `synonyms_expand` analyzer at index time, but the `standard` analyzer at search time
   * `text_3` uses the `synonyms_contract` analyzer at index and search time
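
Only the summary of the gist is reproduced here, so the following is a sketch of what such an index definition might look like, written as a Python dict that can be serialised to JSON. The filter, analyzer and field names come from the list above. Everything else is an assumption: the `standard` tokenizer, the `lowercase` filter, the inline synonym rule (taken from Kevin's question) and the mapping syntax, which follows recent Elasticsearch releases (`text` fields with `analyzer` and `search_analyzer`) and not necessarily the 2012-era syntax used in the gist.

```python
import json

# Assumption: one inline rule, borrowed from the question above.
SYNONYM_RULES = ["sunitinib,su11248,su011248,sutent"]

def build_index_body(rules):
    """Return a hypothetical index body with expanding and contracting analyzers."""
    def synonym_filter(expand):
        return {"type": "synonym", "synonyms": rules, "expand": expand}

    def custom_analyzer(filter_name):
        # Assumed chain: standard tokenizer, lowercase, then the synonym filter.
        return {
            "type": "custom",
            "tokenizer": "standard",
            "filter": ["lowercase", filter_name],
        }

    return {
        "settings": {
            "analysis": {
                "filter": {
                    "synonyms_expand": synonym_filter(True),
                    "synonyms_contract": synonym_filter(False),
                },
                "analyzer": {
                    "synonyms_expand": custom_analyzer("synonyms_expand"),
                    "synonyms_contract": custom_analyzer("synonyms_contract"),
                },
            }
        },
        "mappings": {
            "properties": {
                # Expand at index and search time.
                "text_1": {"type": "text", "analyzer": "synonyms_expand"},
                # Expand at index time, plain analysis at search time.
                "text_2": {
                    "type": "text",
                    "analyzer": "synonyms_expand",
                    "search_analyzer": "standard",
                },
                # Contract at index and search time.
                "text_3": {"type": "text", "analyzer": "synonyms_contract"},
            }
        },
    }

print(json.dumps(build_index_body(SYNONYM_RULES), indent=2))
```

The printed JSON could be used as the request body when creating an index, after adjusting it to the mapping syntax of the Elasticsearch version in use.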

The question remains: which should I prefer? expand: true or false?

I'm open to disagreement, but my vote would be for expand: false. ie
index just the first word in the synonym list, not all the words.

My reasons for that are:

  * fewer terms to index
  * replacing synonyms with all variations or just one variation
    implies the same loss of original information (ie which synonym
    appeared in the original text).
  * Synonyms can be of different lengths (eg "wi fi" vs "wifi"), which
    means that (with expand: true), the phrase "wifi router" would be
    indexed as:

        Pos:  1     2       3
              wifi  router
              wi    fi      router

    which can mess up eg phrase queries, which depend on token
    positions, and can also mess up snippet highlighting.
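
To make the position problem concrete, here is a small Python sketch. It takes the position layout described above at face value and is a toy model, not real analyzer output, which may differ between versions. `overlay` and `phrase_matches` are hypothetical helpers, and the query phrase is not analyzed at all.

```python
from collections import defaultdict

def overlay(*variants):
    """Stack token sequences into one position -> terms map, starting at 1."""
    positions = defaultdict(set)
    for variant in variants:
        for pos, term in enumerate(variant.split(), start=1):
            positions[pos].add(term)
    return dict(positions)

def phrase_matches(positions, phrase):
    """True if the phrase terms occur at consecutive positions."""
    terms = phrase.split()
    for start in positions:
        if all(term in positions.get(start + i, set())
               for i, term in enumerate(terms)):
            return True
    return False

# expand: true -> both variants share the same position space
expanded = overlay("wifi router", "wi fi router")
# expand: false -> only one variant is indexed
contracted = overlay("wifi router")

print(phrase_matches(expanded, "wifi router"))   # True
print(phrase_matches(expanded, "wi fi router"))  # True
print(phrase_matches(expanded, "wi router"))     # True, yet never in the text
print(phrase_matches(contracted, "wi router"))   # False
```

In this model "wi" lands at position 1 and "router" at position 2, so the phrase "wi router" matches even though it never appeared in the original text. Indexing a single variant avoids the overlap.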

hth

clint

--

Kevin_Lawrence · November 19, 2012, 10:07pm

Thank you Clint, that helps my understanding a lot. I will try expand=false.

And thanks for the gist, too.

Kevin

--

Kevin_Lawrence · November 20, 2012, 12:54am

A quick note to confirm that this worked for me and to thank you for an
amazingly detailed answer. I learned a ton of stuff from reading your gist.

Thank you, Clint!

Kevin

--

Clinton_Gormley · November 20, 2012, 10:23am

On Mon, 2012-11-19 at 16:54 -0800, Kevin Lawrence wrote:

A quick note to confirm that this worked for me and to thank you for
an amazingly detailed answer. I learned a ton of stuff from reading
your gist.

glad it helped
