Explicit synonym mapping fails to map any given LHS entry to RHS entry

Greetings community,

I'm hoping to get some feedback on synonym rule formatting. I'll do my best
to explain using a pseudo example; please bear with me.

  1. I have a specific use case where five documents contain the word
     J555, and one document contains J5-55.
     • Both mean the same thing, but are indexed from two different
       sources over which I have no control.
  2. Users search for these documents using J555, J5-55, J-5-55,
     and J-555.

How do I create a mapping that will allow each of the cases listed in #2
to return, at the very least, the six documents referred to in #1? I
thought it would be the following:

J555, J5-55, J-5-55, J-555 => J555, J5-55

But that doesn't work as expected. We have expand=true in our synonym
configuration. Do you have any thoughts? My main goal is simply to
understand better how the mappings work.

Sincerely,
Tyler

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/8d8f585b-3ad2-4041-9ba3-0268999c9b24%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Synonyms work on tokens, that is, on the output of the tokenizer after it
has broken the text apart. The '-' is normally a separator, so J5-55 gets
split into J5 and 55. Your synonym filter therefore receives J5 and 55,
and there is no rule for those.
Could that be your problem?
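
To make that concrete, here is a toy sketch in plain Python. It is not Elasticsearch code: the two split functions only roughly imitate a tokenizer that treats '-' as a separator and one that splits on whitespace only, and the lookup assumes the synonym rule matches whole tokens exactly as written. All function names are made up for illustration.

```python
import re

# Terms from the left-hand side of the rule, taken as whole tokens (assumption).
RULE_TERMS = {"J555", "J5-55", "J-5-55", "J-555"}

def split_on_punctuation(text):
    """Rough stand-in for a tokenizer that treats '-' as a separator."""
    return re.findall(r"[A-Za-z0-9]+", text)

def split_on_whitespace(text):
    """Rough stand-in for a tokenizer that splits on whitespace only."""
    return text.split()

def matched_terms(tokens):
    """Return the tokens that a single-token synonym rule would match."""
    return [token for token in tokens if token in RULE_TERMS]

if __name__ == "__main__":
    for query in ["J555", "J5-55", "J-5-55", "J-555"]:
        pieces = split_on_punctuation(query)
        whole = split_on_whitespace(query)
        print(query, "| split on '-':", pieces, "matches", matched_terms(pieces),
              "| whitespace only:", whole, "matches", matched_terms(whole))
```

In this sketch only J555 survives the '-' split as a token the rule knows about; the other three spellings reach the synonym step in pieces and match nothing.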

If so, you can use a different tokenizer that doesn't split on the '-', or
use a char filter that maps it to an '_'.
Another approach would be to use shingles.
If you want the dash to be a separator as well, take a look at the word
delimiter filter.
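
As a rough sketch of the first option (a tokenizer that keeps the '-'), the index settings might look something like the following, built here as a Python dict and printed as JSON. The names part_number_synonyms and part_number_analyzer are hypothetical, and the rule is written as a plain equivalence list rather than with '=>', on the assumption that every spelling should expand to every other one. Treat it as a starting point to test against your own index, not a verified configuration.

```python
import json

# Hypothetical filter and analyzer names, chosen for illustration only.
settings = {
    "analysis": {
        "filter": {
            "part_number_synonyms": {
                "type": "synonym",
                # Equivalence list: no "=>", so each term can expand to all four.
                "synonyms": ["J555, J5-55, J-5-55, J-555"],
            }
        },
        "analyzer": {
            "part_number_analyzer": {
                "type": "custom",
                # Whitespace tokenizer keeps J5-55 as one token.
                "tokenizer": "whitespace",
                "filter": ["part_number_synonyms"],
            }
        },
    }
}

if __name__ == "__main__":
    # Body for an index-creation request; apply the analyzer to the field in the mapping.
    print(json.dumps({"settings": settings}, indent=2))
```

One trade-off to keep in mind: a whitespace tokenizer also leaves other punctuation attached to tokens, so a value like "J555," with a trailing comma would not match the rule either.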

/Peter

In reply to Tyler H's message of Wednesday, 25 February 2015, 03:10:28 UTC+1.
