Explicit synonym mapping fails to map any given LHS entry to RHS entry

Tyler_H · February 25, 2015, 2:10am

Greetings community,

I'm hoping to get some feedback on synonym rule formatting. I'll do my best
to explain using a pseudo example; please bear with me.

1. I have a specific use case where five documents contain the word
   J555, and one document contains J5-55.
   - Both mean the same thing, but are indexed from two different
     sources over which I have no control.
2. Users search for these documents using J555, J5-55, J-5-55,
   and J-555.

How do I create a mapping that allows each of the searches listed in #2
to return, at the very least, the six documents referred to in #1? I
thought this would be the following:

J555, J5-55, J-5-55, J-555 => J555, J5-55

But that doesn't work as expected. We have expand=true in our synonym
configuration. Do you have any thoughts? My main goal is simply to
better understand how the mappings work.
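
To make the two rule styles concrete, here is a toy Python model of how a Solr-format synonym rule is read. It is a sketch, not Elasticsearch code: `parse_rule` and `apply_rule` are hypothetical helpers, and the model assumes every rule entry is a single token matched exactly. In an explicit `=>` rule each left-hand term is replaced by all right-hand terms; the `expand` setting only affects comma-separated (equivalent) rules and is ignored by `=>` rules.

```python
def parse_rule(rule: str, expand: bool = True) -> dict[str, list[str]]:
    """Return a term -> replacement-terms map for one Solr-format synonym rule."""
    if "=>" in rule:
        # Explicit mapping: every LHS term is replaced by all RHS terms;
        # the expand flag plays no part here.
        lhs, rhs = rule.split("=>", 1)
        targets = [t.strip() for t in rhs.split(",")]
        return {t.strip(): targets for t in lhs.split(",")}
    terms = [t.strip() for t in rule.split(",")]
    # Equivalent synonyms: with expand=True each term maps to all terms,
    # with expand=False every term collapses to the first one.
    targets = terms if expand else terms[:1]
    return {t: targets for t in terms}


def apply_rule(tokens: list[str], rule: str, expand: bool = True) -> list[list[str]]:
    """Replace each token that exactly matches a rule term; one inner list per position."""
    mapping = parse_rule(rule, expand)
    return [mapping.get(tok, [tok]) for tok in tokens]


rule = "J555, J5-55, J-5-55, J-555 => J555, J5-55"
print(apply_rule(["J-555"], rule))  # [['J555', 'J5-55']]
```

In this model the rule behaves as intended only if the query arrives as the single token `J-555`; whether it does depends on the analyzer that runs before the synonym filter.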

Sincerely,
Tyler

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/8d8f585b-3ad2-4041-9ba3-0268999c9b24%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Peter_van_der_Weerd · February 25, 2015, 5:29am

Synonyms work on tokens, that is, after the text has been broken into tokens.
The '-' is normally a separator, so J5-55 gets split into J5 and 55.
Your synonym filter therefore receives J5 and 55, and there is no rule for those.
Could that be your problem?
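
A quick way to see what the synonym filter receives is to tokenize each query variant. The sketch below uses a hypothetical `simple_tokenize` function, a rough regex stand-in for a tokenizer that treats '-' as a separator (it is not the real standard tokenizer and skips lowercasing):

```python
import re


def simple_tokenize(text: str) -> list[str]:
    # Rough stand-in for a tokenizer that treats '-' as a separator:
    # runs of letters, digits and '_' become tokens, anything else splits.
    return re.findall(r"[A-Za-z0-9_]+", text)


for query in ["J555", "J5-55", "J-5-55", "J-555"]:
    print(query, "->", simple_tokenize(query))
```

Only `J555` survives as one token; `J5-55`, `J-5-55` and `J-555` arrive as several short tokens, none of which equals a whole rule entry such as `J5-55`.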

If so, you can use a different tokenizer that doesn't split on the '-', or use
a char filter that maps it to an '_'.
Another approach would be to use shingles.
If you want the dash to act as a separator as well, take a look at the word
delimiter filter.
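
Two of these suggestions can be sketched in Python under simplifying assumptions. `dash_to_underscore` is a hypothetical stand-in for a mapping char filter with the rule `- => _`, and `catenate_parts` is a simplified imitation of word-delimiter output with catenation enabled, assuming a whitespace tokenizer in front of it (a tokenizer that already splits on '-' would leave nothing to rejoin). Unlike the real filter, it ignores the letter/digit splitting the word delimiter does by default.

```python
import re


def tokenize(text: str) -> list[str]:
    # Rough stand-in for a tokenizer that splits on '-' but keeps
    # letters, digits and '_' together (not the real standard tokenizer).
    return re.findall(r"[A-Za-z0-9_]+", text)


def dash_to_underscore(text: str) -> str:
    # Hypothetical equivalent of a mapping char filter rule "- => _";
    # it runs before tokenization, so the tokenizer never sees the dash.
    return text.replace("-", "_")


def catenate_parts(token: str) -> list[str]:
    # Simplified word-delimiter-style output: the parts split on '-' plus
    # all parts joined together (similar in spirit to catenate_all).
    parts = [p for p in token.split("-") if p]
    if len(parts) > 1:
        return parts + ["".join(parts)]
    return parts


queries = ["J555", "J5-55", "J-5-55", "J-555"]
for q in queries:
    print(q, tokenize(dash_to_underscore(q)), catenate_parts(q))
```

With the char filter, every variant stays a single token (`J5_55`, `J_5_55`, ...), so an illustrative rule written in the underscored form, such as `J555, J5_55, J_5_55, J_555`, could match each one. With catenation, every variant also produces `J555`, so all of them share that term even without a synonym rule.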

/Peter


Related topics

- Question for tokenizer and synonym (Elasticsearch) · 2 replies · 427 views · July 6, 2017
- Synonyms: Explicit mapping behaving like Equivalent synonyms (Elasticsearch) · 4 replies · 1393 views · July 16, 2018
- Elastic search synonym match involving numeric characters (Elasticsearch) · 4 replies · 993 views · July 6, 2017
- Why doesn't this Synonym work? (Elasticsearch) · 13 replies · 2991 views · July 5, 2017
- Synonym not working for some entry (Elasticsearch) · 4 replies · 1930 views · July 6, 2017