I'm hoping to get some feedback on synonym rule formatting. I'll do my best
to explain using a pseudo example; please bear with me.
1. I have a specific use case where five documents contain the word J555,
and one document contains J5-55. Both mean the same thing, but they are
indexed from two different sources over which I have no control.
2. Users search for these documents using J555, J5-55, J-5-55,
and J-555.
3. How do I create a mapping that will allow each of the cases listed in #2
to result in, at the very least, the six documents referred to in #1? I
thought this would have been the following:
J555, J5-55, J-5-55, J-555 => J555, J5-55
But that doesn't work as expected. We have expand=true in our synonym
configuration. Do you have any thoughts? My main goal is simply to
understand better how the mappings work.
Synonyms work on tokens, that is, after the text has been broken into tokens.
The '-' is normally a separator, so J5-55 gets split into J5 and 55.
Your synonym filter therefore receives J5 and 55, and there is no rule for that.
Could that be your problem?
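To make this concrete, here is a rough Python sketch of the effect. It is a toy model only: `tokenize` just lowercases and splits on anything that is not a letter or digit, which is an assumption standing in for the real tokenizer, and `synonym_keys` is a hypothetical stand-in for the left-hand side of the rule in the question.

```python
import re


def tokenize(text):
    """Toy tokenizer: lowercase, then split on any non-alphanumeric character."""
    return [t for t in re.split(r"[^0-9a-z]+", text.lower()) if t]


# The single-token entries from the rule in the question (lowercased).
synonym_keys = {"j555", "j5-55", "j-5-55", "j-555"}


def matching_tokens(text):
    """Return the tokens that a single-token synonym rule could match."""
    return [t for t in tokenize(text) if t in synonym_keys]


for query in ["J555", "J5-55", "J-5-55", "J-555"]:
    # Only the undashed form survives tokenization as one token.
    print(query, tokenize(query), matching_tokens(query))
```

In this model only J555 reaches the synonym step as a single token; the dashed forms arrive as pieces such as j5 and 55, so the entries containing a dash can never match.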
If so, you can use a different tokenizer that doesn't split on the '-', or use
a char filter that maps it to an '_'.
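A minimal sketch of the char filter variant, written as a Python dict that could be serialized and sent as index settings. The names `dash_to_underscore`, `part_synonyms` and `part_number` are made up for illustration, and the sketch assumes the tokenizer keeps an underscore between letters and digits inside one token, so please check the result with the analyze API before relying on it.

```python
import json

# Hypothetical names throughout; only the "type" values and option keys
# are meant to be real Elasticsearch settings.
char_filter_settings = {
    "analysis": {
        "char_filter": {
            "dash_to_underscore": {
                "type": "mapping",
                "mappings": ["- => _"],  # J5-55 becomes J5_55 before tokenizing
            }
        },
        "filter": {
            "part_synonyms": {
                "type": "synonym",
                # Plain comma-separated list (no "=>"): all forms are equivalent.
                # Written with underscores and in lowercase, because that is
                # what the filter sees after the char filter and lowercasing.
                "synonyms": ["j555, j5_55, j_5_55, j_555"],
            }
        },
        "analyzer": {
            "part_number": {
                "type": "custom",
                "char_filter": ["dash_to_underscore"],
                "tokenizer": "standard",
                "filter": ["lowercase", "part_synonyms"],
            }
        },
    }
}

# Print the JSON body that would go into an index-creation request.
print(json.dumps({"settings": char_filter_settings}, indent=2))
```

Note that the synonym entries have to be written the way the tokens look after the char filter has run, which is why they use underscores instead of dashes.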
Another approach would be to use shingles.
If you want the dash to be a separator as well, take a look at the word
delimiter filter.
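For the word delimiter route, a settings sketch might look like the following, again as a Python dict. The names `part_delimiter` and `part_number_wd` are hypothetical, and the choice of a whitespace tokenizer is my assumption (so that the dash is still there when the filter runs). The idea is that `catenate_all` should add a joined form such as J555 next to the split parts, but I have not verified the exact token output, so treat it as a starting point.

```python
import json

# Hypothetical names; option keys are those of the word_delimiter token filter.
word_delimiter_settings = {
    "analysis": {
        "filter": {
            "part_delimiter": {
                "type": "word_delimiter",
                "catenate_all": True,       # also emit the parts joined together
                "preserve_original": True,  # keep the token as it was typed
            }
        },
        "analyzer": {
            "part_number_wd": {
                "type": "custom",
                "tokenizer": "whitespace",  # assumption: leaves '-' in the token
                "filter": ["part_delimiter", "lowercase"],
            }
        },
    }
}

# Print the JSON body that would go into an index-creation request.
print(json.dumps({"settings": word_delimiter_settings}, indent=2))
```

If the same analyzer is used at index and search time, the dashed and undashed spellings should end up sharing the joined token, which may make an explicit synonym rule unnecessary for this case.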
/Peter
On Wednesday, 25 February 2015 03:10:28 UTC+1, Tyler H wrote: