I have created a multi_field index with one field analyzed
(with an edge n-gram, min: 3, max: 15) and the other one not analyzed. I then
run a multi_match query on both and get relevant hits. I am doing this to find
exact matches, which seems to work.
So far so good. However, how do I separate hits that are "really" relevant
to my search string (i.e. the words are equal but might be in a different
order) from "false-positive" results produced by the n-gram, which can have
a very different meaning?
An example would be:
query: Crankshaft position sensor  hit: Position Sensor, Crankshaft
This is a very good, similar result, and its score is equal to the max score.
However, I cannot draw any conclusion from comparing the score values alone,
because another example can yield the same score but should not rank as
high, since the meaning is different.
query: Motoroil  hit: Motorblock
This is "not relevant", but of course it originates from the n-gram. The hit
score is equal to the max score.
Of course I could increase the min and max on the n-gram, but it
seems useful for other cases, so that is not really an option.
Can you simply boost the non-analyzed field? If the scores are still too
similar, try using a dis_max query in which the query on the non-analyzed
field gets a higher boost.
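The example that originally followed this reply is not preserved, so here is
a minimal sketch of what such a dis_max request body could look like, built
as a plain Python dict. The field names "name" (edge n-gram analyzed) and
"name.untouched" (non-analyzed sub-field), the helper name, and the boost
value are all assumptions for illustration, not taken from the actual mapping.

```python
import json

# Hypothetical field names: adjust to the real multi_field mapping.
NGRAM_FIELD = "name"            # edge n-gram analyzed field (assumed name)
EXACT_FIELD = "name.untouched"  # non-analyzed sub-field (assumed name)


def build_dis_max_query(text, exact_boost=10.0, tie_breaker=0.0):
    """Build a dis_max request body that favors hits on the exact field.

    dis_max scores a document by its best-matching sub-query, so a hit on
    the boosted non-analyzed field outranks a hit that only matched n-grams.
    """
    return {
        "query": {
            "dis_max": {
                "tie_breaker": tie_breaker,
                "queries": [
                    # Exact match on the non-analyzed field, boosted.
                    {"term": {EXACT_FIELD: {"value": text, "boost": exact_boost}}},
                    # Fuzzy-ish match on the edge n-gram field, default boost.
                    {"match": {NGRAM_FIELD: {"query": text}}},
                ],
            }
        }
    }


if __name__ == "__main__":
    # Print the body so it can be pasted into a search request.
    print(json.dumps(build_dis_max_query("Motoroil"), indent=2))
```

Note that a term query on a non-analyzed field only matches the complete,
case-sensitive field value, so this separates true exact matches from n-gram
hits; reordered words would still only match through the n-gram field.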