Is it possible to add yet another score value based on similarity (same words) to differentiate between two _scores?


(Pontus Lundin) #1

Hi!

I have created a multi_field index with one field analyzed
(with edgengram, min:3, max:15) and the other one not. Then I run a
multi match on these fields and get relevant hits. I am doing this to find exact
matches, which seems to work.
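
For readers following along, the setup described above might look roughly like the sketch below, written as Python dicts that serialize to the JSON request bodies. It assumes the Elasticsearch 1.x mapping syntax in use at the time of this thread. The field name `name`, the sub-field `raw`, and the filter and analyzer names are hypothetical, since the post does not give them, and the post also does not say whether the edge n-gram is a token filter or a tokenizer (a filter is assumed here).

```python
import json

# Hypothetical names: the post does not state the field or analyzer names.
FIELD = "name"
RAW_FIELD = "name.raw"

# Analysis settings with an edge n-gram token filter (min 3, max 15 as in the post).
# Assumption: a token filter on top of the standard tokenizer.
settings = {
    "analysis": {
        "filter": {
            "prefix_grams": {"type": "edgeNGram", "min_gram": 3, "max_gram": 15}
        },
        "analyzer": {
            "prefix_analyzer": {
                "type": "custom",
                "tokenizer": "standard",
                "filter": ["lowercase", "prefix_grams"],
            }
        },
    }
}

# multi_field mapping: an analyzed default field plus a not_analyzed sub-field.
mapping = {
    "properties": {
        FIELD: {
            "type": "multi_field",
            "fields": {
                FIELD: {"type": "string", "analyzer": "prefix_analyzer"},
                "raw": {"type": "string", "index": "not_analyzed"},
            },
        }
    }
}


def multi_match_query(text):
    """Build a multi_match request body over both variants of the field."""
    return {"query": {"multi_match": {"query": text, "fields": [FIELD, RAW_FIELD]}}}


if __name__ == "__main__":
    print(json.dumps(multi_match_query("Crankshaft position sensor"), indent=2))
```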

So far so good. However, how do I separate hits that are "really" relevant
to my search string (i.e. the words are equal but might be in another order, etc.)
from "false-positive" results produced by the ngram, which can have a very
different meaning?

An example would be:

query:Crankshaft position sensor
hits:Position Sensor, Crankshaft

This is a very good and similar result, and the score is equal to max score.
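
To make "the words are equal but might be in another order" concrete, here is a minimal sketch of a word-set comparison computed outside Elasticsearch on the returned hits. It is only an illustration of the kind of extra similarity value being asked about, not something Elasticsearch returns; the function names are hypothetical and the measure (Jaccard overlap of lowercased words) is an assumption.

```python
import re


def words(text):
    """Lowercase the text and return its set of word tokens."""
    return set(re.findall(r"\w+", text.lower()))


def word_overlap(query, hit):
    """Jaccard overlap of the word sets: 1.0 means same words in any order."""
    q, h = words(query), words(hit)
    if not q or not h:
        return 0.0
    return len(q & h) / len(q | h)


if __name__ == "__main__":
    # Same three words, different order and punctuation.
    print(word_overlap("Crankshaft position sensor", "Position Sensor, Crankshaft"))
```

A value like this could be used as a secondary sort key next to `_score`, so that hits sharing whole words with the query rank above hits that only share a prefix.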

However, I cannot draw any conclusion from comparing the score value
alone, because another example could yield the same score but should
not rank as high because the meaning is different.

query:Motoroil
hit:Motorblock

This is "not relevant" but of course originates from the ngram. The hit
score is equal to max score.
Of course I could increase the min and max on the ngram, but it
seems useful for other cases, so that is not really an option.
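
The false positive can be reproduced without Elasticsearch. The sketch below generates edge n-grams with the min/max from this post and shows which ones the two terms share. It assumes lowercasing and that the query text is also run through the edge n-gram analyzer at search time, which would explain why "Motoroil" matches "Motorblock" at all; the helper name is hypothetical.

```python
def edge_ngrams(token, min_gram=3, max_gram=15):
    """Return the prefixes of a token between min_gram and max_gram characters."""
    token = token.lower()
    longest = min(len(token), max_gram)
    return [token[:n] for n in range(min_gram, longest + 1)]


if __name__ == "__main__":
    query_grams = edge_ngrams("Motoroil")
    hit_grams = edge_ngrams("Motorblock")
    # Prefixes that both terms produce, shortest first.
    shared = sorted(set(query_grams) & set(hit_grams), key=len)
    print(shared)  # ['mot', 'moto', 'motor']
```

The shared prefixes `mot`, `moto`, and `motor` are enough for the analyzed field to match, even though the full words differ.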

Thanks!

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/f821aa7e-f666-430f-b4cb-7ed1796c0722%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


(Ivan Brusic) #2

Can you simply boost the non-analyzed field? If the scores are still too
similar, try using a dis_max query with the non-analyzed query getting a
higher boost:
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-dis-max-query.html
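
As a rough sketch of both suggestions, the request bodies could be built as below. The field names `name` and `name.raw` and the boost value are hypothetical, since the thread does not give them. The first function boosts the non-analyzed field inside a multi_match using the `^` field-boost syntax; the second wraps one query per field in a dis_max and gives the non-analyzed one a higher boost.

```python
import json

# Hypothetical field names and boost: not given in the thread.
ANALYZED_FIELD = "name"
RAW_FIELD = "name.raw"
RAW_BOOST = 10


def boosted_multi_match(text, boost=RAW_BOOST):
    """multi_match over both fields, with the non-analyzed field boosted."""
    fields = [ANALYZED_FIELD, "%s^%s" % (RAW_FIELD, boost)]
    return {"query": {"multi_match": {"query": text, "fields": fields}}}


def dis_max_query(text, boost=RAW_BOOST):
    """dis_max over one query per field; the exact (term) query gets the boost."""
    return {
        "query": {
            "dis_max": {
                "queries": [
                    # Exact match against the not_analyzed value.
                    {"term": {RAW_FIELD: {"value": text, "boost": boost}}},
                    # Edge n-gram match against the analyzed field.
                    {"match": {ANALYZED_FIELD: {"query": text}}},
                ]
            }
        }
    }


if __name__ == "__main__":
    print(json.dumps(dis_max_query("Motoroil"), indent=2))
```

Note that a term query on a not_analyzed field only matches the stored value exactly, including case and word order, so under these assumptions the boost would help "Motoroil" outrank "Motorblock" but would not by itself lift "Position Sensor, Crankshaft" for the query "Crankshaft position sensor".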

--
Ivan

On Wed, Sep 3, 2014 at 7:16 AM, Pontus Lundin lundin.codeitez@gmail.com
wrote:


(system) #3