Partial match of sub-phrases to be scored higher?

Ark · September 16, 2013, 2:08pm

Hello,

How can I model the query and/or mapping so that a partial match of a
sub-phrase gets a higher score than what an edgengram would return?

For example, if I have four documents:

foo bar blah
foo blah bar
bar foo blah
bar blah foo

If the search string is "bar bl", I would like documents 1 and 4 to be
scored higher than documents 2 and 3.

If the field is indexed using edgengram, all 4 documents would match (which
is fine for my use-case) but I think the scoring cannot yield the result I
am looking for.

There is also a "match_phrase_prefix" but that would match only #4.

Thanks
Ark

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

jpountz · September 16, 2013, 6:23pm

Hi,

You could use the edgeNGram filter on top of the shingle[1] filter (with
output_unigrams=false). This would allow you to boost on prefixes and
positions at the same time.
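
To make the idea concrete, here is a rough sketch rather than a tested configuration. The first part is a settings fragment written as a Python dict; the filter and analyzer names (`pair_shingles`, `shingle_prefixes`, `shingle_prefix_index`), the `standard` tokenizer, and the gram sizes are placeholders chosen for illustration, not anything prescribed in this thread. The second part is a plain-Python simulation of what such a chain would emit for the four example documents, assuming the query string is lowercased but otherwise searched as a single term.

```python
# Settings fragment (assumption: names and sizes are illustrative only).
# Older Elasticsearch versions spell the filter type "edgeNGram";
# newer ones use "edge_ngram".
analysis_settings = {
    "analysis": {
        "filter": {
            "pair_shingles": {
                "type": "shingle",
                "min_shingle_size": 2,
                "max_shingle_size": 2,
                "output_unigrams": False,
            },
            "shingle_prefixes": {"type": "edge_ngram", "min_gram": 1, "max_gram": 20},
        },
        "analyzer": {
            "shingle_prefix_index": {
                "type": "custom",
                "tokenizer": "standard",
                "filter": ["lowercase", "pair_shingles", "shingle_prefixes"],
            }
        },
    }
}


def shingles(tokens, size=2):
    """Join each run of `size` adjacent tokens with a space (no unigrams)."""
    return [" ".join(tokens[i:i + size]) for i in range(len(tokens) - size + 1)]


def edge_ngrams(term, min_gram=1):
    """Return every prefix of `term` that has at least `min_gram` characters."""
    return {term[:n] for n in range(min_gram, len(term) + 1)}


def index_terms(text):
    """Simulate the chain: whitespace split, lowercase, shingle, edge n-gram."""
    terms = set()
    for shingle in shingles(text.lower().split()):
        terms |= edge_ngrams(shingle)
    return terms


docs = {1: "foo bar blah", 2: "foo blah bar", 3: "bar foo blah", 4: "bar blah foo"}
query = "bar bl"

# A document matches on this field when the whole query is one of its terms.
matches = [doc_id for doc_id, text in docs.items() if query in index_terms(text)]
print(matches)  # [1, 4]
```

Only documents 1 and 4 match on this field, so a clause against it can add score on top of the plain edgengram field that already matches all four documents.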

The fact that you are interested in prefix matches makes me wonder whether
you are trying to implement auto-completion: if this is the case, a better
option could be to use the completion suggester[2] (which is way faster than
any index-based solution) and use all suffixes of your text as inputs. For
example, the "foo bar blah" suggestion could be indexed with "input": ["foo
bar blah", "bar blah", "blah"]. If you are not trying to implement
auto-completion, you can safely ignore this comment.
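
As a small illustration of the "all suffixes as inputs" idea, the helper below builds the word-level suffixes of a text. This is only a sketch: `suffix_inputs` and the field name `suggest` are hypothetical names, and whitespace splitting is an assumption about how the text should be tokenized.

```python
def suffix_inputs(text):
    """Return every word-level suffix of `text`, longest first."""
    tokens = text.split()
    return [" ".join(tokens[i:]) for i in range(len(tokens))]


def suggestion_doc(text):
    """Build a document body whose completion field lists all suffixes.

    The field name "suggest" is a placeholder for whatever completion
    field the mapping defines.
    """
    return {"suggest": {"input": suffix_inputs(text)}}


print(suffix_inputs("foo bar blah"))  # ['foo bar blah', 'bar blah', 'blah']
```

With inputs like these, typing "bar bl" would hit the "bar blah" input of the "foo bar blah" suggestion even though the text does not start with "bar".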

[1]

[2]

--
Adrien Grand

Ark · September 16, 2013, 9:09pm

Thank you! I am not sure whether my case counts as auto-completion, as I am
only now learning some of the concepts and looking at what is possible. Having
said that, your suggestion does sound interesting and may well be a good fit.

Ark
