How can I model the query and/or mapping so that a partial match of a
sub-phrase gets a higher score than what an edgengram would return?
For example, if I have four documents:
1. foo bar blah
2. foo blah bar
3. bar foo blah
4. bar blah foo
If the search string is "bar bl", I would like documents 1 and 4 to be
scored higher than documents 2 and 3.
If the field is indexed using edgengram, all 4 documents would match (which
is fine for my use-case), but I don't think the scoring can yield the
result I am looking for.
There is also "match_phrase_prefix", but that would match only #4.
You could use the edgeNGram filter on top of the shingle[1] filter (with
output_unigrams=false). This would allow you to boost on prefixes and
positions at the same time.
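To make the idea concrete, here is a minimal sketch in plain Python that imitates what such an analysis chain would emit at index time. It is not Elasticsearch code: the helper names (shingles, edge_ngrams, index_terms) are made up for illustration, and it assumes two-word shingles and that the query string is kept as a single term at search time. The settings dict at the end is only an assumed shape for the analysis settings; the names "phrase_shingles", "phrase_prefixes" and "shingle_prefix" are hypothetical.

```python
def shingles(tokens, size=2):
    """Word n-grams of exactly `size` tokens, i.e. no unigrams are emitted."""
    return [" ".join(tokens[i:i + size]) for i in range(len(tokens) - size + 1)]


def edge_ngrams(term, min_gram=1, max_gram=20):
    """All prefixes of `term` whose length is between min_gram and max_gram."""
    return [term[:n] for n in range(min_gram, min(len(term), max_gram) + 1)]


def index_terms(text):
    """Terms a shingle filter followed by an edge n-gram filter would produce."""
    terms = set()
    for shingle in shingles(text.lower().split()):
        terms.update(edge_ngrams(shingle))
    return terms


docs = {
    1: "foo bar blah",
    2: "foo blah bar",
    3: "bar foo blah",
    4: "bar blah foo",
}

query = "bar bl"
matching = sorted(doc_id for doc_id, text in docs.items()
                  if query in index_terms(text))
print(matching)  # [1, 4]

# Assumed shape of the analysis settings (filter and analyzer names are
# hypothetical; check the parameter names against your Elasticsearch version).
settings = {
    "analysis": {
        "filter": {
            "phrase_shingles": {
                "type": "shingle",
                "min_shingle_size": 2,
                "max_shingle_size": 2,
                "output_unigrams": False,
            },
            "phrase_prefixes": {
                "type": "edgeNGram",
                "min_gram": 1,
                "max_gram": 20,
            },
        },
        "analyzer": {
            "shingle_prefix": {
                "tokenizer": "standard",
                "filter": ["lowercase", "phrase_shingles", "phrase_prefixes"],
            }
        },
    }
}
```

Only documents 1 and 4 contain the term "bar bl" in this field, so querying it alongside the plain edgengram field (which still matches all four) should let those two documents score higher.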
The fact that you are interested in prefix matches makes me wonder whether
you are trying to implement auto-completion: if this is the case, a better
option could be to use the completion suggester[2] (which is way faster than
any index-based solution) and use all suffixes of your text as inputs. For
example, the "foo bar blah" suggestion could be indexed with
"input": ["foo bar blah", "bar blah", "blah"]. If you are not trying to
implement auto-completion, you can safely ignore this comment.
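Generating those inputs is straightforward. A small sketch, assuming whitespace-separated words (the function name suffix_inputs is made up for illustration):

```python
def suffix_inputs(text):
    """Return every word-level suffix of `text`, longest first."""
    words = text.split()
    return [" ".join(words[i:]) for i in range(len(words))]


# The "input" key mirrors the example above; the rest of the suggestion
# document depends on your mapping and is left out here.
suggestion = {"input": suffix_inputs("foo bar blah")}
print(suggestion)  # {'input': ['foo bar blah', 'bar blah', 'blah']}
```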
Thank you! I am not sure whether my case counts as auto-complete, as I am
just now learning some concepts and looking at what is possible. That said,
your suggestion sounds interesting and may well be a good fit.
Ark
On Monday, September 16, 2013 1:23:40 PM UTC-5, Adrien Grand wrote:
On Mon, Sep 16, 2013 at 4:08 PM, Ark <ayam1...@gmail.com> wrote: