Match phrase prefix in set of documents

Ozymandy · December 12, 2018, 3:28pm

Hi all,
I am using multi_match with type phrase_prefix to match my documents. It works fine when I need to extract matching documents from the whole index, but it does not work properly when I need to query by prefix within a set of documents (selected by the document field id). For example, I query all fields of three documents with the prefix "b". Each document has at least one word starting with the letter "b": "baby", "bed" and "bored". But the query hits only the one document with the word "baby". I executed my query with the explain flag and found out that this happens because of the fuzzy search: it loads only a small part of the tokens that suit my query (starting with "b") from the whole index. Of course I can increase the value of max_expansions, but that gives me performance issues. So, given the above, I have two questions about how to make the phrase_prefix search work:

1) Can I limit the scope of documents where it searches for tokens? It should work like a subquery in SQL (select from (select from) where str like).
2) Maybe it could be done with another analyzer? For now I am using the whitespace tokenizer with the lowercase filter.

xavierfacq · December 13, 2018, 9:41am

Hi,

Can you post your query here?

bye,
Xavier

Ozymandy · December 13, 2018, 1:13pm

Sure. Here is the query I execute:

{"from":0,"size":10000,"query":{"bool":{"filter":[{"multi_match":{"query":"b","fields":["category^1.0","body^1.0","title^1.0"],"type":"phrase_prefix","operator":"OR","slop":0,"prefix_length":0,"max_expansions":50,"zero_terms_query":"NONE","auto_generate_synonyms_phrase_query":true,"fuzzy_transpositions":true,"boost":1.0}},{"terms":{"id":["1", "2", "3"],"boost":1.0}}],"adjust_pure_negative":true,"boost":1.0}},"sort":[{"published":{"order":"desc"}}]}
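For readability, the same request can be sketched as a Python dict. This is only an illustration: the helper name build_query is made up, and the serialized defaults from the original (boost 1.0, slop 0, operator and so on) are dropped on the assumption that they do not change the result.

```python
import json


def build_query(doc_ids, prefix, max_expansions=50):
    """Build the search body from the post (hypothetical helper).

    Both clauses sit in the bool filter, so a hit must have an id in
    `doc_ids` and match the phrase_prefix clause.
    """
    return {
        "from": 0,
        "size": 10000,
        "query": {
            "bool": {
                "filter": [
                    {
                        "multi_match": {
                            "query": prefix,
                            "fields": ["category", "body", "title"],
                            "type": "phrase_prefix",
                            "max_expansions": max_expansions,
                        }
                    },
                    # Restrict the result set to the given documents.
                    {"terms": {"id": list(doc_ids)}},
                ]
            }
        },
        "sort": [{"published": {"order": "desc"}}],
    }


if __name__ == "__main__":
    print(json.dumps(build_query(["1", "2", "3"], "b"), indent=2))
```

Note that the terms filter narrows which documents can be returned; as far as this thread establishes, it does not narrow which terms the prefix is expanded to.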

xavierfacq · December 13, 2018, 4:02pm

What you have in your filters, with the terms query on the "id" field, looks good to me.

Regarding the phrase_prefix query, I'm not very familiar with it, so I don't have a solution.
Just an idea: does phrase_prefix return documents that have a "phrase" with a "b" at the beginning? If so, is that the case for your 2 documents that are not returned?

e.g.: "My phrase with a baby" => OK or not?

Ozymandy · December 13, 2018, 9:34pm

As I said, Elasticsearch tries to load ALL tokens from ALL documents that match the prefix "b", up to the configured maximum (the max_expansions parameter). So if I have about 100 thousand tokens starting with "b", Elasticsearch cannot process them all.
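The effect described here can be reproduced with a toy model in Python. It is a sketch, not Elasticsearch code: it assumes the prefix is expanded to the first max_expansions matching terms of a sorted, index-wide term dictionary, independently of the id filter. The term list, the documents and the function names are invented for the illustration.

```python
from bisect import bisect_left


def expand_prefix(sorted_terms, prefix, max_expansions=50):
    """Return the first `max_expansions` terms that start with `prefix`."""
    start = bisect_left(sorted_terms, prefix)
    expanded = []
    for term in sorted_terms[start:]:
        if not term.startswith(prefix) or len(expanded) >= max_expansions:
            break
        expanded.append(term)
    return expanded


# Hypothetical term dictionary for the WHOLE index: many "b" terms from
# other documents sort between "baby" and "bed".
index_terms = sorted(
    ["baby", "bed", "bored", "cat"] + [f"back{i:03d}" for i in range(100)]
)

# The three documents selected by id, each with a single "b" word.
docs = {"1": {"baby"}, "2": {"bed"}, "3": {"bored"}}


def search(prefix, doc_ids, max_expansions=50):
    """Expand the prefix index-wide, then keep only the requested ids."""
    expanded = set(expand_prefix(index_terms, prefix, max_expansions))
    return sorted(d for d in doc_ids if docs[d] & expanded)


if __name__ == "__main__":
    print(search("b", ["1", "2", "3"]))                      # ['1']
    print(search("b", ["1", "2", "3"], max_expansions=200))  # ['1', '2', '3']
```

With the default of 50, the expansion is used up by "baby" and the "back..." terms before it reaches "bed" or "bored", so only document 1 matches. Raising max_expansions brings the other two documents back, at the cost of expanding to more terms.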

system · January 10, 2019, 9:43pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
