Match phrase prefix in set of documents

Hi all,
I am using a multi_match query of type phrase_prefix to match my documents, and it works fine when I match documents across the whole index. But it does not work properly when I need to run the prefix query over a restricted set of documents (selected by the document field "id"). For example, I query all fields of three documents with the prefix "b". Every document contains at least one word starting with "b": "baby", "bed" and "bored". But the query hits only the document containing "baby". I ran the query with the explain flag and found out that this happens because of how the prefix expansion works: it loads only a small part of the tokens that suit my query (starting with "b") from the whole index. Of course I can increase the value of max_expansions, but that gives me performance issues. So, given the description above, my question is how to make the phrase_prefix search work:

  1. Can I limit the set of documents where it searches for tokens? Something like a subquery in SQL: SELECT ... FROM (SELECT ...) WHERE str LIKE ....
  2. Could it be done with a different analyzer? For now I am using the whitespace tokenizer with a lowercase filter.
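For context, the analysis chain mentioned in point 2 (whitespace tokenizer plus lowercase filter) can be approximated in a few lines of Python. This is only an illustration of what that analyzer produces, not Elasticsearch code:

```python
# Toy approximation of the analysis chain described above:
# a whitespace tokenizer followed by a lowercase token filter.
def analyze(text: str) -> list[str]:
    tokens = text.split()               # whitespace tokenizer: split on whitespace runs
    return [t.lower() for t in tokens]  # lowercase token filter

print(analyze("My phrase with a Baby"))  # ['my', 'phrase', 'with', 'a', 'baby']
```

With this chain, prefix matching happens against the lowercased whole-word tokens, which is why the prefix "b" is expanded against terms like "baby", "bed" and "bored".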

Hi,

Can you post your query here?


Sure. Here is the query I execute:

{
  "from": 0,
  "size": 10000,
  "query": {
    "bool": {
      "filter": [
        {
          "multi_match": {
            "query": "b",
            "fields": ["category^1.0", "body^1.0", "title^1.0"],
            "type": "phrase_prefix",
            "operator": "OR",
            "slop": 0,
            "prefix_length": 0,
            "max_expansions": 50,
            "zero_terms_query": "NONE",
            "auto_generate_synonyms_phrase_query": true,
            "fuzzy_transpositions": true,
            "boost": 1.0
          }
        },
        {
          "terms": {
            "id": ["1", "2", "3"],
            "boost": 1.0
          }
        }
      ],
      "adjust_pure_negative": true,
      "boost": 1.0
    }
  },
  "sort": [
    { "published": { "order": "desc" } }
  ]
}

The terms query on the field "id" that you have in your filters sounds good.

Regarding the phrase_prefix query, I'm not very familiar with it, so I don't have a solution.
Just an idea: phrase_prefix returns documents containing a phrase with "b" at the beginning, right? If so, could that be why your other 2 documents are not returned?

e.g.: "My phrase with a baby" => matched or not?

As I said, Elasticsearch tries to load ALL tokens from ALL documents that suit the prefix "b", up to the configured maximum (the max_expansions parameter). So if I have about 100 thousand tokens starting with "b", Elasticsearch cannot process them all.
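To make that behaviour concrete, here is a toy Python simulation. The term dictionary and document contents are invented for illustration (this is not how Elasticsearch is implemented internally): the prefix is expanded against the index-wide term dictionary first, capped at max_expansions, and the terms filter on "id" can only narrow the candidates afterwards, not widen the expansion:

```python
MAX_EXPANSIONS = 50  # default used in the query above

# Invented index-wide term dictionary, kept sorted as in a terms index.
# "baby" sorts before the many "be..." terms; "bed" and "bored" sort after them.
index_terms = sorted(["baby", "bed", "bored"] + [f"be{i:04d}" for i in range(1000)])

# Step 1: expand the prefix "b" against the WHOLE index, capped at max_expansions.
expanded = [t for t in index_terms if t.startswith("b")][:MAX_EXPANSIONS]

# Step 2: the terms filter on "id" only narrows candidate documents; it cannot
# bring back terms that were already dropped by the expansion cap.
docs = {"1": ["baby"], "2": ["bed"], "3": ["bored"]}
hits = [doc_id for doc_id, tokens in docs.items()
        if any(t in expanded for t in tokens)]

print(hits)  # ['1'] -- only the "baby" document survives the capped expansion
```

Raising MAX_EXPANSIONS far enough would let "bed" and "bored" into the expanded list, which matches the observation above that increasing max_expansions fixes the results at the cost of performance.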

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.