Hi all,
I am using a multi_match query of type phrase_prefix to match my documents. It works fine when I need to find matching documents across the whole index, but it does not work properly when I need to query by prefix within a set of documents (selected by the document field id). For example, I query all fields of three documents with the prefix "b". Each document has at least one word starting with the letter "b": "baby", "bed" and "bored". But the query hits only one document, the one with the word "baby". I executed my query with the explain flag and found out that this happens because of the way the prefix is expanded: Elasticsearch loads only a small part of the tokens that match my query (those starting with "b") from the whole index. Of course I can increase the value of max_expansions, but that gives me performance issues. So, given the above, I have two questions about how to make phrase_prefix search work:
1) Can I limit the scope of documents in which it searches for tokens? It would be like a subquery in SQL (select from (select from) where str like).
2) Could it be done with another analyzer? For now I am using the whitespace tokenizer with a lowercase filter.
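Regarding question 2, one approach that is often used for this kind of problem is to index the prefixes of each token at index time (Elasticsearch offers an edge_ngram token filter for this), so that a prefix lookup becomes an exact term match and no term expansion is involved. Nobody in this thread confirms that it fits this setup, so the following is only a toy sketch in plain Python of the idea, not Elasticsearch code; the sample documents and the helper names (edge_ngrams, build_index, prefix_search) are made up for illustration.

```python
from collections import defaultdict


def edge_ngrams(token, min_gram=1, max_gram=10):
    """Return the prefixes of `token` with lengths min_gram..max_gram."""
    longest = min(len(token), max_gram)
    return [token[:n] for n in range(min_gram, longest + 1)]


def build_index(docs):
    """Map every token prefix to the set of document ids containing it.

    Mimics a whitespace tokenizer + lowercase filter + edge n-grams.
    """
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.lower().split():
            for gram in edge_ngrams(token):
                index[gram].add(doc_id)
    return index


def prefix_search(index, prefix, allowed_ids):
    """Exact lookup of the prefix, restricted to the allowed document ids."""
    return sorted(index.get(prefix.lower(), set()) & set(allowed_ids))


# Hypothetical sample data: documents 1-3 are the ones being filtered on.
docs = {
    "1": "baby sleeps",
    "2": "new bed",
    "3": "bored again",
    "4": "back bag ball band bank",
}
index = build_index(docs)
hits_b = prefix_search(index, "b", {"1", "2", "3"})
hits_be = prefix_search(index, "be", {"1", "2", "3"})
```

The trade-off in this sketch is a larger index (every token is stored once per prefix length) in exchange for lookups that do not depend on how many distinct terms in the whole index share the prefix.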
Sure. Here is the query I execute:
{"from":0,"size":10000,"query":{"bool":{"filter":[{"multi_match":{"query":"b","fields":["category^1.0","body^1.0","title^1.0"],"type":"phrase_prefix","operator":"OR","slop":0,"prefix_length":0,"max_expansions":50,"zero_terms_query":"NONE","auto_generate_synonyms_phrase_query":true,"fuzzy_transpositions":true,"boost":1.0}},{"terms":{"id":["1","2","3"],"boost":1.0}}],"adjust_pure_negative":true,"boost":1.0}},"sort":[{"published":{"order":"desc"}}]}
What you have in your filters, the terms query on the "id" field, sounds good.
Regarding the phrase_prefix query, I'm not very familiar with it, so I don't have a solution.
Just an idea: does phrase_prefix return documents containing a "phrase" with a "b" at the beginning? If so, is that the case for your two documents that are not returned?
As I said, Elasticsearch tries to load ALL tokens from ALL documents that match the prefix "b", up to the configured maximum (the max_expansions parameter). So if I have about 100 thousand tokens starting with "b", Elasticsearch cannot process them all.
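To make the behaviour described above concrete, here is a small toy model in plain Python. It is a sketch of the effect as described in this thread, not Lucene's actual implementation: it assumes the prefix is expanded against the sorted term dictionary of the whole index, cut off at max_expansions, and that the id filter only applies afterwards. The sample documents and the function names (expand_prefix, search) are hypothetical.

```python
from bisect import bisect_left


def expand_prefix(sorted_terms, prefix, max_expansions):
    """Take at most `max_expansions` terms starting with `prefix`,
    in term-dictionary (sorted) order."""
    start = bisect_left(sorted_terms, prefix)
    expanded = []
    for term in sorted_terms[start:]:
        if not term.startswith(prefix) or len(expanded) >= max_expansions:
            break
        expanded.append(term)
    return expanded


def search(docs, prefix, max_expansions, allowed_ids):
    """Expand the prefix over the WHOLE index, then apply the id filter."""
    tokens = {doc_id: set(text.lower().split()) for doc_id, text in docs.items()}
    all_terms = sorted(set().union(*tokens.values()))
    expanded = set(expand_prefix(all_terms, prefix, max_expansions))
    return sorted(
        doc_id
        for doc_id, doc_tokens in tokens.items()
        if doc_id in allowed_ids and doc_tokens & expanded
    )


# Hypothetical sample data: document 4 is outside the id filter, but its
# terms still consume expansion slots.
docs = {
    "1": "baby sleeps",
    "2": "new bed",
    "3": "bored again",
    "4": "back bag ball band bank",
}
wanted = {"1", "2", "3"}

few = search(docs, "b", 3, wanted)    # expands to baby, back, bag only
many = search(docs, "b", 50, wanted)  # expands to every term starting with "b"
```

In this model, with a limit of 3 the expansion is used up by "baby", "back" and "bag", so "bed" and "bored" are never considered and only document 1 is returned, even though all three filtered documents contain a word starting with "b". Raising the limit finds all three, at the cost of expanding more terms.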