Existing document not being found by match_phrase_prefix


(Abid Hussain) #1

Hi all,

I've encountered some unexpected search behaviour. We have an index of documents with the fields last_name and first_name.

There is a document with first_name Peter and last_name Westphal. When running the following search:

{
    "query": {
        "bool": {
            "must": [
               {
                   "match_phrase_prefix": {
                      "last_name": "west"
                   }
               },
               {
                   "match_phrase_prefix": {
                      "first_name": "peter"
                   }
               }
            ]
        }
    }
}

nothing is found. When changing the last_name query from west to westp, the customer is found.

We're using Elasticsearch 1.4.2. Both fields are of type string, and no specific language analyzer is being used. Could it be that west is somehow being treated as a stop word?

Using explain: true in the query doesn't give any information. Are there any analysis queries available to understand the search behaviour?

Regards,

Abid


(Abid Hussain) #2

I encountered similar behaviour with a document with first_name Raimund and last_name Hartmann.

Running the following search

{
    "query": {
        "bool": {
            "must": [
               {
                   "match_phrase_prefix": {
                      "last_name": "h"
                   }
               },
               {
                   "match_phrase_prefix": {
                      "first_name": "raimund"
                   }
               }
            ]
        }
    }
}

finds nothing. When the last_name query term is changed from h to hart, the document is found.

Any ideas...?


(Luca Cavanna) #3

Hello,
do you maybe have a lot of other documents with a last_name starting with "west" but a first_name that doesn't start with "peter"? Have you tried playing around with the max_expansions option? The default is 50, meaning that if Westphal comes after the first 50 terms that start with west in the inverted index, you don't get that document back. That might explain why adding the "p" makes it match.
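
For reference, here is a minimal sketch of the first query with max_expansions set explicitly on the last_name clause. The value 100 is only illustrative, and the object form with a "query" key is assumed to behave like the short form used above:

```json
{
    "query": {
        "bool": {
            "must": [
               {
                   "match_phrase_prefix": {
                      "last_name": {
                          "query": "west",
                          "max_expansions": 100
                      }
                   }
               },
               {
                   "match_phrase_prefix": {
                      "first_name": "peter"
                   }
               }
            ]
        }
    }
}
```

A larger value lets the prefix expand to more terms, at the cost of a slower query, so it is worth raising it only as far as needed.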


(Abid Hussain) #4

Thanks for the clarification, increasing max_expansions did it.

So it seems I'm not able to determine whether a search returns nothing because no document matched or because too many matched.


(Abid Hussain) #5

I have now set max_expansions to 100. Still, not all matching documents are found when using last_name : "west".

As far as I understand, max_expansions in combination with match_phrase_prefix determines the maximum number of existing distinct field values matching the prefix.

What bothers me is that there are 36 distinct last_name values beginning with west. So I would expect even the assumed max_expansions default of 50 to be enough in a match_phrase_prefix query, and setting it to 100 should be more than enough.

