Why does match_bool_prefix generate different results for a query with a trailing space?

The index mapping:

"orgName": {
                    "type": "text",
                    "fields": {
                        "keyword": {
                            "type": "keyword",
                            "ignore_above": 256
                        }
                    }
                }

Part of my query looks like this:

{
          "bool": {
            "should": [
              {
                "match_bool_prefix": {
                  "orgName": {
                    "query": "island",
                    "operator": "AND",
                    "prefix_length": 0,
                    "max_expansions": 50,
                    "fuzzy_transpositions": true,
                    "boost": 1.0
                  }
                }
              }
            ],
            "adjust_pure_negative": true,
            "boost": 1.0
          }
        }

But if I change the query to "island " with a trailing space, I get different results; specifically, the score is calculated in a different way.

For the query without a trailing space, the score is calculated as:

"value": 1.0,
"description": "orgName:island*",
"details": []

But for the query with a trailing space, it is calculated like this:

"value": 10.185921,
"description": "weight(orgName:island in 45462) [PerFieldSimilarity], result of:",
"details": [
    {
        "value": 10.185921,
        "description": "score(freq=2.0), computed as boost * idf * tf from:",
        "details": [
            {
                "value": 2.2,
                "description": "boost",
                "details": []
            },
            {
                "value": 6.9498363,
                "description": "idf, computed as log(1 + (N - n + 0.5) / (n + 0.5)) from:",
                "details": [
                    {
                        "value": 23,
                        "description": "n, number of documents containing term",
                        "details": []
                    },
                    {
                        "value": 24509,
                        "description": "N, total number of documents with field",
                        "details": []
                    }
                ]
            },
            {
                "value": 0.6661975,
                "description": "tf, computed as freq / (freq + k1 * (1 - b + b * dl / avgdl)) from:",
                "details": [
                    {
                        "value": 2.0,
                        "description": "freq, occurrences of term within document",
                        "details": []
                    },
                    {
                        "value": 1.2,
                        "description": "k1, term saturation parameter",
                        "details": []
                    },
                    {
                        "value": 0.75,
                        "description": "b, length normalization parameter",
                        "details": []
                    },
                    {
                        "value": 4.0,
                        "description": "dl, length of field",
                        "details": []
                    },
                    {
                        "value": 5.127382,
                        "description": "avgdl, average length of field",
                        "details": []
                    }
                ]
            }
        ]
    }
]
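As a sanity check, the numbers in the explain output above can be recombined by hand. The short Python sketch below plugs the reported values into the two formulas quoted in the "description" fields; the variable names are only illustrative, and small differences in the last digits are expected because Elasticsearch computes with single-precision floats.

```python
import math

# Values copied from the explain output of the trailing-space query.
boost = 2.2        # reported "boost"
n = 23             # number of documents containing the term
N = 24509          # total number of documents with the field
freq = 2.0         # occurrences of the term within the document
k1 = 1.2           # term saturation parameter
b = 0.75           # length normalization parameter
dl = 4.0           # length of the field in this document
avgdl = 5.127382   # average length of the field

# Formulas exactly as quoted in the explain "description" strings.
idf = math.log(1 + (N - n + 0.5) / (n + 0.5))
tf = freq / (freq + k1 * (1 - b + b * dl / avgdl))
score = boost * idf * tf

print(f"idf={idf:.6f} tf={tf:.6f} score={score:.5f}")
```

The result agrees with the reported 10.185921 to several decimal places, which suggests that the trailing-space query is scored as an ordinary BM25 term match rather than as a constant-score prefix match.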

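The match_bool_prefix query analyzes the input and builds a boolean query from the resulting terms, turning the last term into a prefix query. For "island", the single term is also the last term, so it becomes the prefix query orgName:island*, which gets a constant score of 1.0. For "island ", the parser takes the trailing space as a sign that another term is coming, so island is treated as a complete term and scored with BM25 like an ordinary term query. The query is constructed differently, but the end result is correct.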
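The behavior described in this thread can be pictured with a toy model. The following Python sketch is not Elasticsearch's parser; `build_clauses` is a hypothetical helper, and whitespace splitting plus lowercasing is only an assumed stand-in for the field's analyzer. It reproduces just the two cases seen in the explain outputs: a prefix clause when the input ends in a term, and a term clause when it ends in whitespace.

```python
from typing import List, Tuple


def build_clauses(query: str) -> List[Tuple[str, str]]:
    """Toy model of the clause types reported in this thread.

    Hypothetical helper, not an Elasticsearch API. Splitting on
    whitespace and lowercasing is an assumed stand-in for analysis.
    """
    terms = query.split()
    if not terms:
        return []
    # A trailing space is read as "the last term is complete".
    ends_with_space = query[-1].isspace()
    # Every term except the last is a plain term clause.
    clauses = [("term", t.lower()) for t in terms[:-1]]
    # The last term is a prefix only if the user may still be typing it.
    last_kind = "term" if ends_with_space else "prefix"
    clauses.append((last_kind, terms[-1].lower()))
    return clauses


print(build_clauses("island"))    # last term treated as a prefix
print(build_clauses("island "))   # last term treated as a full term
```

If the same scoring is wanted regardless of a stray trailing space, trimming the user input on the client side before building the request is one possible option.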
