Issue in highlighter API

Hi,

I am trying to use the highlighter API in my application. It works fine, but if my query string ends with '0' I run into an issue.

For example:
POST /user-x187/_search?from=0&size=10
{
    "_source": ["policyNumer", "userId", "companyId", "dateOfJoining", "phoneNumber"],
    "sort": [
        {
            "dateOfJoining": {
                "order": "desc"
            }
        }
    ],
    "query": {
        "multi_match": {
            "query": "2345678910",
            "fields": [
               "policyNumer.raw",
               "policyNumer",
               "userId",
               "companyId",
               "phoneNumber"
            ],
            "minimum_should_match":"100"
        }
    },
    "highlight" : {
        "require_field_match": true,
        "fields" : {
            "userId" : {},
            "policyNumer":{},
            "policyNumer.raw":{},
            "companyId":{},
            "phoneNumber":{}
        }
    }
}

Result:

{
   "took": 6,
   "timed_out": false,
   "_shards": {
      "total": 5,
      "successful": 5,
      "failed": 0
   },
   "hits": {
      "total": 6,
      "max_score": null,
      "hits": [
         {
            "_index": "user-x187",
            "_type": "user",
            "_id": "SEARCH00027",
            "_score": null,
            "_source": {
               "userId": "SEARCH00027",
               "companyId": "x187",
               "phoneNumber": "2345678910",
               "policyNumer": "23456789123456",
               "dateOfJoining": "07/27/2014, 12:00:00 AM -0400"
            },
            "highlight": {
               "phoneNumber": [
                  "<em>2345678910</em>"
               ],
               "policyNumer": [
                  "<em>234567891</em>23456"
               ]
            },
            "sort": [
               "07/27/2014, 12:00:00 AM -0400"
            ]
         }
      ]
   }
}

My query string is "2345678910", but the highlighter marked "234567891" in "policyNumber", which is only part of the field value. Any idea why it highlights a partial match even though I set require_field_match to true?

Thanks
Pranesh

require_field_match has nothing to do with getting back wrong matches. Setting it to true says that you only want highlights on fields that you actually queried. Otherwise the matching terms are highlighted in every field listed under highlight, regardless of which field the query matched (at that point it's really not about query matches anymore).

That being said, can you post your mapping please?

Thanks Luca,

My mapping is:

{  
   "template":"user-*",
   "settings":{  
      "analysis":{  
         "tokenizer":{  
            "ngram_tokenizer":{  
               "type":"nGram",
               "min_gram":8,
               "max_gram":20
            }
         },
         "analyzer":{  
            "ngram_analyzer":{  
               "type":"custom",
               "tokenizer":"ngram_tokenizer",
               "filter":[  
                  "lowercase"
               ]
            },
            "keyword_lower_analyzer":{  
               "tokenizer":"keyword",
               "filter":"lowercase"
            }
         }
      }
   },
   "mappings":{  
      "user":{  
         "dynamic":"false",
         "_timestamp":{  
            "enabled":"true",
            "store":"true",
            "format":"YYYY-MM-DD HH:mm:ss.SSS",
            "default":"now"
         },
         "properties":{  
            "userId":{  
               "type":"string",
               "analyzer":"keyword_lower_analyzer"
            },
            "phoneNumber":{  
               "type":"string",
               "index":"not_analyzed",
               "doc_values":true
            },
            "policyNumber":{  
               "type":"string",
               "term_vector":"yes",
               "analyzer":"ngram_analyzer",
               "fields":{  
                  "raw":{  
                     "type":"string",
                     "analyzer":"keyword_lower_analyzer"
                  }
               }
            },
            "dateOfJoining":{  
               "type":"string",
               "index":"not_analyzed",
               "doc_values":true
            },
            "companyId":{  
               "type":"string",
               "index":"not_analyzed",
               "doc_values":true
            }
         }
      }
   }
}

This is caused by analyzing with ngrams. If you apply ngrams at index time, you should make sure they are not also applied at search time, by setting a different search_analyzer in your mapping. At the moment an ngram of the query matches an ngram of the field content, whereas you want a match only if the whole query matches an ngram of the field content or the whole field content.
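
To see why the highlight stops at 234567891, here is a minimal Python sketch of what the ngram analysis does to both strings. It only approximates the nGram tokenizer (plain substrings; the lowercase filter is skipped because the values are digits), and char_ngrams is a hypothetical helper, not an Elasticsearch API. The values come from the request and the mapping in this thread.

```python
def char_ngrams(text, min_gram, max_gram):
    """Return every substring of text with length min_gram..max_gram."""
    grams = set()
    for size in range(min_gram, max_gram + 1):
        # range() is empty when size is longer than the text
        for start in range(len(text) - size + 1):
            grams.add(text[start:start + size])
    return grams


# min_gram=8 and max_gram=20 are the tokenizer settings from the mapping.
query = "2345678910"
policy = "23456789123456"

# Terms stored in the index for the policy number field.
index_terms = char_ngrams(policy, 8, 20)

# Case 1: the query goes through the same ngram analyzer, so it is split too.
query_terms = char_ngrams(query, 8, 20)
shared = sorted(index_terms & query_terms, key=lambda g: (len(g), g))
print(shared)  # ['23456789', '34567891', '234567891']

# Case 2: the query is kept as one term (keyword-style search analyzer).
print(query in index_terms)  # False
```

The three shared grams together cover the first nine characters of the field, which lines up with the `<em>234567891</em>23456` highlight in the result. When the query is kept as a single term, it no longer matches this document on the policy number field at all.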

Cheers
Luca
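
For reference, one possible shape of that mapping change, written as a small Python sketch that prints the field definition as JSON. The only addition to the original field is the search_analyzer line. Reusing keyword_lower_analyzer for it is an assumption, and the sketch has not been run against a cluster, so check that your Elasticsearch version accepts search_analyzer next to analyzer.

```python
import json

# Field definition copied from the mapping in this thread, plus one line.
policy_number = {
    "type": "string",
    "term_vector": "yes",
    # Still used at index time, so ngrams are stored as before.
    "analyzer": "ngram_analyzer",
    # Assumption: reuse the existing keyword analyzer for query text,
    # so the query is looked up as a single lowercased term.
    "search_analyzer": "keyword_lower_analyzer",
    "fields": {
        "raw": {
            "type": "string",
            "analyzer": "keyword_lower_analyzer",
        }
    },
}

# Print the fragment that would go under "properties" in the template.
print(json.dumps({"policyNumber": policy_number}, indent=3))
```

Since the mapping lives in an index template, a change like this only takes effect for indices created after the template is updated.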

Got it!!! Thanks Luca.