Highlight doesn't work if not in first position

I'm working with Elasticsearch and my highlight doesn't give me what I expect. My index settings look like this:

PUT my_index
{
      "settings": {
           "analysis": {
                "analyzer": {
                     "my_analyzer": {
                          "tokenizer": "my_tokenizer"
                     }
                },
                "tokenizer": {
                     "my_tokenizer": {
                          "type": "ngram",
                          "min_gram": 2,
                          "max_gram": 25,
                          "token_chars": [
                               "letter",
                               "digit"
                          ]
                     }
                }
           }
      }
}

I put some products in my index:

PUT index/product/1
{
     "name" : "Kit Guirlande Guinguette 50m Transparent",
     "field2": "foo"
}

PUT index/product/2
{
     "name": "Guirlande Guinguette Blanc 20 Bulbes 10M",
      "field2": "foo"
}

The mapping for name and field2:

"name": {
    "type": "text",
    "fields": {
      "keyword": {
          "type": "keyword",
          "ignore_above": 256
       }
     },
    "analyzer": "my_analyzer"
},
"fields2": {
    "type": "text",
    "fields": {
        "keyword": {
            "type": "keyword",
            "ignore_above": 256
        }
     },
     "analyzer": "my_analyzer"
},

And I run a search:

GET index/product/_search
{
     "query":{
          "multi_match": {
               "query" : "guirlande gui"
               "fields":[
                    'name','field2'
               ]
              "minimum_should_match" : "100%"
          }
     }
     "highlight" : {
          "fields":{
              "name" : {}
          }
     }
}

Response

{
 "hits": {
      "total": 2,
       "hits": [
             {
                   "_index":"index",
                   "_type": "product",
                   "_id": "1",
                   "_source": {
                         "name": "Guirlande Guinguette Blanc 20 Bulbes 10M"
                    },
                   "highlight": {
                         "name": [
                               " <em>Guirlande Gui</em>nguette Blanc 20 Bulbes 10M"
                         ]
                   }
             },

             {
                   "_index": "index",
                   "_type": "product",
                   "_id": "2",
                   "_score": 1.601195,
                   "_source": {
                         "name": "Kit Guirlande Guinguette 30m Blanche"
                    },
                   "highlight": {
                         "name": [
                               " Kit Guirlande Guinguette 30m Blanche"
                         ]
                   }
             }
       ]
 }
}

But for the second hit, I would like the highlight to be " Kit <em>Guirlande Gui</em>nguette 30m Blanche".

Could you provide a full recreation script as described in

It will help to better understand what you are doing.
Please, try to keep the example as simple as possible.

Is it better like that, @dadoonet?

It's better, but I had to modify it because some of the content is not valid JSON and the index name is wrong. I also had to apply the mapping manually. All of this takes effort on our side just to get it running. A fully running script definitely helps and reduces the time needed to answer. So next time, please provide something like:

DELETE index
PUT index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "tokenizer": "my_tokenizer"
        }
      },
      "tokenizer": {
        "my_tokenizer": {
          "type": "ngram",
          "min_gram": 2,
          "max_gram": 25,
          "token_chars": [
            "letter",
            "digit"
          ]
        }
      }
    }
  },
  "mappings": {
    "product": {
      "properties": {
        "name": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          },
          "analyzer": "my_analyzer"
        },
        "fields2": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          },
          "analyzer": "my_analyzer"
        }
      }
    }
  }
}
PUT index/product/1
{
  "name": "Kit Guirlande Guinguette 50m Transparent",
  "field2": "foo"
}

PUT index/product/2
{
  "name": "Guirlande Guinguette Blanc 20 Bulbes 10M",
  "field2": "foo"
}

GET index/product/_search
{
  "query": {
    "multi_match": {
      "query": "guirlande gui",
      "fields": ["name", "field2"],
      "minimum_should_match": "100%"
    }
  },
  "highlight": {
    "fields": {
      "name": {}
    }
  }
}

Anyway.

Note first that there is a deprecation notice when running this in 6.1, which you need to be aware of:

#! Deprecation: Deprecated big difference between max_gram and min_gram in NGram Tokenizer,expected difference must be less than or equal to: [1]
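
For context on that notice, one rough way to see why a wide gap between min_gram and max_gram is costly is to count how many grams a single word produces. The sketch below is plain Python, not Elasticsearch code. `gram_count` is a hypothetical helper, and it assumes the tokenizer emits every substring of each allowed length once per start offset.

```python
def gram_count(word_length: int, min_gram: int, max_gram: int) -> int:
    """Count the n-grams of one word, assuming every substring whose
    length lies in [min_gram, max_gram] is emitted once per start offset."""
    longest = min(max_gram, word_length)
    return sum(word_length - n + 1 for n in range(min_gram, longest + 1))


# "Guirlande" has 9 characters.
wide = gram_count(9, min_gram=2, max_gram=25)   # settings used in this thread
narrow = gram_count(9, min_gram=2, max_gram=3)  # difference of 1, as in the notice
print(wide, narrow)  # prints: 36 15
```

Under that assumption, the settings in this thread index more than twice as many terms per word as a min_gram/max_gram pair that differs by 1.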

When I search with your query, it does not match at all:

GET index/product/_search
{
  "query": {
    "multi_match": {
      "query": "guirlande gui",
      "fields": ["name", "field2"],
      "minimum_should_match": "100%"
    }
  },
  "highlight": {
    "fields": {
      "name": {}
    }
  }
}
{
  "took": 4,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 0,
    "max_score": null,
    "hits": []
  }
}

May I recommend that you provide a fully working example?

Also have a look at what your analyzer produces when processing your text:

POST index/_analyze
{
  "text": [ "Guirlande Guinguette Blanc 20 Bulbes 10M" ],
  "analyzer": "my_analyzer"
}

That might help.
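
To get a feel for what that `_analyze` call returns without a running cluster, here is a rough Python emulation of an ngram tokenizer restricted to letters and digits. It is only a sketch: `ngrams` is a hypothetical helper, the real tokenizer also reports offsets and positions, and the `lowercase` flag stands in for a lowercase token filter that the analyzer above does not have. It also illustrates one plausible reason the lowercase query finds nothing: assuming `minimum_should_match: 100%` requires every query gram to exist in the index, a gram such as `gui` has no counterpart among the case-sensitive grams of the indexed name.

```python
import re


def ngrams(text, min_gram=2, max_gram=25, lowercase=False):
    """Rough emulation of an ngram tokenizer with token_chars [letter, digit]."""
    if lowercase:
        text = text.lower()
    grams = []
    # Split on anything that is not a letter or digit, then slide over each word.
    for word in re.findall(r"[^\W_]+", text):
        for start in range(len(word)):
            for size in range(min_gram, max_gram + 1):
                if start + size > len(word):
                    break
                grams.append(word[start:start + size])
    return grams


name = "Guirlande Guinguette Blanc 20 Bulbes 10M"
indexed = set(ngrams(name))                 # case-sensitive, like my_analyzer above
query = ngrams("guirlande gui")
missing = [g for g in query if g not in indexed]

print(ngrams("Kit"))   # prints: ['Ki', 'Kit', 'it']
print(missing)         # query grams that have no match in the indexed grams
```

Calling `ngrams(name, lowercase=True)` for the indexed side and `ngrams("guirlande gui", lowercase=True)` for the query side leaves no missing grams in this sketch.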

That being said, I doubt highlighting works on ngram-based data. If you are searching against an ngram field like name, you probably need to explicitly store this field in order to highlight it. But I'm unsure, and I'd prefer to get @jimczi's thoughts.

OK @dadoonet, thanks for the suggestion. I changed a few things but it still doesn't work. I hope this version is easier for you to run:

DELETE my_index
PUT my_index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "tokenizer": "my_tokenizer",
          "filter": ["lowercase","asciifolding"]
        }
      },
      "tokenizer": {
        "my_tokenizer": {
          "type": "ngram",
          "min_gram": 2,
          "max_gram": 25,
          "token_chars": [
            "letter",
            "digit"
          ]
        }
      }
    }
  },
  "mappings": {
    "product": {
      "properties": {
        "name": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            },
            "search": {
              "type": "text",
              "search_analyzer": "standard",
              "analyzer": "my_analyzer"
            }
          },
          "analyzer": "my_analyzer"
        }
      }
    }
  }
}

PUT my_index/product/1
{
  "name": "Kit Guirlande Guinguette 50m Transparent",
  "field2": "foo"
}

PUT my_index/product/2
{
  "name": "Guirlande Guinguette Blanc 20 Bulbes 10M",
  "field2": "foo"
}

GET my_index/product/_search
{
  "query": {
    "multi_match": {
      "query": "Guirlande Gui",
      "fields": [
        "name",
        "field2"
      ],
      "minimum_should_match": "100%"
    }
  },
  "highlight": {
    "fields": {
      "name.search": {
        "highlight_query": {
          "match": {
            "name.search": {
              "query": "Guirlande Gui"
            }
          }
        }
      }
    }
  }
}

With that, I get the following result:

{
  "took": 2,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 2,
    "max_score": 12.150877,
    "hits": [
      {
        "_index": "my_index",
        "_type": "product",
        "_id": "2",
        "_score": 12.150877,
        "_source": {
          "name": "Guirlande Guinguette Blanc 20 Bulbes 10M",
          "field2": "foo"
        },
        "highlight": {
          "name.search": [
            "<em>Guirlande</em> Guinguette Blanc 20 Bulbes 10M"
          ]
        }
      },
      {
        "_index": "my_index",
        "_type": "product",
        "_id": "1",
        "_score": 11.431992,
        "_source": {
          "name": "Kit Guirlande Guinguette 50m Transparent",
          "field2": "foo"
        },
        "highlight": {
          "name.search": [
            "Kit Guirlande Guinguette 50m Transparent"
          ]
        }
      }
    ]
  }
}

And what I want is:

{
  "took": 2,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 2,
    "max_score": 195.12189,
    "hits": [
      {
        "_index": "my_index",
        "_type": "product",
        "_id": "2",
        "_score": 195.12189,
        "_source": {
          "name": "Guirlande Guinguette Blanc 20 Bulbes 10M",
          "field2": "foo"
        },
        "highlight": {
          "name.search": [
            "<em>Guirlande Gui</em>nguette Blanc 20 Bulbes 10M"
          ]
        }
      },
      {
        "_index": "my_index",
        "_type": "product",
        "_id": "1",
        "_score": 197.8033,
        "_source": {
          "name": "Kit Guirlande Guinguette 50m Transparent",
          "field2": "foo"
        },
        "highlight": {
          "name.search": [
            "Kit <em>Guirlande Gui</em>nguette 50m Transparent"
          ]
        }
      }
    ]
  }
}

By the way, I'm using Elasticsearch 5.6.4.

Thank you. With 6.1, here is what I'm getting:

{
  "took": 75,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 2,
    "max_score": 12.554357,
    "hits": [
      {
        "_index": "my_index",
        "_type": "product",
        "_id": "2",
        "_score": 12.554357,
        "_source": {
          "name": "Guirlande Guinguette Blanc 20 Bulbes 10M",
          "field2": "foo"
        },
        "highlight": {
          "name.search": [
            "<em>Gui</em><em>rlande</em> <em>Gui</em>nguette Blanc 20 Bulbes 10M"
          ]
        }
      },
      {
        "_index": "my_index",
        "_type": "product",
        "_id": "1",
        "_score": 12.281757,
        "_source": {
          "name": "Kit Guirlande Guinguette 50m Transparent",
          "field2": "foo"
        },
        "highlight": {
          "name.search": [
            "Kit <em>Gui</em><em>rlande</em> <em>Gui</em>nguette 50m Transparent"
          ]
        }
      }
    ]
  }
}

Hmm, well. I really have no idea why I don't get that. Thanks for your time anyway, @dadoonet.

Maybe because the behavior changed in the 6.x series? IIRC @jimczi worked on that.

Also, I get weirdly high scores in my results. I don't know if that's normal.
