Highlight doesn't work if not in first position


(Sylvain Attoumani) #1

I'm working with Elasticsearch and my highlight doesn't give me what I expect. My index settings look like this:

PUT my_index
{
      "settings": {
           "analysis": {
                "analyzer": {
                     "my_analyzer": {
                          "tokenizer": "my_tokenizer"
                     }
                },
                "tokenizer": {
                     "my_tokenizer": {
                          "type": "ngram",
                          "min_gram": 2,
                          "max_gram": 25,
                          "token_chars": [
                               "letter",
                               "digit"
                          ]
                     }
                }
           }
      }
}

I index some products:

PUT index/product/1
{
     "name" : "Kit Guirlande Guinguette 50m Transparent",
     "field2": "foo"
}

PUT index/product/2
{
     "name": "Guirlande Guinguette Blanc 20 Bulbes 10M",
      "field2": "foo"
}

The mapping for name and field2:

"name": {
    "type": "text",
    "fields": {
      "keyword": {
          "type": "keyword",
          "ignore_above": 256
       }
     },
    "analyzer": "my_analyzer"
},
"fields2": {
    "type": "text",
    "fields": {
        "keyword": {
            "type": "keyword",
            "ignore_above": 256
        }
     },
     "analyzer": "my_analyzer"
},

And I run this search:

GET index/product/_search
{
     "query":{
          "multi_match": {
               "query" : "guirlande gui"
               "fields":[
                    'name','field2'
               ]
              "minimum_should_match" : "100%"
          }
     }
     "highlight" : {
          "fields":{
              "name" : {}
          }
     }
}

Response

{
 "hits": {
      "total": 2,
       "hits": [
             {
                   "_index":"index",
                   "_type": "product",
                   "_id": "1",
                   "_source": {
                         "name": "Guirlande Guinguette Blanc 20 Bulbes 10M"
                    },
                   "highlight": {
                         "name": [
                               " <em>Guirlande Gui</em>nguette Blanc 20 Bulbes 10M"
                         ]
                   }
             },

             {
                   "_index": "index",
                   "_type": "product",
                   "_id": "2",
                   "_score": 1.601195,
                   "_source": {
                         "name": "Kit Guirlande Guinguette 30m Blanche"
                    },
                   "highlight": {
                         "name": [
                               " Kit Guirlande Guinguette 30m Blanche"
                         ]
                   }
             }
       ]
 }
}

But for the second hit, I would like the highlight to be " Kit <em>Guirlande Gui</em>nguette 30m Blanche".


(David Pilato) #2

Could you provide a full recreation script as described in

It will help to better understand what you are doing.
Please, try to keep the example as simple as possible.


(Sylvain Attoumani) #3

Is it better like this, @dadoonet?


(David Pilato) #4

It's better, but I had to modify it because some of the content is not valid JSON and the index name is wrong. I also had to apply the mapping manually. All of this requires effort on our side to make it run. Providing a fully running script definitely helps and reduces the time needed to answer. So next time, please provide something like:

DELETE index
PUT index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "tokenizer": "my_tokenizer"
        }
      },
      "tokenizer": {
        "my_tokenizer": {
          "type": "ngram",
          "min_gram": 2,
          "max_gram": 25,
          "token_chars": [
            "letter",
            "digit"
          ]
        }
      }
    }
  },
  "mappings": {
    "product": {
      "properties": {
        "name": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          },
          "analyzer": "my_analyzer"
        },
        "fields2": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          },
          "analyzer": "my_analyzer"
        }
      }
    }
  }
}
PUT index/product/1
{
     "name" : "Kit Guirlande Guinguette 50m Transparent",
     "field2": "foo"
}

PUT index/product/2
{
     "name": "Guirlande Guinguette Blanc 20 Bulbes 10M",
      "field2": "foo"
}
GET index/product/_search
{
     "query":{
          "multi_match": {
               "query" : "guirlande gui",
               "fields":[
                    "name", "field2"
               ],
              "minimum_should_match" : "100%"
          }
     },
     "highlight" : {
          "fields":{
              "name" : {}
          }
     }
}

Anyway.

Note first that there is a deprecation notice when running this in 6.1, which you need to be aware of:

#! Deprecation: Deprecated big difference between max_gram and min_gram in NGram Tokenizer,expected difference must be less than or equal to: [1]

When I search with your query, it does not match at all:

GET index/product/_search
{
     "query":{
          "multi_match": {
               "query" : "guirlande gui",
               "fields":[
                    "name", "field2"
               ],
              "minimum_should_match" : "100%"
          }
     },
     "highlight" : {
          "fields":{
              "name" : {}
          }
     }
}
{
  "took": 4,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 0,
    "max_score": null,
    "hits": []
  }
}
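
A possible explanation for the empty result, sketched in Python rather than verified inside Elasticsearch: the analyzer above has no lowercase filter, so the grams are case sensitive, and "minimum_should_match": "100%" requires every gram of the query to be present in the field. The ngrams helper below is a hypothetical approximation of the ngram tokenizer settings (min_gram 2, max_gram 25, letters and digits only), not the actual Lucene implementation.

```python
import re


def ngrams(text, min_gram=2, max_gram=25):
    """Hypothetical helper approximating the ngram tokenizer settings above."""
    grams = set()
    # Split on anything that is not a letter or a digit (token_chars).
    for word in re.findall(r"[^\W_]+", text):
        for start in range(len(word)):
            longest = min(start + max_gram, len(word))
            for end in range(start + min_gram, longest + 1):
                grams.add(word[start:end])
    return grams


doc = "Guirlande Guinguette Blanc 20 Bulbes 10M"
query = "guirlande gui"

# Grams produced for the query that were never indexed for the document.
missing = sorted(ngrams(query) - ngrams(doc))
print(missing)

# Same comparison if both sides were lowercased first.
missing_if_lowercased = sorted(ngrams(query.lower()) - ngrams(doc.lower()))
print(missing_if_lowercased)
```

Under these assumptions, grams such as "gui" only exist in the index as "Gui", so a lowercase query cannot satisfy the 100% requirement. Adding a lowercase filter to the analyzer would be one way to remove that mismatch.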

May I recommend that you provide a fully working example?


(David Pilato) #5

Also have a look at what your analyzer produces when it processes your text:

POST index/_analyze
{
  "text": [ "Guirlande Guinguette Blanc 20 Bulbes 10M" ],
  "analyzer": "my_analyzer"
}

That may help.
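
To give a rough idea of the volume of tokens involved, here is a minimal Python sketch that imitates the shape of the _analyze output (token, start_offset, end_offset) for the tokenizer settings used here. The analyze function is a hypothetical stand-in written for illustration; the real tokenizer may differ in details, so run the request above to see the actual tokens.

```python
import re


def analyze(text, min_gram=2, max_gram=25):
    """Hypothetical stand-in for _analyze with the ngram tokenizer above."""
    tokens = []
    # token_chars letter + digit: every other character splits the text.
    for match in re.finditer(r"[^\W_]+", text):
        word, base = match.group(), match.start()
        for start in range(len(word)):
            longest = min(start + max_gram, len(word))
            for end in range(start + min_gram, longest + 1):
                tokens.append({
                    "token": word[start:end],
                    "start_offset": base + start,
                    "end_offset": base + end,
                })
    return tokens


tokens = analyze("Guirlande Guinguette Blanc 20 Bulbes 10M")
print(len(tokens))
for token in tokens[:5]:
    print(token)
```

Many of these grams overlap and share the same start offset, which is worth keeping in mind when thinking about how a highlighter picks the spans to wrap.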

That being said, I doubt highlighting works on ngram-based data. If you are searching against an ngram field like name, you may need to store this field explicitly in order to highlight it. But I'm unsure, and I'd prefer to get @jimczi's thoughts.


(Sylvain Attoumani) #6

OK @dadoonet, thanks for the suggestion. I changed a few things but it still doesn't work. I hope this version is easier for you to run:

DELETE my_index
PUT my_index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "tokenizer": "my_tokenizer",
          "filter": ["lowercase","asciifolding"]
        }
      },
      "tokenizer": {
        "my_tokenizer": {
          "type": "ngram",
          "min_gram": 2,
          "max_gram": 25,
          "token_chars": [
            "letter",
            "digit"
          ]
        }
      }
    }
  },
  "mappings": {
    "product": {
      "properties": {
        "name": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            },
            "search": {
              "type": "text",
              "search_analyzer": "standard",
              "analyzer": "my_analyzer"
            }
          },
          "analyzer": "my_analyzer"
        }
      }
    }
  }
}

PUT my_index/product/1
{
 "name" : "Kit Guirlande Guinguette 50m Transparent",
 "field2": "foo"
}

PUT my_index/product/2
{
 "name": "Guirlande Guinguette Blanc 20 Bulbes 10M",
  "field2": "foo"
}

GET my_index/product/_search
{
  "query": {
    "multi_match": {
      "query": "Guirlande Gui",
      "fields": [
        "name",
        "field2"
      ],
      "minimum_should_match": "100%"
    }
  },
  "highlight": {
    "fields": {
      "name.search": {
        "highlight_query": {
          "match": {
            "name.search": {
              "query": "Guirlande Gui"
            }
          }
        }
      }
    }
  }
}

With that, I get the following result:

{
  "took": 2,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 2,
    "max_score": 12.150877,
    "hits": [
      {
        "_index": "my_index",
        "_type": "product",
        "_id": "2",
        "_score": 12.150877,
        "_source": {
          "name": "Guirlande Guinguette Blanc 20 Bulbes 10M",
          "field2": "foo"
        },
        "highlight": {
          "name.search": [
            "<em>Guirlande</em> Guinguette Blanc 20 Bulbes 10M"
          ]
        }
      },
      {
        "_index": "my_index",
        "_type": "product",
        "_id": "1",
        "_score": 11.431992,
        "_source": {
          "name": "Kit Guirlande Guinguette 50m Transparent",
          "field2": "foo"
        },
        "highlight": {
          "name.search": [
            "Kit Guirlande Guinguette 50m Transparent"
          ]
        }
      }
    ]
  }
}

And what I want is:

{
  "took": 2,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 2,
    "max_score": 195.12189,
    "hits": [
      {
        "_index": "my_index",
        "_type": "product",
        "_id": "2",
        "_score": 195.12189,
        "_source": {
          "name": "Guirlande Guinguette Blanc 20 Bulbes 10M",
          "field2": "foo"
        },
        "highlight": {
          "name.search": [
            "<em>Guirlande Gui</em>nguette Blanc 20 Bulbes 10M"
          ]
        }
      },
      {
        "_index": "my_index",
        "_type": "product",
        "_id": "1",
        "_score": 197.8033,
        "_source": {
          "name": "Kit Guirlande Guinguette 50m Transparent",
          "field2": "foo"
        },
        "highlight": {
          "name.search": [
            "Kit <em>Guirlande Gui</em>nguette 50m Transparent"
          ]
        }
      }
    ]
  }
}

By the way, I'm using Elasticsearch 5.6.4.


(David Pilato) #7

Thank you. With 6.1, here is what I'm getting:

{
  "took": 75,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 2,
    "max_score": 12.554357,
    "hits": [
      {
        "_index": "my_index",
        "_type": "product",
        "_id": "2",
        "_score": 12.554357,
        "_source": {
          "name": "Guirlande Guinguette Blanc 20 Bulbes 10M",
          "field2": "foo"
        },
        "highlight": {
          "name.search": [
            "<em>Gui</em><em>rlande</em> <em>Gui</em>nguette Blanc 20 Bulbes 10M"
          ]
        }
      },
      {
        "_index": "my_index",
        "_type": "product",
        "_id": "1",
        "_score": 12.281757,
        "_source": {
          "name": "Kit Guirlande Guinguette 50m Transparent",
          "field2": "foo"
        },
        "highlight": {
          "name.search": [
            "Kit <em>Gui</em><em>rlande</em> <em>Gui</em>nguette 50m Transparent"
          ]
        }
      }
    ]
  }
}

(Sylvain Attoumani) #8

Hmm, well. I really have no idea why I don't get that. Thanks for your time anyway, @dadoonet.


(David Pilato) #9

Maybe because the behavior changed in the 6.x series? IIRC @jimczi worked on that.
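
If that is the cause, one thing you could try on 5.6.4 is to request the unified highlighter explicitly, since as far as I know it only became the default highlighter in 6.0. The Python sketch below just prints a request body you can paste into Console after GET my_index/product/_search. I have not run it against 5.6.4, so treat the "type": "unified" setting as an assumption to verify, not as a confirmed fix.

```python
import json

# Same search as before; the only change is the explicit highlighter type.
# Assumption: "type": "unified" is accepted by the version you are running.
search_body = {
    "query": {
        "multi_match": {
            "query": "Guirlande Gui",
            "fields": ["name", "field2"],
            "minimum_should_match": "100%",
        }
    },
    "highlight": {
        "fields": {
            "name.search": {
                "type": "unified",
                "highlight_query": {
                    "match": {"name.search": {"query": "Guirlande Gui"}}
                },
            }
        }
    },
}

print(json.dumps(search_body, indent=2))
```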


(Sylvain Attoumani) #10

Also, I get weirdly high scores in my results. I don't know if that's normal.

