Highlighting of irrelevant text between matches


#1

When I index the following text:

"_source":{"_analyzer":"english_index","streamId":1,"language":"english","message":"The Doctor to anyone who tries to speak the truth and to reveal how evil he is"}

and run the following search with a highlight request:

{
  "size": 50,
  "query": {
    "bool": {
      "must": {
        "bool": {
          "must": [
            {
              "bool": {
                "should": [
                  {
                    "match": {
                      "_all": {
                        "query": "doctor",
                        "type": "boolean",
                        "analyzer": "english"
                      }
                    }
                  }
                ]
              }
            },
            {
              "bool": {
                "should": [
                  {
                    "match": {
                      "_all": {
                        "query": "evil",
                        "type": "boolean",
                        "analyzer": "english"
                      }
                    }
                  }
                ]
              }
            }
          ]
        }
      }
    }
  },
  "post_filter": {
    "term": {
      "_type": "Document"
    }
  },
  "highlight": {
    "pre_tags": [
      "<b>"
    ],
    "post_tags": [
      "</b>"
    ],
    "fragment_size": 0,
    "number_of_fragments": 0,
    "fields": {
      "WordA": {},
      "WordB": {},
      "WordC": {},
      "message": {},
      "user": {}
    }
  }
}

I get the following result:

{
  "took" : 8,
  "timed_out" : false,
  "_shards" : {
    "total" : 30,
    "successful" : 30,
    "failed" : 0
  },
  "hits" : {
    "total" : 1,
    "max_score" : 1.7589359,
    "hits" : [ {
      "_index" : "fts-english",
      "_type" : "Document",
      "_id" : "mixed_en_cn",
      "_score" : 1.7589359,
      "_source":{"_analyzer":"english_index","streamId":1,"language":"english","message":"The Doctor to anyone who tries to speak the truth and to reveal how evil he is"},
      "highlight" : {
        "message" : [ "The <b>Doctor to anyone who tries to speak the truth and to reveal how evil</b> he is" ]
      }
    } ]
  }
}

As you can see in the highlight field, the whole text between the two matched words is highlighted, whereas I expected only the words "Doctor" and "evil" to be highlighted, like this:

"message" : [ "The <b>Doctor</b> to anyone who tries to speak the truth and to reveal how <b>evil</b> he is" ]

What is wrong here? Is there a way to fix it, or is this an ES bug?
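
For anyone trying to reproduce this, the request can probably be reduced to the sketch below. It assumes that a `bool` containing a single `should` clause and nothing else matches the same documents as that clause on its own, so the two inner `bool` wrappers are flattened into one `must` list. The highlight fields that do not exist in the document are dropped as well. I have not confirmed that this reduced form triggers the same highlighting.

```json
{
  "query": {
    "bool": {
      "must": [
        { "match": { "_all": { "query": "doctor", "type": "boolean", "analyzer": "english" } } },
        { "match": { "_all": { "query": "evil", "type": "boolean", "analyzer": "english" } } }
      ]
    }
  },
  "post_filter": {
    "term": { "_type": "Document" }
  },
  "highlight": {
    "pre_tags": ["<b>"],
    "post_tags": ["</b>"],
    "fragment_size": 0,
    "number_of_fragments": 0,
    "fields": {
      "message": {}
    }
  }
}
```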


(Nik Everett) #2

Sounds like a bug to me. Highlighting is a part of Elasticsearch that really needs some love, and I suspect I'll start spending a lot of time on it at some point in the next few months. Until then, you might try the experimental-highlighter plugin, which I worked on before joining Elastic. It tends to work better than the built-in highlighters.
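
A minimal sketch of how that highlighter might be selected, assuming the plugin is installed on every node and registers itself under the highlighter type `experimental` (check the plugin's README for the exact name and the options supported by your version). The query is cut down to a single `match` for brevity.

```json
{
  "query": {
    "match": { "_all": { "query": "doctor", "analyzer": "english" } }
  },
  "highlight": {
    "pre_tags": ["<b>"],
    "post_tags": ["</b>"],
    "fields": {
      "message": {
        "type": "experimental",
        "number_of_fragments": 0
      }
    }
  }
}
```

Setting `type` per field rather than for the whole `highlight` block makes it easy to compare the plugin's output with the built-in highlighter on the same request.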


#3

Thank you Nik!
I will definitely try it.


#4

I tried it and it worked well for the example above, but when I ran a phrase search with more than one word in the phrase, it returned no highlighting at all (whereas the same query without the experimental-highlighter did return highlighting).


#5

Works fine on ES 2.1.1.

