Fast vector highlighting boundaries do not work properly


(Iulianlaz) #1

Hello,

I want to highlight the entire URL when any word in the URL matches. For that, I use fast vector highlighting (fvh).
The problem occurs when I set boundary_chars.

My boundary_chars attribute is set to " " (a single space), but it does not seem to take effect: the "." character still seems to be treated as a boundary.

Example:

STEP 1: Create a new index: PUT conversation_new

STEP 2: Add the following mapping: POST conversation_new/message/_mapping
{
  "message": {
    "properties": {
      "body": {
        "type": "string",
        "index": "no",
        "fields": {
          "contains": {
            "type": "string",
            "index_analyzer": "index_fulltext_analyzer",
            "search_analyzer": "search_fulltext_analyzer",
            "term_vector": "with_positions_offsets"
          }
        }
      }
    }
  }
}

STEP 3: Add a document: POST conversation_new/message/
{"body":"Iulian Test First  9.test.9.23.user from lorem_ipsum_test PowerShot  http://192.168.9.236.4grid.eu/person/at/9.23.user/"}

STEP 4: Execute highlight query: POST conversation_new/message/_search
{
  "query": {
    "filtered": {
      "query": [
        {
          "bool": {
            "should": [
              { "match": { "body.contains": "9.23.user" } }
            ]
          }
        }
      ]
    }
  },
  "fields": ["_parent", "_source"],
  "size": 20,
  "highlight": {
    "fields": {
      "body.contains": {
        "number_of_fragments": 0,
        "type": "fvh",
        "boundary_chars": " ",
        "boundary_max_scan": 100
      }
    }
  }
}

The response from STEP 4 will be:

"hits": {
  "total": 1,
  "max_score": 0.07527358,
  "hits": [
     {
        "_index": "conversation_new",
        "_type": "message",
        "_id": "AU4LqXYP_8lKZ4oIJkkZ",
        "_score": 0.07527358,
        "_source": {
           "body": "Iulian Test First  9.test.9.23.user from blabla_test PowerShot  http://192.168.9.236.4grid.eu/person/at/9.23.user/"
        },
        "highlight": {
           "body.contains": [
              "Iulian Test First  <em>9</em>.test.<em>9</em>.<em>23</em>.<em>user</em> from blabla_test PowerShot  http://192.168.<em>9</em>.236.4grid.eu/person/at/<em>9</em>.<em>23</em>.<em>user</em>/"
           ]
        }
     }
  ]

}

I expected something like:

"body.contains": [
   "Iulian Test First  <em>9.test.9.23.user</em> from blabla_test PowerShot  <em>http://192.168.9.236.4grid.eu/person/at/9.23.user</em>/"
]

In short, if any word inside a URL matches, I want the entire URL to be highlighted.
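To make the expected result concrete, here is a minimal client-side sketch in Python of the behaviour I am after. It is not how Elasticsearch or the fvh highlighter works internally. The function names (`split_terms`, `highlight_whole_tokens`) are hypothetical, and the sketch assumes the analyzer lowercases the text and breaks it on every non-alphanumeric character, which is only a guess based on the highlight output above.

```python
import re


def split_terms(s):
    # Assumption: the analyzer lowercases and splits on any
    # non-alphanumeric character (".", "/", ":", "_", ...).
    return [t.lower() for t in re.findall(r"[A-Za-z0-9]+", s)]


def highlight_whole_tokens(text, query, pre="<em>", post="</em>"):
    """Wrap each whitespace-delimited chunk of `text` in pre/post tags
    if at least one of its sub-terms also occurs in `query`."""
    wanted = set(split_terms(query))

    def wrap(match):
        chunk = match.group(0)
        if wanted.intersection(split_terms(chunk)):
            return pre + chunk + post
        return chunk

    # \S+ matches runs of non-whitespace, so the original spacing is kept.
    return re.sub(r"\S+", wrap, text)


body = ("Iulian Test First  9.test.9.23.user from blabla_test PowerShot  "
        "http://192.168.9.236.4grid.eu/person/at/9.23.user/")
print(highlight_whole_tokens(body, "9.23.user"))
```

With the document from STEP 3 and the query from STEP 4, this wraps "9.test.9.23.user" and the whole URL (including its trailing "/") in a single pair of <em> tags each, which is the kind of output I would like the highlighter to return when whitespace is the only boundary.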

Is this a bug?

Thanks a lot,
Iulian

