Wrong highlight when using subword filter with preserve_original


(Cristiana Tiugan) #1

Hello!
I have an analyzer that uses an extended_whitespace tokenizer and a word_delimiter filter.
Something like:

"analysis": {
  "tokenizer": {
    "extended_whitespace": {
      "type": "pattern",
      "pattern": "\\s+"
    }
  },
  "filter": {
    "subword": {
      "type": "word_delimiter",
      "preserve_original": true,
      "catenate_numbers": true
    }
  },
  "analyzer": {
    "search_fulltext_analyzer": {
      "tokenizer": "extended_whitespace",
      "filter": [
        "subword",
        "lowercase",
        "filter_ascii_folding"
      ],
      "char_filter": [
        "url_encode_pattern"
      ]
    }
  }
}

My problem is that if I have a text like: "NODE_PATH=/usr/lib/node_modules ./upgrade-indices.js upgrades"
the highlight is wrong:

"{highlight}NODE_PATH=/usr/lib/node{/highlight}_modules ./upgrade-indices.js upgrades"

instead of:

"{highlight}NODE{/highlight}_PATH=/usr/lib/{highlight}node{/highlight}_modules ./upgrade-indices.js upgrades"

I know it has something to do with the offsets, but I can't find a solution.
This is the _analyze API result:

{
  "tokens": [
    {
      "token": "node_path=/usr/lib/node_modules",
      "start_offset": 0,
      "end_offset": 31,
      "type": "word",
      "position": 0
    },
    {
      "token": "node",
      "start_offset": 0,
      "end_offset": 4,
      "type": "word",
      "position": 0
    },
    {
      "token": "path",
      "start_offset": 5,
      "end_offset": 9,
      "type": "word",
      "position": 1
    },
    {
      "token": "usr",
      "start_offset": 11,
      "end_offset": 14,
      "type": "word",
      "position": 2
    },
    {
      "token": "lib",
      "start_offset": 15,
      "end_offset": 18,
      "type": "word",
      "position": 3
    },
    {
      "token": "node",
      "start_offset": 19,
      "end_offset": 23,
      "type": "word",
      "position": 4
    },
    {
      "token": "modules",
      "start_offset": 24,
      "end_offset": 31,
      "type": "word",
      "position": 5
    },
    {
      "token": "./upgrade-indices.js",
      "start_offset": 32,
      "end_offset": 52,
      "type": "word",
      "position": 6
    },
    {
      "token": "upgrade",
      "start_offset": 34,
      "end_offset": 41,
      "type": "word",
      "position": 6
    },
    {
      "token": "indices",
      "start_offset": 42,
      "end_offset": 49,
      "type": "word",
      "position": 7
    },
    {
      "token": "js",
      "start_offset": 50,
      "end_offset": 52,
      "type": "word",
      "position": 8
    },
    {
      "token": "upgrades",
      "start_offset": 53,
      "end_offset": 61,
      "type": "word",
      "position": 9
    }
  ]
}
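
The offsets above may be enough to explain the merged highlight. Below is a minimal Python sketch, assuming a highlighter that merges tokens with overlapping offsets into one group and emits a single highlight per group, spanning from the first matching token to the last. That grouping rule is an assumption for illustration and is not confirmed anywhere in this thread; the names `highlight_spans` and `render` are hypothetical helpers, and the search term is assumed to be "node". Because the preserved original token covers offsets 0-31, every subword up to "modules" lands in the same group, so the two "node" matches (0-4 and 19-23) collapse into one span, 0-23.

```python
TEXT = "NODE_PATH=/usr/lib/node_modules ./upgrade-indices.js upgrades"

# (token, start_offset, end_offset), copied from the _analyze output above
TOKENS = [
    ("node_path=/usr/lib/node_modules", 0, 31),
    ("node", 0, 4),
    ("path", 5, 9),
    ("usr", 11, 14),
    ("lib", 15, 18),
    ("node", 19, 23),
    ("modules", 24, 31),
    ("./upgrade-indices.js", 32, 52),
    ("upgrade", 34, 41),
    ("indices", 42, 49),
    ("js", 50, 52),
    ("upgrades", 53, 61),
]


def highlight_spans(tokens, query_term):
    """Return one (start, end) span per overlap group that contains a match.

    Assumed rule: a token joins the current group when its start offset is
    smaller than the group's end offset; otherwise it starts a new group.
    """
    spans = []
    group_end = -1   # end offset of the current overlap group
    match = None     # matched span inside the current group, if any
    for term, start, end in tokens:
        if start >= group_end:
            # Token does not overlap the current group: close that group.
            if match is not None:
                spans.append(match)
            match = None
            group_end = end
        else:
            group_end = max(group_end, end)
        if term == query_term:
            if match is None:
                match = (start, end)
            else:
                match = (min(match[0], start), max(match[1], end))
    if match is not None:
        spans.append(match)
    return spans


def render(text, spans):
    """Wrap each span of the text in {highlight}...{/highlight} markers."""
    out, pos = [], 0
    for start, end in spans:
        out.append(text[pos:start])
        out.append("{highlight}" + text[start:end] + "{/highlight}")
        pos = end
    out.append(text[pos:])
    return "".join(out)


# With the preserved original tokens, the two "node" matches merge.
with_original = render(TEXT, highlight_spans(TOKENS, "node"))

# Dropping the preserved originals (the tokens containing "/") keeps them apart.
subwords_only = [t for t in TOKENS if "/" not in t[0]]
without_original = render(TEXT, highlight_spans(subwords_only, "node"))

print(with_original)
print(without_original)
```

Under this assumption the first line printed reproduces the wrong highlight and the second reproduces the expected one, which suggests the preserved original token's wide offsets are what tie the subwords together.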
