When using 'nori' with a synonym filter, token POS tagging is null

In Elasticsearch 7.6.0, I set up an index as follows:

PUT test_index
{
    "settings" : {
      "index" : {
        "analysis" : {
          "filter" : {
            "ko_synonyms" : {
              "type" : "synonym",
              "synonyms" : [
                "홈피=>홈페이지"
              ]
            }
          },
          "analyzer" : {
            "platform_nori_search_analyzer" : {
              "filter" : [
                "lowercase",
                "ko_synonyms"
              ],
              "char_filter" : [
                "html_strip"
              ],
              "type" : "custom",
              "tokenizer" : "discard_nori_tokenizer"
            }
          },
          "tokenizer" : {
            "discard_nori_tokenizer" : {
              "type" : "nori_tokenizer",
              "decompound_mode" : "discard"
            }
          }
        }
      }
    }
}

But when I run an analyze test, the POS attributes on the tokens emitted by the synonym filter are null:

GET test_index/_analyze
{
  "analyzer": "platform_nori_search_analyzer",
  "text": "홈피",
  "attributes": [], 
  "explain": true
}

[Result]
{
  "detail" : {
    "custom_analyzer" : true,
    "charfilters" : [
      {
        "name" : "html_strip",
        "filtered_text" : [
          "홈피"
        ]
      }
    ],
    "tokenizer" : {
      "name" : "discard_nori_tokenizer",
      "tokens" : [
        {
          "token" : "홈피",
          "start_offset" : 0,
          "end_offset" : 2,
          "type" : "word",
          "position" : 0,
          "bytes" : "[ed 99 88 ed 94 bc]",
          "leftPOS" : "NNG(General Noun)",
          "morphemes" : null,
          "posType" : "MORPHEME",
          "positionLength" : 1,
          "reading" : null,
          "rightPOS" : "NNG(General Noun)",
          "termFrequency" : 1
        }
      ]
    },
    "tokenfilters" : [
      {
        "name" : "lowercase",
        "tokens" : [
          {
            "token" : "홈피",
            "start_offset" : 0,
            "end_offset" : 2,
            "type" : "word",
            "position" : 0,
            "bytes" : "[ed 99 88 ed 94 bc]",
            "leftPOS" : "NNG(General Noun)",
            "morphemes" : null,
            "posType" : "MORPHEME",
            "positionLength" : 1,
            "reading" : null,
            "rightPOS" : "NNG(General Noun)",
            "termFrequency" : 1
          }
        ]
      },
      {
        "name" : "ko_synonyms",
        "tokens" : [
          {
            "token" : "홈페이지",
            "start_offset" : 0,
            "end_offset" : 2,
            "type" : "SYNONYM",
            "position" : 0,
            "bytes" : "[ed 99 88 ed 8e 98 ec 9d b4 ec a7 80]",
            "leftPOS" : null,
            "morphemes" : null,
            "posType" : null,
            "positionLength" : 1,
            "reading" : null,
            "rightPOS" : null,
            "termFrequency" : 1
          }
        ]
      }
    ]
  }
}

Do you happen to know what causes this?
Thank you in advance.

We don't copy the attributes for synonyms since they can differ from the original tokens. How are you using the POS attribute? Is it just for debugging purposes, since we don't index them or use them outside of the analysis chain?

I use POS tagging as an indirect way to check the importance of a token. I am going to create a separate analyzer for analysis rather than for search purposes. Thank you for your answer.
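
For reference, one way to do that: a debugging analyzer that omits the synonym filter leaves the nori POS attributes intact, since only the synonym filter's emitted tokens lose them. A minimal sketch (the index and analyzer names here are only illustrative, and the tokenizer definition is reused from the original settings):

PUT test_index_debug
{
  "settings" : {
    "index" : {
      "analysis" : {
        "analyzer" : {
          "platform_nori_debug_analyzer" : {
            "type" : "custom",
            "char_filter" : [
              "html_strip"
            ],
            "tokenizer" : "discard_nori_tokenizer",
            "filter" : [
              "lowercase"
            ]
          }
        },
        "tokenizer" : {
          "discard_nori_tokenizer" : {
            "type" : "nori_tokenizer",
            "decompound_mode" : "discard"
          }
        }
      }
    }
  }
}

GET test_index_debug/_analyze
{
  "analyzer": "platform_nori_debug_analyzer",
  "text": "홈피",
  "explain": true
}

With no synonym filter in the chain, the "explain" output should report leftPOS/rightPOS for each token as the tokenizer produced them.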

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.