When using 'nori' with a synonym filter, token POS tagging is null

In Elasticsearch 7.6.0, I set up an index as follows:

PUT test_index
{
    "settings" : {
      "index" : {
        "analysis" : {
          "filter" : {
            "ko_synonyms" : {
              "type" : "synonym",
              "synonyms" : [
                "홈피=>홈페이지"
              ]
            }
          },
          "analyzer" : {
            "platform_nori_search_analyzer" : {
              "filter" : [
                "lowercase",
                "ko_synonyms"
              ],
              "char_filter" : [
                "html_strip"
              ],
              "type" : "custom",
              "tokenizer" : "discard_nori_tokenizer"
            }
          },
          "tokenizer" : {
            "discard_nori_tokenizer" : {
              "type" : "nori_tokenizer",
              "decompound_mode" : "discard"
            }
          }
        }
      }
    }
}

But when I run an analyze test, the POS attributes on the tokens emitted by the synonym filter are null:

GET test_index/_analyze
{
  "analyzer": "platform_nori_search_analyzer",
  "text": "홈피",
  "attributes": [], 
  "explain": true
}

[Result]
{
  "detail" : {
    "custom_analyzer" : true,
    "charfilters" : [
      {
        "name" : "html_strip",
        "filtered_text" : [
          "홈피"
        ]
      }
    ],
    "tokenizer" : {
      "name" : "discard_nori_tokenizer",
      "tokens" : [
        {
          "token" : "홈피",
          "start_offset" : 0,
          "end_offset" : 2,
          "type" : "word",
          "position" : 0,
          "bytes" : "[ed 99 88 ed 94 bc]",
          "leftPOS" : "NNG(General Noun)",
          "morphemes" : null,
          "posType" : "MORPHEME",
          "positionLength" : 1,
          "reading" : null,
          "rightPOS" : "NNG(General Noun)",
          "termFrequency" : 1
        }
      ]
    },
    "tokenfilters" : [
      {
        "name" : "lowercase",
        "tokens" : [
          {
            "token" : "홈피",
            "start_offset" : 0,
            "end_offset" : 2,
            "type" : "word",
            "position" : 0,
            "bytes" : "[ed 99 88 ed 94 bc]",
            "leftPOS" : "NNG(General Noun)",
            "morphemes" : null,
            "posType" : "MORPHEME",
            "positionLength" : 1,
            "reading" : null,
            "rightPOS" : "NNG(General Noun)",
            "termFrequency" : 1
          }
        ]
      },
      {
        "name" : "ko_synonyms",
        "tokens" : [
          {
            "token" : "홈페이지",
            "start_offset" : 0,
            "end_offset" : 2,
            "type" : "SYNONYM",
            "position" : 0,
            "bytes" : "[ed 99 88 ed 8e 98 ec 9d b4 ec a7 80]",
            "leftPOS" : null,
            "morphemes" : null,
            "posType" : null,
            "positionLength" : 1,
            "reading" : null,
            "rightPOS" : null,
            "termFrequency" : 1
          }
        ]
      }
    ]
  }
}

Do you happen to know what causes this?
Thank you in advance.

We don't copy the attributes for synonyms since they can differ from the original tokens. How are you using the POS attribute? Is it just for debugging purposes, since we don't index them or use them outside of the analysis chain?

I use POS tagging as an indirect way to check the importance of a token. I am going to create a separate analyzer for analysis rather than for search purposes. Thank you for your answer.
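
For reference, one way to do that: a debugging analyzer that omits the synonym filter leaves the nori POS attributes intact, since only the synonym filter's emitted tokens lose them. A minimal sketch (the index and analyzer names here are only illustrative, and the tokenizer definition is reused from the original settings):

PUT test_index_debug
{
  "settings" : {
    "index" : {
      "analysis" : {
        "analyzer" : {
          "platform_nori_debug_analyzer" : {
            "type" : "custom",
            "char_filter" : [
              "html_strip"
            ],
            "tokenizer" : "discard_nori_tokenizer",
            "filter" : [
              "lowercase"
            ]
          }
        },
        "tokenizer" : {
          "discard_nori_tokenizer" : {
            "type" : "nori_tokenizer",
            "decompound_mode" : "discard"
          }
        }
      }
    }
  }
}

GET test_index_debug/_analyze
{
  "analyzer": "platform_nori_debug_analyzer",
  "text": "홈피",
  "explain": true
}

With no synonym filter in the chain, the "explain" output should report leftPOS/rightPOS for each token as the tokenizer produced them.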

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.