How do I apply stemming after a custom tokenizer?

DimonHo · August 10, 2017, 8:43am

Suppose I have an index like this:

PUT myindex
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "tokenizer": "my_tokenizer",
          "filter": [
            "lowercase",
            "my_stemmer"
          ]
        }
      },
      "tokenizer": {
        "my_tokenizer": {
          "type": "pattern",
          "pattern": "[;]+"
        }
      },
      "filter": {
        "my_stemmer": {
          "type": "stemmer",
          "name": "english"
        }
      }
    }
  }
}

Test the analyzer:

GET /myindex/_analyze
{
  "analyzer": "my_analyzer",
  "text": "running dates;Sex health education;Perceptions towards Sexual Health Education"
}

I need to split "running dates;Sex health education;Perceptions towards Sexual Health Education" on semicolons and then stem every word inside each resulting token. The expected result is:

{
  "tokens": [
    {
      "token": "run date",
      "start_offset": 0,
      "end_offset": 13,
      "type": "word",
      "position": 0
    },
    {
      "token": "sex health educ",
      "start_offset": 14,
      "end_offset": 34,
      "type": "word",
      "position": 1
    },
    {
      "token": "percept toward sexual health educ",
      "start_offset": 35,
      "end_offset": 78,
      "type": "word",
      "position": 2
    }
  ]
}

However, the actual result is:

{
  "tokens": [
    {
      "token": "running d",
      "start_offset": 0,
      "end_offset": 13,
      "type": "word",
      "position": 0
    },
    {
      "token": "sex health educ",
      "start_offset": 14,
      "end_offset": 34,
      "type": "word",
      "position": 1
    },
    {
      "token": "perceptions towards sexual health educ",
      "start_offset": 35,
      "end_offset": 78,
      "type": "word",
      "position": 2
    }
  ]
}

How can I get the result I need?
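
A likely explanation, offered as a reading of the output above rather than a confirmed answer: the pattern tokenizer emits each semicolon-delimited phrase as one token, and the stemmer filter treats every token as a single word, so only the tail of each phrase is changed ("running dates" becomes "running d"). One way to check this is to run the same filters behind a tokenizer that splits on word boundaries. The request below is a sketch that swaps in the built-in standard tokenizer and reuses the my_stemmer filter already defined in myindex:

```
GET /myindex/_analyze
{
  "tokenizer": "standard",
  "filter": [
    "lowercase",
    "my_stemmer"
  ],
  "text": "running dates;Sex health education;Perceptions towards Sexual Health Education"
}
```

With this tokenizer each word becomes its own token and is stemmed individually, but the semicolon-delimited phrases are no longer kept together as single tokens. Keeping one stemmed token per phrase would probably require stemming the words before indexing (for example in the client application), on the assumption that the built-in stemmer filter never looks inside a token.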

system · September 7, 2017, 8:44am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
