Suppose I have an index like this:
PUT myindex
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "tokenizer": "my_tokenizer",
          "filter": [
            "lowercase",
            "my_stemmer"
          ]
        }
      },
      "tokenizer": {
        "my_tokenizer": {
          "type": "pattern",
          "pattern": "[;]+"
        }
      },
      "filter": {
        "my_stemmer": {
          "type": "stemmer",
          "name": "english"
        }
      }
    }
  }
}
Testing the analyzer:
GET /myindex/_analyze
{
  "analyzer": "my_analyzer",
  "text": "running dates;Sex health education;Perceptions towards Sexual Health Education"
}
I need to split "running dates;Sex health education;Perceptions towards Sexual Health Education" on semicolons and then stem every word of each resulting phrase. The expected result is:
{
  "tokens": [
    {
      "token": "run date",
      "start_offset": 0,
      "end_offset": 13,
      "type": "word",
      "position": 0
    },
    {
      "token": "sex health educ",
      "start_offset": 14,
      "end_offset": 34,
      "type": "word",
      "position": 1
    },
    {
      "token": "percept toward sexual health educ",
      "start_offset": 35,
      "end_offset": 78,
      "type": "word",
      "position": 2
    }
  ]
}
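However, the actual result is the following. Only the last word of each phrase is stemmed, and "running dates" even comes out as "running d":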
{
  "tokens": [
    {
      "token": "running d",
      "start_offset": 0,
      "end_offset": 13,
      "type": "word",
      "position": 0
    },
    {
      "token": "sex health educ",
      "start_offset": 14,
      "end_offset": 34,
      "type": "word",
      "position": 1
    },
    {
      "token": "perceptions towards sexual health educ",
      "start_offset": 35,
      "end_offset": 78,
      "type": "word",
      "position": 2
    }
  ]
}
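To see at which stage the phrases get mangled, the `_analyze` API accepts an `explain` parameter that reports the tokenizer output and the output of each token filter separately. A minimal diagnostic request, assuming the `myindex` settings above:

```console
GET /myindex/_analyze
{
  "analyzer": "my_analyzer",
  "explain": true,
  "text": "running dates;Sex health education;Perceptions towards Sexual Health Education"
}
```

The tokenizer stage should already show each whole phrase (for example "running dates") as a single token. A stemmer treats every token as one word and only rewrites its ending, so inside a multi-word token only the last word can be stemmed. With "english" (Lucene's Porter stemmer), "running dates" most likely loses the plural "s" first and then the "-ate" suffix, because the whole phrase counts as a long enough stem, which would explain "running d".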
How can I achieve the expected result?
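One possible approach is to move the semicolon split out of the analyzer: an ingest pipeline with the `split` processor turns the string into an array, and each array element is then analyzed on its own by an ordinary word-level analyzer (standard tokenizer, lowercase, stemmer). This is a sketch under several assumptions: Elasticsearch 7.x or later (typeless mappings), documents send the field as a single string, and the index name `myindex_v2`, field `subjects`, pipeline `split_subjects` and analyzer `word_stem_analyzer` are all hypothetical. It does not emit "run date" as one token; each phrase becomes several word tokens, and the position gap between array elements keeps phrase queries from matching across two phrases.

```console
# Hypothetical pipeline: split the raw "subjects" string on ";" into an array
PUT _ingest/pipeline/split_subjects
{
  "processors": [
    {
      "split": {
        "field": "subjects",
        "separator": ";"
      }
    }
  ]
}

# Hypothetical index: the pipeline runs by default, and each value is tokenized per word before stemming
PUT myindex_v2
{
  "settings": {
    "index.default_pipeline": "split_subjects",
    "analysis": {
      "analyzer": {
        "word_stem_analyzer": {
          "tokenizer": "standard",
          "filter": ["lowercase", "my_stemmer"]
        }
      },
      "filter": {
        "my_stemmer": {
          "type": "stemmer",
          "language": "english"
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "subjects": {
        "type": "text",
        "analyzer": "word_stem_analyzer",
        "position_increment_gap": 100
      }
    }
  }
}

# An array of strings is analyzed like a multi-valued field, using the field's analyzer and position gap
GET /myindex_v2/_analyze
{
  "field": "subjects",
  "text": ["running dates", "Sex health education", "Perceptions towards Sexual Health Education"]
}
```

Because the stemmer now sees one word at a time, words such as "running" and "perceptions" should also be stemmed (to "run" and "percept", as in the expected output above), and a match_phrase query for "sex health education" can match within a single phrase.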