Custom analyzer: keyword_marker

SaskiaVola · June 2, 2015, 2:28pm

I'm using a custom analyzer that protects some keywords from being stemmed.

The protection works as expected: the keyword itself is not stemmed. But for some reason a second analyzed version of the keyword ("aid") is also produced.

Here's the behaviour when using the _analyze endpoint:

GET /myIndex/_analyze?analyzer=germanComp
{
"AIDS"
}

{
  "tokens": [
    {
      "token": "aids",
      "start_offset": 7,
      "end_offset": 11,
      "type": "",
      "position": 1
    },
    {
      "token": "aid",
      "start_offset": 7,
      "end_offset": 10,
      "type": "",
      "position": 1
    }
  ]
}

The first token is perfect, but I'd like to get rid of the second one.

Any idea how I can achieve it?

Here's my custom analyzer:

"analysis": {
  "filter": {
    "german_stop": {
      "type": "stop",
      "stopwords": "german"
    },
    "itm_keywords": {
      "type": "keyword_marker",
      "keywords": ["aids"]
    },
    "german_stemmer": {
      "type": "stemmer",
      "language": "light_german"
    },
    "unify": {
      "type": "unique",
      "only_on_same_position": true
    }
  },
  "analyzer": {
    "germanComp": {
      "tokenizer": "standard",
      "filter": [
        "lowercase",
        "itm_keywords",
        "german_stop",
        "german_normalization",
        "german_stemmer",
        "unify"
      ]
    }
  }
}
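
To make the expected behaviour concrete, here is a rough Python sketch of how I understand the chain: keyword_marker flags a token as a keyword, the stemmer leaves flagged tokens alone, and unique drops duplicates at the same position. This is only a toy model, not Elasticsearch code. The names Token, toy_stemmer and analyze are made up for illustration, the "stemmer" just strips one trailing "s" instead of applying light_german, and the stop and normalization filters are left out.

```python
from dataclasses import dataclass, replace
from typing import Iterable, List


@dataclass(frozen=True)
class Token:
    text: str
    position: int
    keyword: bool = False  # set by the marker, read by the stemmer


def keyword_marker(tokens: Iterable[Token], keywords: Iterable[str]) -> List[Token]:
    """Flag every token whose text is in the protected keyword list."""
    protected = set(keywords)
    return [replace(t, keyword=t.text in protected) for t in tokens]


def toy_stemmer(tokens: Iterable[Token]) -> List[Token]:
    """Hypothetical stand-in for light_german: strip one trailing 's'.

    Tokens flagged as keywords are passed through unchanged.
    """
    out = []
    for t in tokens:
        if t.keyword or not t.text.endswith("s"):
            out.append(t)
        else:
            out.append(replace(t, text=t.text[:-1]))
    return out


def unique_same_position(tokens: Iterable[Token]) -> List[Token]:
    """Drop tokens that repeat the same text at the same position."""
    seen = set()
    out = []
    for t in tokens:
        key = (t.position, t.text)
        if key not in seen:
            seen.add(key)
            out.append(t)
    return out


def analyze(text: str, keywords: Iterable[str]) -> List[str]:
    """Whitespace-split, lowercase, then run marker -> stemmer -> unique."""
    tokens = [Token(w.lower(), i + 1) for i, w in enumerate(text.split())]
    tokens = keyword_marker(tokens, keywords)
    tokens = toy_stemmer(tokens)
    return [t.text for t in unique_same_position(tokens)]
```

In this model analyze("AIDS", ["aids"]) gives ["aids"] and analyze("AIDS", []) gives ["aid"], so a single filter chain should emit one token or the other, never both at the same position.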
