Hello Elasticsearch experts,
I have a question about writing a custom analyzer using a dictionary_decompounder.
Here is a simplified example of the current mappings and settings of my index.
PUT my_index
{
  "mappings": {
    "properties": {
      "content": {
        "type": "text"
      }
    }
  },
  "settings": {
    "analysis": {
      "filter": {
        "decomp_de": {
          "type": "dictionary_decompounder",
          "word_list": ["dreh", "moment", "schlüssel"]
        }
      },
      "analyzer": {
        "my_analyzer": {
          "tokenizer": "standard",
          "filter": ["decomp_de"]
        }
      }
    }
  }
}
When using the custom analyzer, I see that the decompounding is working as expected:
GET /my_index/_analyze
{
  "text": ["drehmomentschlüssel"],
  "analyzer": "my_analyzer"
}
As a result, I get the tokens drehmomentschlüssel, dreh, moment and schlüssel.
Next, I index the following two documents:
Document 1:
{"content": "drehmomentschlüssel additional text"}
Document 2:
{"content": "schlüssel additional text"}
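For completeness, the index requests look like this (the document IDs 1 and 2 are chosen here just for illustration):

```json
POST /my_index/_doc/1
{ "content": "drehmomentschlüssel additional text" }

POST /my_index/_doc/2
{ "content": "schlüssel additional text" }
```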
Currently, when I search for
(1) drehmomentschlüssel, I get Document 1 and Document 2
(2) schlüssel, I get Document 1 and Document 2
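The searches above are simple match queries on the content field (shown here for search (1); search (2) is the same with schlüssel):

```json
GET /my_index/_search
{
  "query": {
    "match": {
      "content": "drehmomentschlüssel"
    }
  }
}
```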
My desired result is the following:
Searching for
(1) drehmomentschlüssel should return only Document 1
(2) schlüssel should return Document 1 and Document 2
So the decompounding has to be done at index time; otherwise, search (2) would not return Document 1. At search time, however, I would have to use a different analyzer that does not decompound, so that search (1) no longer matches Document 2.
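A minimal sketch of what I have in mind, assuming the decompounding analyzer is applied at index time via the field's analyzer parameter and a plain standard analyzer at query time via search_analyzer (I am not sure this is the right approach):

```json
PUT my_index
{
  "settings": {
    "analysis": {
      "filter": {
        "decomp_de": {
          "type": "dictionary_decompounder",
          "word_list": ["dreh", "moment", "schlüssel"]
        }
      },
      "analyzer": {
        "my_analyzer": {
          "tokenizer": "standard",
          "filter": ["decomp_de"]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "content": {
        "type": "text",
        "analyzer": "my_analyzer",
        "search_analyzer": "standard"
      }
    }
  }
}
```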
Any ideas?