I have two fields in my index: one contains content analyzed with the standard analyzer, while the other uses the German analyzer. However, when I insert documents, the content doesn't seem to be stemmed at all, even though the language-specific analyzer should stem it. I've created a minimal example that reproduces the problem:
Create an index:
PUT test-index
{
  "settings": {
    "index": {
      "mapping": {
        "total_fields": {
          "limit": 1500
        }
      }
    },
    "number_of_shards": 1,
    "number_of_replicas": 0,
    "refresh_interval": "1s"
  },
  "mappings": {
    "_source": {
      "enabled": true
    },
    "dynamic_templates": [
      {
        "default_string": {
          "match": "*",
          "match_mapping_type": "string",
          "mapping": {
            "type": "keyword",
            "index": true,
            "store": false
          }
        }
      }
    ],
    "properties": {
      "standard_content": {
        "type": "text",
        "index": true,
        "analyzer": "standard",
        "search_analyzer": "standard"
      },
      "stemmed_content": {
        "type": "text",
        "index": true,
        "analyzer": "german",
        "search_analyzer": "german"
      }
    }
  },
  "aliases": {}
}
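As a sanity check, the _analyze API can be run against the index to see which tokens the field's configured analyzer actually emits (this only inspects analysis and doesn't touch stored documents):
GET test-index/_analyze
{
  "field": "stemmed_content",
  "text": "Das Wort Orangen sollte nach dem Stemming zu Orange werden."
}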
Add a document:
POST test-index/_doc/test-doc
{
  "standard_content": "Das Wort Orangen sollte nach dem Stemming zu Orange werden.",
  "stemmed_content": "Das Wort Orangen sollte nach dem Stemming zu Orange werden."
}
Essentially, in this test the word "Orangen" should be stemmed to "Orange", "sollte" to "soll", and so on. However, when I run a match_all query, the result is identical to what I inserted.
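Concretely, the search I run is just this:
GET test-index/_search
{
  "query": {
    "match_all": {}
  }
}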
I already had a similar setup working for stemming/preprocessing language-specific content, but for some reason it doesn't work now, and I don't understand why. Hopefully someone can spot where I'm making a mistake. Thanks!