Hello,
I'm using cloud.elastic.co to index metadata about German and French baroque vocal music. I need to treat inflected and uninflected forms as equivalent, so someone can search "schlummer" and find the lovely Bach aria "Schlummert ein."
I expected to have to add some baroque verb forms myself, but the built-in stemmers miss even common modern forms. What can I do?
Settings:
"analyzer_full_text_de": {
  "filter": [
    "straighten_apostrophes",
    "lowercase",
    "stop_de",
    "german_normalization",
    "stemmer_de",
    "synonyms_de"
  ],
  "type": "custom",
  "tokenizer": "standard"
},
"stemmer_de": {
  "name": "german",
  "type": "stemmer"
},
"synonyms_de": {
  "type": "synonym_graph",
  "synonyms": [
    "helfen, hilfen"
  ]
},
"analyzer_full_text_fr": {
  "filter": [
    "straighten_apostrophes",
    "elision_fr",
    "lowercase",
    "stop_fr",
    "stemmer_fr",
    "remove_accents"
  ],
  "type": "custom",
  "tokenizer": "standard"
},
"stop_fr": {
  "type": "stop",
  "stopwords": "_french_"
},
"elision_fr": {
  "type": "elision",
  "articles": [
    "l",
    "m",
    "t",
    "qu",
    "n",
    "s",
    "j",
    "d",
    "c",
    "jusqu",
    "quoiqu",
    "lorsqu",
    "puisqu"
  ],
  "articles_case": "true"
},
"stemmer_fr": {
  "name": "french",
  "type": "stemmer"
},
"straighten_apostrophes": {
  "pattern": "’",
  "type": "pattern_replace",
  "replacement": "'"
}
curl -X POST "localhost:9200/.../_analyze?pretty" -H 'Content-Type: application/json' -d'
{
  "analyzer": "analyzer_full_text_de",
  "text": "schlummern schlummert gegrüsst grüssen grussen"
}'
=> schlumm, schlummert, gegrusst, gruss, gruss.
I need schlummern/schlummert => schlumm and gegrüsst => gruss.
curl -X POST "localhost:9200/.../_analyze?pretty" -H 'Content-Type: application/json' -d'
{
  "analyzer": "analyzer_full_text_fr",
  "text": "mal maux"
}'
=> mal, maux.
I need maux => mal.
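One workaround I can imagine is a stemmer_override filter placed before the stemmer in each analyzer's filter list. This is only a minimal sketch, not something I have tested: the filter names stemmer_override_de and stemmer_override_fr are hypothetical, and the rules cover just the three forms from the examples above. The German rule for "gegrüsst" is written as "gegrusst" on the assumption that the filter sits after german_normalization, which has already turned ü into u by that point.

```json
"stemmer_override_de": {
  "type": "stemmer_override",
  "rules": [
    "schlummert => schlumm",
    "gegrusst => gruss"
  ]
},
"stemmer_override_fr": {
  "type": "stemmer_override",
  "rules": [
    "maux => mal"
  ]
}
```

The drawback is that every irregular form would have to be listed by hand, which is exactly the maintenance I was hoping the built-in stemmers would spare me.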
The other built-in stemmers for these languages didn't work any better. What else can I do?
Thanks!
Ben