Hello,
I'm using cloud.elastic.co to index metadata about German and French Baroque vocal music. I need to treat inflected and uninflected forms as equivalent, so that someone searching for "schlummer" finds the lovely Bach aria "Schlummert ein."
I expected to have to add some Baroque verb forms myself, but the built-in stemmers miss even some modern forms. What can I do?
Settings:
        "analyzer_full_text_de": {
          "filter": [
            "straighten_apostrophes",
            "lowercase",
            "stop_de",
            "german_normalization",
            "stemmer_de",
            "synonyms_de"
          ],
          "type": "custom",
          "tokenizer": "standard"
        },
        "stemmer_de": {
          "name": "german",
          "type": "stemmer"
        },
        "synonyms_de": {
          "type": "synonym_graph",
          "synonyms": [
            "helfen, hilfen"
          ]
        },
        "analyzer_full_text_fr": {
          "filter": [
            "straighten_apostrophes",
            "elision_fr",
            "lowercase",
            "stop_fr",
            "stemmer_fr",
            "remove_accents"
          ],
          "type": "custom",
          "tokenizer": "standard"
        },
        "stop_fr": {
          "type": "stop",
          "stopwords": "_french_"
        },
        "elision_fr": {
          "type": "elision",
          "articles": [
            "l",
            "m",
            "t",
            "qu",
            "n",
            "s",
            "j",
            "d",
            "c",
            "jusqu",
            "quoiqu",
            "lorsqu",
            "puisqu"
          ],
          "articles_case": "true"
        },
        "stemmer_fr": {
          "name": "french",
          "type": "stemmer"
        },
        "straighten_apostrophes": {
          "pattern": "’",
          "type": "pattern_replace",
          "replacement": "'"
        }
curl -X POST "localhost:9200/.../_analyze?pretty" -H 'Content-Type: application/json' -d'
{
  "analyzer": "analyzer_full_text_de",
  "text":     "schlummern schlummert gegrüsst grüssen grussen"
}'
=> schlumm, schlummert, gegrusst, gruss, gruss.
I need schlummern/schlummert => schlumm and gegrüsst => gruss.
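One manual route to compare against is to list these pairs explicitly, much like the `synonyms_de` rule above. The sketch below is an assumption and has not been tried on this index. It uses Elasticsearch's built-in `stemmer_override` token filter, placed after `german_normalization` (so the rules are written without umlauts) and before the German stemmer. Tokens matched by a rule are marked as keywords, so the stemmer leaves them alone. It calls `_analyze` without an index name and defines the filters inline, so the chain can be checked before any index settings change.

```shell
curl -X POST "localhost:9200/_analyze?pretty" -H 'Content-Type: application/json' -d'
{
  "tokenizer": "standard",
  "filter": [
    "lowercase",
    "german_normalization",
    {
      "type": "stemmer_override",
      "rules": [
        "schlummern, schlummert => schlumm",
        "gegrusst => gruss"
      ]
    },
    { "type": "stemmer", "language": "german" }
  ],
  "text": "schlummern schlummert gegrüsst grüssen grussen"
}'
```

If the output looks right, the same filter could be defined under the index's filters (for example as a hypothetical `stemmer_override_de`) and added between `german_normalization` and `stemmer_de` in `analyzer_full_text_de`. Analysis settings can only be changed on a closed index, and documents that are already indexed would need reindexing to pick up the new analysis.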
curl -X POST "localhost:9200/.../_analyze?pretty" -H 'Content-Type: application/json' -d'
{
  "analyzer": "analyzer_full_text_fr",
  "text":     "mal maux"
}'
=> mal, maux.
I need maux => mal.
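The same idea, sketched for French under the same assumption (a `stemmer_override` filter placed just before the French stemmer). The `elision` and stop filters from `analyzer_full_text_fr` are left out here for brevity.

```shell
curl -X POST "localhost:9200/_analyze?pretty" -H 'Content-Type: application/json' -d'
{
  "tokenizer": "standard",
  "filter": [
    "lowercase",
    {
      "type": "stemmer_override",
      "rules": [ "maux => mal" ]
    },
    { "type": "stemmer", "language": "french" }
  ],
  "text": "mal maux"
}'
```

`remove_accents` runs after the stemmer in `analyzer_full_text_fr`. Any rules placed at this point in the chain would therefore need to be written with their accents.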
The other stemmers available for these languages didn't do any better. What else can I do?
Thanks!
Ben