How to search with correct stemming?

Sorry, I'm really new to Elasticsearch...
I'm trying out the basic functionality...

I created my first index (to be used for the Italian language):

PUT index_it
{
  "settings": {
    "analysis": {
      "filter": {
        "italian_elision": {
          "type": "elision",
          "articles": [
            "c",
            "l",
            "all",
            "dall",
            "dell",
            "nell",
            "sull",
            "coll",
            "pell",
            "gl",
            "agl",
            "dagl",
            "degl",
            "negl",
            "sugl",
            "un",
            "m",
            "t",
            "s",
            "v",
            "d"
          ],
          "articles_case": true
        },
        "italian_stop": {
          "type": "stop",
          "stopwords": "_italian_"
        },
        "italian_keywords": {
          "type": "keyword_marker",
          "keywords": [
            "esempio"
          ]
        },
        "italian_stemmer": {
          "type": "stemmer",
          "language": "italian"
        }
      },
      "analyzer": {
        "italian_full": {
          "tokenizer": "standard",
          "filter": [
            "italian_elision",
            "lowercase",
            "italian_stop",
            "italian_keywords",
            "italian_stemmer"
          ]
        }
      }
    }
  }
}

{
  "acknowledged": true,
  "shards_acknowledged": true,
  "index": "index_it"
}

and indexed one word ("torta", Italian for "cake"):

PUT index_it/_doc/1
{
  "title": "torta"
}

{
  "_index": "index_it",
  "_id": "1",
  "_version": 1,
  "result": "created",
  "_shards": {
    "total": 2,
    "successful": 1,
    "failed": 0
  },
  "_seq_no": 0,
  "_primary_term": 1
}

If I analyze that word, I see it's correctly "stemmed" as "tort":

POST index_it/_analyze
{
  "analyzer": "italian_full",
  "text": "torta"
}

{
  "tokens": [
    {
      "token": "tort",
      "start_offset": 0,
      "end_offset": 5,
      "type": "<ALPHANUM>",
      "position": 0
    }
  ]
}

I'd expect to be able to search for both "torta" and "torte" (same word, plural).
I can find the singular form, but not the plural... :-/

GET index_it/_search
{
  "query": {
    "match": {
      "title": "torta"
    }
  }
}

{
  "took": 6,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 1,
      "relation": "eq"
    },
    "max_score": 0.2876821,
    "hits": [
      {
        "_index": "index_it",
        "_id": "1",
        "_score": 0.2876821,
        "_source": {
          "title": "torta"
        }
      }
    ]
  }
}

But not "torte":

GET index_it/_search
{
  "query": {
    "simple_query_string": {
      "fields": [ "title" ],
      "query": "torte"
    }
  }
}

{
  "took": 0,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 0,
      "relation": "eq"
    },
    "max_score": null,
    "hits": []
  }
}

What am I missing? Should I specify the analyzer in the queries too?

Hi Marco and welcome to the Elastic community!

You are on the right track, but in order for the title field to be analyzed with the italian_full analyzer (and support stemming in Italian), you need to explicitly specify that in the field's mapping when you create the index:

PUT index_it
{
  "settings": { ... }, // your settings as above
  "mappings": {
    "properties": {
      "title": {
        "type": "text",
        "analyzer": "italian_full"
      }
    }
  }
}

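This configures two things: (1) title will be analyzed at index time with italian_full, rather than with the default standard analyzer, and (2) queries against title will also be analyzed with italian_full by default, so "torta" and "torte" are both reduced to the same stem and you can run searches in Italian. Since the analyzer of an existing field cannot be changed, delete the old index, recreate it with this mapping, and then index the document again: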

PUT index_it/_doc/1
{
  "title": "torta"
}

GET index_it/_search
{
  "query": {
    "match": {
      "title": "torte"
    }
  }
}

{
...
    "hits": [
      {
        "_index": "index_it",
        "_id": "1",
        "_score": 0.2876821,
        "_source": {
          "title": "torta"
        }
      }
    ]
}
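
To illustrate why the mapping makes the difference, here is a toy model in Python. It is not Elasticsearch code: standard_analyzer and toy_italian_analyzer are hypothetical stand-ins (the toy stemmer just strips one final vowel, which is far cruder than the real Italian stemmer), and build_index and search are made-up helpers. It only sketches the idea that a query matches when the query-time tokens equal the index-time tokens.

```python
# Toy model of index-time vs. query-time analysis (NOT Elasticsearch code).

def standard_analyzer(text):
    """Stand-in for the default analyzer: lowercase and split on whitespace."""
    return text.lower().split()

def toy_italian_analyzer(text):
    """Stand-in for italian_full: lowercase, split, strip one final vowel.
    Assumption: this crude rule only imitates stemming for illustration."""
    tokens = []
    for tok in text.lower().split():
        if len(tok) > 3 and tok[-1] in "aeio":
            tok = tok[:-1]
        tokens.append(tok)
    return tokens

def build_index(docs, analyzer):
    """Map each analyzed token to the set of document ids containing it."""
    index = {}
    for doc_id, text in docs.items():
        for tok in analyzer(text):
            index.setdefault(tok, set()).add(doc_id)
    return index

def search(index, query, analyzer):
    """Return ids of documents matching any analyzed query token."""
    hits = set()
    for tok in analyzer(query):
        hits |= index.get(tok, set())
    return hits

docs = {"1": "torta"}

# No analyzer in the mapping: "torta" is indexed as-is, so "torte" misses.
default_index = build_index(docs, standard_analyzer)
print(search(default_index, "torte", standard_analyzer))     # set()

# Stemming only the query does not help: "tort" is not equal to "torta".
print(search(default_index, "torte", toy_italian_analyzer))  # set()

# Same analyzer on both sides: both forms reduce to "tort" and match.
stemmed_index = build_index(docs, toy_italian_analyzer)
print(search(stemmed_index, "torte", toy_italian_analyzer))  # {'1'}
```

The second case suggests why setting the analyzer only in the query would not be enough for the original index: the stored token would still be the unstemmed "torta".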

For more info, please check out this guide on specifying an analyzer.

If you want the Italian analyzer to apply to multiple fields in the index by default (e.g. to all text fields), consider using an index template.

Thank you so much! It works like a charm... :) Wonderful!
It's also so nice to join such a lively and responsive community!
