Similar result set for singular and plural query

Here's the problem: for the search terms 'cats' and 'cat', I am trying to get similar results, i.e. 'cats' should search for 'cat' internally. The minimal_english stemmer seems like a good choice for this, and I am using it as part of my other settings (see below):

    $params['body']['settings'] = [
      'analysis' => [
        'analyzer' => [
          'shingle_analyzer' => [
            'tokenizer' => 'standard',
            'filter' => ['standard', 'lowercase', 'filter_stop', 'filter_shingle']
          ],
          'ngram_analyzer' => [
            'tokenizer' => 'ngram_tokenizer',
            'filter' => ['standard', 'lowercase']
          ],
          'stemmer_analyzer' => [
            'tokenizer' => 'standard',
            'filter' => ['standard', 'lowercase', 'filter_english_stemmer']
          ]
        ],
        'tokenizer' => [
          'ngram_tokenizer' => [
            'type' => 'edge_ngram',
            'min_gram' => 3,
            'max_gram' => 10,
            'token_chars' => ['letter', 'digit']
          ]
        ],
        'filter' => [
          'filter_stop' => [
            'type' => 'stop'
          ],
          'filter_shingle' => [
            'type' => 'shingle',
            'min_shingle_size' => 2,
            'max_shingle_size' => 3,
            'output_unigrams' => true,
            'filler_token' => ''
          ],
          'filter_english_stemmer' => [
            'type' => 'stemmer',
            'name' => 'minimal_english'
          ]
        ]
      ]
    ];

Here's the query being built:

    'query' => [
      'bool' => [
        'must' => [
          'multi_match' => [
            'query' => $queryTerm,
            'type' => 'most_fields',
            'fields' => [
              'animal.name.shinglefield',
              'animal.name.ngramfield',
            ],
            'analyzer' => 'stemmer_analyzer'
          ]
        ],
      ]
    ],

I tried three different scenarios:

  1. filter_english_stemmer as part of the filter list in shingle_analyzer. With this in place, a search for 'cats' still returns matches for 'cats' based on ngrams, which makes sense since ngram_analyzer doesn't include the filter_english_stemmer filter.
  2. To overcome the previous issue, I added filter_english_stemmer to the ngram_analyzer filter list as well. Doing so resulted in no matches for 'cats'. Again, this makes sense since there's no shingle/ngram for either 'cats' or 'ats'.
  3. As an alternative approach (and with the settings shared above), I used stemmer_analyzer in the multi_match query ('analyzer' => 'stemmer_analyzer'), and that query gave me similar results for 'cats' and 'cat'.

Now, my question is: even though the last approach works, is it a good idea to set an analyzer at query time? Is there a better way?

Also, correct me if I am wrong, but does the stemmer_analyzer used in the multi_match query work because the query 'cats' is first reduced to 'cat' and then the field-specific analyzers are run? Is that how it's working?
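
One way to check that assumption might be to run the query text through the `_analyze` API with each analyzer and compare the tokens that come back. The following is only a minimal sketch, not a tested setup: it assumes the elasticsearch-php client is installed via Composer (the `Elasticsearch\ClientBuilder` namespace differs between client versions), a node reachable at localhost:9200, and a hypothetical index name `animals` created with the settings above.

    <?php
    // Assumption: elasticsearch-php installed through Composer.
    require 'vendor/autoload.php';

    use Elasticsearch\ClientBuilder;

    // Assumption: a local node; adjust the host for your cluster.
    $client = ClientBuilder::create()
        ->setHosts(['localhost:9200'])
        ->build();

    // Analyze the same text with each custom analyzer defined in the settings.
    foreach (['stemmer_analyzer', 'ngram_analyzer', 'shingle_analyzer'] as $analyzer) {
        $response = $client->indices()->analyze([
            'index' => 'animals', // hypothetical index name
            'body'  => [
                'analyzer' => $analyzer,
                'text'     => 'cats',
            ],
        ]);

        // Each entry in 'tokens' describes one emitted token; keep only the text.
        $tokens = array_column($response['tokens'], 'token');
        echo $analyzer . ': ' . implode(', ', $tokens) . PHP_EOL;
    }

Comparing the tokens printed for stemmer_analyzer with those printed for the other two analyzers should show which terms the query is actually looking up and which terms were written to each field at index time.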