Similar result set for singular and plural query


#1

Here's the problem: for the search terms 'cats' and 'cat', I am trying to get similar results, i.e. 'cats' should search for 'cat' internally. To solve this, the minimal_english stemmer seems like a good choice, and I am using it as part of my other settings (see below):

    $params['body']['settings'] = [
      'analysis' => [
        'analyzer' => [
          'shingle_analyzer' => [
            'tokenizer' => 'standard',
            'filter' => ['standard', 'lowercase', 'filter_stop', 'filter_shingle']
          ],
          'ngram_analyzer' => [
            'tokenizer' => 'ngram_tokenizer',
            'filter' => ['standard', 'lowercase']
          ],
          'stemmer_analyzer' => [
            'tokenizer' => 'standard',
            'filter' => ['standard', 'lowercase', 'filter_english_stemmer']
          ]
        ],
        'tokenizer' => [
          'ngram_tokenizer' => [
            'type' => 'edge_ngram',
            'min_gram' => 3,
            'max_gram' => 10,
            'token_chars' => ['letter', 'digit']
          ]
        ],
        'filter' => [
          'filter_stop' => [
            'type' => 'stop'
          ],
          'filter_shingle' => [
            'type' => 'shingle',
            'min_shingle_size' => 2,
            'max_shingle_size' => 3,
            'output_unigrams' => true,
            'filler_token' => ''
          ],
          'filter_english_stemmer' => [
            'type' => 'stemmer',
            'name' => 'minimal_english'
          ]
        ]
      ]
    ];
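
The mapping itself is not shown here. As a minimal sketch of how the two sub-fields used in the query could be wired to these analyzers, assuming `animal.name` is a text field with two multi-fields (a layout inferred only from the field names in the query), it might look like this. Depending on the Elasticsearch version, a mapping type name may also have to wrap the top-level `properties`.

```php
<?php
// Hypothetical mapping: the field layout is an assumption inferred from
// the field names 'animal.name.shinglefield' and 'animal.name.ngramfield'.
$params['body']['mappings'] = [
  'properties' => [
    'animal' => [
      'properties' => [
        'name' => [
          'type' => 'text',
          'fields' => [
            // Indexed with shingles (and unigrams) of the name.
            'shinglefield' => [
              'type' => 'text',
              'analyzer' => 'shingle_analyzer'
            ],
            // Indexed with edge n-grams of 3 to 10 characters.
            'ngramfield' => [
              'type' => 'text',
              'analyzer' => 'ngram_analyzer'
            ]
          ]
        ]
      ]
    ]
  ]
];
```

With a mapping like this, neither sub-field stems at index time, which is why the stemmer placement in the scenarios below matters.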

Here's the query being built:

    'query' => [
      'bool' => [
        'must' => [
          'multi_match' => [
            'query' => $queryTerm,
            'type' => 'most_fields',
            'fields' => [
              'animal.name.shinglefield',
              'animal.name.ngramfield',
            ],
            'analyzer' => 'stemmer_analyzer'
          ]
        ],
      ]
    ],

I tried three different scenarios:

  1. filter_english_stemmer as part of the filter chain in shingle_analyzer. With this in place, a search for 'cats' still returns matches for 'cats' based on the ngrams, which makes sense since ngram_analyzer doesn't have the filter_english_stemmer filter.
  2. To overcome the previous issue, I added filter_english_stemmer to the ngram_analyzer filter chain. But doing so resulted in no matches for 'cats'. Again, this makes sense since there's no shingle/ngram for either 'cats' or 'ats'.
  3. As an alternative approach (and with the settings shared above), I used stemmer_analyzer in the multi_match query (analyzer => 'stemmer_analyzer'), and that query gave me similar results for 'cats' and 'cat'.
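
To make the first two scenarios concrete, the changed analyzer definitions would look roughly like this. It is only a sketch: the position of filter_english_stemmer inside each filter chain is an assumption (after lowercase, before shingling), and the variable names `$scenario1` and `$scenario2` are made up for illustration.

```php
<?php
// Scenario 1: stemmer added to shingle_analyzer only.
// Assumption: it sits after the stop filter and before the shingle filter.
$scenario1 = [
  'shingle_analyzer' => [
    'tokenizer' => 'standard',
    'filter' => ['standard', 'lowercase', 'filter_stop',
                 'filter_english_stemmer', 'filter_shingle']
  ]
];

// Scenario 2: stemmer added to ngram_analyzer.
// Assumption: it runs after lowercase on each edge n-gram token.
$scenario2 = [
  'ngram_analyzer' => [
    'tokenizer' => 'ngram_tokenizer',
    'filter' => ['standard', 'lowercase', 'filter_english_stemmer']
  ]
];

// Overlay one scenario on the analyzers from the settings above;
// array_replace keeps untouched analyzers and overrides matching keys.
$current = $params['body']['settings']['analysis']['analyzer'] ?? [];
$params['body']['settings']['analysis']['analyzer'] =
  array_replace($current, $scenario1);
```

Scenario 3 leaves the index settings unchanged and only adds the `analyzer` key to the multi_match query.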

Now, my question is: even though the last approach works, is it a good idea to specify an analyzer at query time? Is there a better way?

Also, correct me if I am wrong, but does the stemmer_analyzer in the multi_match query work because the query 'cats' is first reduced to 'cat' and then the field-specific analyzers are run? Is that how it's working?
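
One way to check what each analyzer actually emits for 'cats' is the _analyze API. The snippet below is a sketch under a few assumptions: the official elasticsearch-php client is installed via Composer (older client versions expose `Elasticsearch\ClientBuilder`; newer ones use a different namespace), the cluster is reachable with default settings, and `animals` is a hypothetical index name created with the settings above.

```php
<?php
require 'vendor/autoload.php';

// Assumption: elasticsearch-php client, default host (localhost:9200).
$client = Elasticsearch\ClientBuilder::create()->build();

// Analyzer names come from the settings above.
$analyzers = ['stemmer_analyzer', 'shingle_analyzer', 'ngram_analyzer'];

foreach ($analyzers as $analyzer) {
  // 'animals' is a hypothetical index name.
  $response = $client->indices()->analyze([
    'index' => 'animals',
    'body' => [
      'analyzer' => $analyzer,
      'text' => 'cats'
    ]
  ]);

  // Collect just the token strings from the response.
  $tokens = array_column($response['tokens'], 'token');
  echo $analyzer . ': ' . implode(', ', $tokens) . PHP_EOL;
}
```

Comparing the tokens from stemmer_analyzer with the tokens each field's own analyzer produces should show which terms the query is really looking up in each field.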

