Elasticsearch: search and index time analyzer

Hi Team

I'm using a custom analyzer, and I want it applied both at index time and at search time. I've specified it in the mappings, but the search-time analyzer doesn't seem to be working.

Below are my settings. I'm using the analyzer on the content field (please refer to the content field in the mappings).

   "settings": {
    "number_of_shards" : 1,
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": [
            "possessive_stemmer",
            "lowercase",
            "english_stop",
            "eng_keywords",
            "stemmer"
          ]
        }
      },
      "filter": {
        "english_stop": {
          "type": "stop",
          "stopwords": ["have","should","i","a", "an", "and", "are", "as", "at", "be", "but", "by", "for", "if", "in", "into", "is", "it", "no", "not", "of", "on", "or", "such", "that", "the", "their", "then", "there", "these", "they", "this", "to", "was", "will", "with","my"]
        },
        "stemmer": {
          "type": "stemmer",
          "language": "light_english"
        },
        "possessive_stemmer": {
          "type": "stemmer",
          "language": "possessive_english"
        },
        "eng_keywords": {
          "type": "keyword_marker",
          "keywords": [
            "windows"
          ]
        }
      }
    }
  },
  "mappings": {
    "_doc": {
        "properties": {
          "Author": {
            "type": "text",
            "fields": {
              "keyword": {
                "type": "keyword",
                "ignore_above": 256
              }
            }
          },
          "CreationDate": {
            "type": "text",
            "fields": {
              "keyword": {
                "type": "keyword",
                "ignore_above": 256
              }
            }
          },
          "Creator": {
            "type": "text",
            "fields": {
              "keyword": {
                "type": "keyword",
                "ignore_above": 256
              }
            }
          },
          "Keywords": {
            "type": "text",
            "fields": {
              "keyword": {
                "type": "keyword",
                "ignore_above": 256
              }
            }
          },
          "ModDate": {
            "type": "text",
            "fields": {
              "keyword": {
                "type": "keyword",
                "ignore_above": 256
              }
            }
          },
          "Producer": {
            "type": "text",
            "fields": {
              "keyword": {
                "type": "keyword",
                "ignore_above": 256
              }
            }
          },
          "Subject": {
            "type": "text",
            "fields": {
              "keyword": {
                "type": "keyword",
                "ignore_above": 256
              }
            }
          },
          "Title": {
            "type": "text",
            "fields": {
              "keyword": {
                "type": "keyword",
                "ignore_above": 256
              }
            }
          },
          "content": {
            "type": "text",
            "analyzer": "my_analyzer",
            "search_analyzer": "my_analyzer",
            "fields": {
              "keyword": {
                "type": "keyword",
                "ignore_above": 256
              }
            }
          },
          "file_category": {
            "type": "text",
            "fields": {
              "keyword": {
                "type": "keyword",
                "ignore_above": 256
              }
            }
          },
          "file_name": {
            "type": "text",
            "fields": {
              "keyword": {
                "type": "keyword",
                "ignore_above": 256
              }
            }
          },
          "url": {
            "type": "text",
            "fields": {
              "keyword": {
                "type": "keyword",
                "ignore_above": 256
              }
            }
          }
        }
      }
  }

When I search with a query, for example: my monitor is not running.

According to the Explain API, ES is searching for running instead of run (which I expected, since I'm using a stemmer).

Please let me know what I'm missing here.

Thanks :slight_smile:

What query are you using? Maybe you can share the exact query here?

If you're using a term query, then no analysis will be applied to your search terms, even if you have specified a search_analyzer (this is the behavior of the term query). In that case, you would need to switch to the match query instead.
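To illustrate the difference (using your index name newoneindex), a term query looks up the literal token in the inverted index, while a match query analyzes the input first. A minimal pair of requests:

```json
GET newoneindex/_search
{
  "query": {
    "term": {
      "content": "running"
    }
  }
}

GET newoneindex/_search
{
  "query": {
    "match": {
      "content": "running"
    }
  }
}
```

The term query searches for the exact string running, so it will only match if that token was stored as-is; the match query first runs running through the field's search_analyzer and then searches for the resulting token(s).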

Hi @abdon, thanks for connecting.

I'm using a match query. Here it is:

> {
>   "_source": "url",
>   "explain": true,
>   "query": {
>     "match": {
>       "content": {
>         "query": "my keyboard is not running"
>       }
>     }
>   }
> }

Response:

>   "hits": {
>     "total": 17,
>     "max_score": 2.021533,
>     "hits": [
>       {
>         "_shard": "[newoneindex][0]",
>         "_node": "nTOGuiS3SsGXFeD5Bf3FxQ",
>         "_index": "newoneindex",
>         "_type": "_doc",
>         "_id": "6",
>         "_score": 2.021533,
>         "_source": {
>           "url": "/Linux/linux_faq_4_manual.pdf"
>         },
>         "_explanation": {
>           "value": 2.0215333,
>           "description": "sum of:",
>           "details": [
>             {
>               "value": 1.0470022,
>               "description": "weight(content:keyboard in 2) [PerFieldSimilarity], result of:",
>               "details": [
>                 {
>                   "value": 1.0470022,
>                   "description": "score(doc=2,freq=3.0 = termFreq=3.0\n), product of:",
>                   "details": [
>                     {
>                       "value": 0.6931472,
>                       "description": "idf, computed as log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5)) from:",
>                       "details": [
>                         {
>                           "value": 10,
>                           "description": "docFreq",
>                           "details": []
>                         },
>                         {
>                           "value": 20,
>                           "description": "docCount",
>                           "details": []
>                         }
>                       ]
>                     },
>                     {
>                       "value": 1.5105048,
>                       "description": "tfNorm, computed as (freq * (k1 + 1)) / (freq + k1 * (1 - b + b * fieldLength / avgFieldLength)) from:",
>                       "details": [
>                         {
>                           "value": 3,
>                           "description": "termFreq=3.0",
>                           "details": []
>                         },
>                         {
>                           "value": 1.2,
>                           "description": "parameter k1",
>                           "details": []
>                         },
>                         {
>                           "value": 0.75,
>                           "description": "parameter b",
>                           "details": []
>                         },
>                         {
>                           "value": 4760.05,
>                           "description": "avgFieldLength",
>                           "details": []
>                         },
>                         {
>                           "value": 5656,
>                           "description": "fieldLength",
>                           "details": []
>                         }
>                       ]
>                     }
>                   ]
>                 }
>               ]
>             },
>             {
>               "value": 0.974531,
>               "description": "weight(content:running in 2) [PerFieldSimilarity], result of:",
>               "details": [
>                 {
>                   "value": 0.974531,
>                   "description": "score(doc=2,freq=8.0 = termFreq=8.0\n), product of:",
>                   "details": [
>                     {
>                       "value": 0.5187938,
>                       "description": "idf, computed as log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5)) from:",
>                       "details": [
>                         {
>                           "value": 12,
>                           "description": "docFreq",
>                           "details": []
>                         },
>                         {
>                           "value": 20,
>                           "description": "docCount",
>                           "details": []
>                         }
>                       ]
>                     },
>                     {
>                       "value": 1.8784553,
>                       "description": "tfNorm, computed as (freq * (k1 + 1)) / (freq + k1 * (1 - b + b * fieldLength / avgFieldLength)) from:",
>                       "details": [
>                         {
>                           "value": 8,
>                           "description": "termFreq=8.0",
>                           "details": []
>                         },
>                         {
>                           "value": 1.2,
>                           "description": "parameter k1",
>                           "details": []
>                         },
>                         {
>                           "value": 0.75,
>                           "description": "parameter b",
>                           "details": []
>                         },
>                         {
>                           "value": 4760.05,
>                           "description": "avgFieldLength",
>                           "details": []
>                         },
>                         {
>                           "value": 5656,
>                           "description": "fieldLength",
>                           "details": []
>                         }
>                       ]
>                     }
>                   ]
>                 }
>               ]
>             }
>           ]
>         }
>       }

The light_english stemmer that you're using does not actually stem running to run. You can see that by using the _analyze API:

GET newoneindex/_analyze
{
  "analyzer": "my_analyzer",
  "text": "my keyboard is not running"
}

If you replace the light_english stemmer with, for example, the english stemmer, you will see that running is indeed stemmed to run.
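For instance, swapping the stemmer in your settings would look like this (only the changed filter is shown; note that the index has to be recreated, or the documents reindexed, for a new analysis chain to take effect):

```json
"stemmer": {
  "type": "stemmer",
  "language": "english"
}
```

With that change, the same _analyze request should return the token run. The light_english stemmer (based on kstem) is deliberately less aggressive than the Porter-based english stemmer, which is why it leaves running untouched.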


@abdon

Yeah, I missed that. Thank you :)

In addition, is there any analyzer for handling contractions like haven't, shouldn't, can't, etc.? All of these should be transformed to have not, should not, can not, etc., and then removed if they appear in the stopwords list.

-Rahul

I don't know of an easy way to do that. Maybe a mapping character filter could be the way to go?
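A rough sketch of what that could look like (the index name, filter name, and the three mappings here are made up for illustration; you would need to list each contraction explicitly):

```json
PUT contractions_test
{
  "settings": {
    "analysis": {
      "char_filter": {
        "expand_contractions": {
          "type": "mapping",
          "mappings": [
            "haven't => have not",
            "shouldn't => should not",
            "can't => can not"
          ]
        }
      },
      "analyzer": {
        "my_analyzer": {
          "type": "custom",
          "char_filter": ["expand_contractions"],
          "tokenizer": "standard",
          "filter": ["lowercase", "stop"]
        }
      }
    }
  }
}
```

Because the char_filter runs before the tokenizer, have and not then go through the normal token filter chain and get dropped by the stop filter. One caveat: the mapping char filter matches exact character sequences, so curly apostrophes (’) in the input would need their own mapping entries.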

Maybe you can open a new topic on this forum to give your question some visibility?

I'll check out the mapping character filter.

Sure @abdon
