Elasticsearch: search and index time analyzer

Hi Team

I'm using a custom analyzer, and I want it applied both at index time and at search time. I've specified it in the mappings, but the search-time analyzer doesn't seem to be working.

Below are my settings. I'm using the analyzer on the content field (please refer to the content field in the mappings).

   "settings": {
    "number_of_shards" : 1,
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": [
            "possessive_stemmer",
            "lowercase",
            "english_stop",
            "eng_keywords",
            "stemmer"
          ]
        }
      },
      "filter": {
        "english_stop": {
          "type": "stop",
          "stopwords": ["have","should","i","a", "an", "and", "are", "as", "at", "be", "but", "by", "for", "if", "in", "into", "is", "it", "no", "not", "of", "on", "or", "such", "that", "the", "their", "then", "there", "these", "they", "this", "to", "was", "will", "with","my"]
        },
        "stemmer": {
          "type": "stemmer",
          "language": "light_english"
        },
        "possessive_stemmer": {
          "type": "stemmer",
          "language": "possessive_english"
        },
        "eng_keywords": {
          "type": "keyword_marker",
          "keywords": [
            "windows"
          ]
        }
      }
    }
  },
  "mappings": {
    "_doc": {
        "properties": {
          "Author": {
            "type": "text",
            "fields": {
              "keyword": {
                "type": "keyword",
                "ignore_above": 256
              }
            }
          },
          "CreationDate": {
            "type": "text",
            "fields": {
              "keyword": {
                "type": "keyword",
                "ignore_above": 256
              }
            }
          },
          "Creator": {
            "type": "text",
            "fields": {
              "keyword": {
                "type": "keyword",
                "ignore_above": 256
              }
            }
          },
          "Keywords": {
            "type": "text",
            "fields": {
              "keyword": {
                "type": "keyword",
                "ignore_above": 256
              }
            }
          },
          "ModDate": {
            "type": "text",
            "fields": {
              "keyword": {
                "type": "keyword",
                "ignore_above": 256
              }
            }
          },
          "Producer": {
            "type": "text",
            "fields": {
              "keyword": {
                "type": "keyword",
                "ignore_above": 256
              }
            }
          },
          "Subject": {
            "type": "text",
            "fields": {
              "keyword": {
                "type": "keyword",
                "ignore_above": 256
              }
            }
          },
          "Title": {
            "type": "text",
            "fields": {
              "keyword": {
                "type": "keyword",
                "ignore_above": 256
              }
            }
          },
          "content": {
            "type": "text",
            "analyzer": "my_analyzer",
            "search_analyzer": "my_analyzer",
            "fields": {
              "keyword": {
                "type": "keyword",
                "ignore_above": 256
              }
            }
          },
          "file_category": {
            "type": "text",
            "fields": {
              "keyword": {
                "type": "keyword",
                "ignore_above": 256
              }
            }
          },
          "file_name": {
            "type": "text",
            "fields": {
              "keyword": {
                "type": "keyword",
                "ignore_above": 256
              }
            }
          },
          "url": {
            "type": "text",
            "fields": {
              "keyword": {
                "type": "keyword",
                "ignore_above": 256
              }
            }
          }
        }
      }
  }

When I search with a query, for example: my monitor is not running.

According to the Explain API, ES is searching for running instead of run (which I expected, since I'm using a stemmer).

Please let me know what I'm missing here.

Thanks :slight_smile:

What query are you using? Maybe you can share the exact query here?

If you're using a term query, then no analysis will be applied to your search terms, even if you have specified a search_analyzer (this is the behavior of the term query). In that case, you would need to switch to the match query instead.
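To illustrate the difference (using your index name newoneindex), a term query looks up the literal token in the inverted index, while a match query analyzes the input first. A minimal pair of requests:

```json
GET newoneindex/_search
{
  "query": {
    "term": {
      "content": "running"
    }
  }
}

GET newoneindex/_search
{
  "query": {
    "match": {
      "content": "running"
    }
  }
}
```

The term query searches for the exact string running, so it will only match if that token was stored as-is; the match query first runs running through the field's search_analyzer and then searches for the resulting token(s).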

Hi @abdon, thanks for connecting.

I'm using a match query. Here it is:

> {
>   "_source": "url",
>   "explain": true,
>   "query": {
>     "match": {
>       "content": {
>         "query": "my keyboard is not running"
>       }
>     }
>   }
> }

Response:

>   "hits": {
>     "total": 17,
>     "max_score": 2.021533,
>     "hits": [
>       {
>         "_shard": "[newoneindex][0]",
>         "_node": "nTOGuiS3SsGXFeD5Bf3FxQ",
>         "_index": "newoneindex",
>         "_type": "_doc",
>         "_id": "6",
>         "_score": 2.021533,
>         "_source": {
>           "url": "/Linux/linux_faq_4_manual.pdf"
>         },
>         "_explanation": {
>           "value": 2.0215333,
>           "description": "sum of:",
>           "details": [
>             {
>               "value": 1.0470022,
>               "description": "weight(content:keyboard in 2) [PerFieldSimilarity], result of:",
>               "details": [
>                 {
>                   "value": 1.0470022,
>                   "description": "score(doc=2,freq=3.0 = termFreq=3.0\n), product of:",
>                   "details": [
>                     {
>                       "value": 0.6931472,
>                       "description": "idf, computed as log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5)) from:",
>                       "details": [
>                         {
>                           "value": 10,
>                           "description": "docFreq",
>                           "details": []
>                         },
>                         {
>                           "value": 20,
>                           "description": "docCount",
>                           "details": []
>                         }
>                       ]
>                     },
>                     {
>                       "value": 1.5105048,
>                       "description": "tfNorm, computed as (freq * (k1 + 1)) / (freq + k1 * (1 - b + b * fieldLength / avgFieldLength)) from:",
>                       "details": [
>                         {
>                           "value": 3,
>                           "description": "termFreq=3.0",
>                           "details": []
>                         },
>                         {
>                           "value": 1.2,
>                           "description": "parameter k1",
>                           "details": []
>                         },
>                         {
>                           "value": 0.75,
>                           "description": "parameter b",
>                           "details": []
>                         },
>                         {
>                           "value": 4760.05,
>                           "description": "avgFieldLength",
>                           "details": []
>                         },
>                         {
>                           "value": 5656,
>                           "description": "fieldLength",
>                           "details": []
>                         }
>                       ]
>                     }
>                   ]
>                 }
>               ]
>             },
>             {
>               "value": 0.974531,
>               "description": "weight(content:running in 2) [PerFieldSimilarity], result of:",
>               "details": [
>                 {
>                   "value": 0.974531,
>                   "description": "score(doc=2,freq=8.0 = termFreq=8.0\n), product of:",
>                   "details": [
>                     {
>                       "value": 0.5187938,
>                       "description": "idf, computed as log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5)) from:",
>                       "details": [
>                         {
>                           "value": 12,
>                           "description": "docFreq",
>                           "details": []
>                         },
>                         {
>                           "value": 20,
>                           "description": "docCount",
>                           "details": []
>                         }
>                       ]
>                     },
>                     {
>                       "value": 1.8784553,
>                       "description": "tfNorm, computed as (freq * (k1 + 1)) / (freq + k1 * (1 - b + b * fieldLength / avgFieldLength)) from:",
>                       "details": [
>                         {
>                           "value": 8,
>                           "description": "termFreq=8.0",
>                           "details": []
>                         },
>                         {
>                           "value": 1.2,
>                           "description": "parameter k1",
>                           "details": []
>                         },
>                         {
>                           "value": 0.75,
>                           "description": "parameter b",
>                           "details": []
>                         },
>                         {
>                           "value": 4760.05,
>                           "description": "avgFieldLength",
>                           "details": []
>                         },
>                         {
>                           "value": 5656,
>                           "description": "fieldLength",
>                           "details": []
>                         }
>                       ]
>                     }
>                   ]
>                 }
>               ]
>             }
>           ]
>         }
>       }

The light_english stemmer that you're using does not actually stem running to run. You can see that by using the _analyze API:

GET newoneindex/_analyze
{
  "analyzer": "my_analyzer",
  "text": "my keyboard is not running"
}

If you replace the light_english stemmer with, for example, the english stemmer, you will see that running is indeed stemmed to run.
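For instance, swapping the stemmer in your settings would look like this (only the changed filter is shown; note that the index has to be recreated, or the documents reindexed, for a new analysis chain to take effect):

```json
"stemmer": {
  "type": "stemmer",
  "language": "english"
}
```

With that change, the same _analyze request should return the token run. The light_english stemmer (based on kstem) is deliberately less aggressive than the Porter-based english stemmer, which is why it leaves running untouched.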


@abdon

Yeah, I missed that. Thank you :)

In addition, is there any analyzer for handling contractions like haven't, shouldn't, can't, etc.? All of these should be transformed to have not, should not, can not, etc., and then removed if they appear in the stopwords list.

-Rahul

I don't know of an easy way to do that. Maybe a mapping character filter could be the way to go?
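A rough sketch of what that could look like (the index name, filter name, and the three mappings here are made up for illustration; you would need to list each contraction explicitly):

```json
PUT contractions_test
{
  "settings": {
    "analysis": {
      "char_filter": {
        "expand_contractions": {
          "type": "mapping",
          "mappings": [
            "haven't => have not",
            "shouldn't => should not",
            "can't => can not"
          ]
        }
      },
      "analyzer": {
        "my_analyzer": {
          "type": "custom",
          "char_filter": ["expand_contractions"],
          "tokenizer": "standard",
          "filter": ["lowercase", "stop"]
        }
      }
    }
  }
}
```

Because the char_filter runs before the tokenizer, have and not then go through the normal token filter chain and get dropped by the stop filter. One caveat: the mapping char filter matches exact character sequences, so curly apostrophes (’) in the input would need their own mapping entries.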

Maybe you can open a new topic on this forum to give your question some visibility?

I'll check out the mapping character filter.

Sure @abdon
