Elasticsearch: Testing Analyzers


(Rahul Nama) #1

Hi Team

Can someone please share some insights on how to test analyzers?

Once we define analyzers in the mappings and start indexing, how should we test them? We want to understand how each analyzer behaves and which one works best for our data.

How should we go about this? Please suggest.

Thanks for your time, as always :slight_smile:

-rahul


(David Pilato) #2

Use the _analyze API
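
For anyone landing here later, a minimal sketch of what such calls can look like. The sample text is made up, and `my_index`, `my_english_analyzer` and `title` are only the placeholder names used further down in this thread, so the last two requests assume an index with those settings and that mapping already exists.

```console
# Built-in analyzer, no index required
POST _analyze
{
  "analyzer": "standard",
  "text": "The quick brown foxes jumped over the lazy dog"
}

# Analyzer defined in the settings of an existing index (placeholder names)
POST my_index/_analyze
{
  "analyzer": "my_english_analyzer",
  "text": "The quick brown foxes jumped over the lazy dog"
}

# Whatever analyzer the mapping assigns to a field (placeholder field name)
POST my_index/_analyze
{
  "field": "title",
  "text": "The quick brown foxes jumped over the lazy dog"
}
```

Each response lists the tokens produced, with their positions and offsets, so running the same text through several analyzers and comparing the token lists is a simple way to see how they differ.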


(Rahul Nama) #3

Will do that. @dadoonet

Thank you.

In addition, any insights on the difference between adding an analyzer in settings and in mappings?

You can see it here

In Settings:

PUT my_index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_english_analyzer": {
          "type": "standard",
          "max_token_length": 5,
          "stopwords": "_english_"
        }
      }
    }
  }
} 

In Mappings:

PUT my_index
{
  "mappings": {
    "_doc": {
      "properties": {
        "title": {
          "type":     "text",
          "analyzer": "standard"
        }
      }
    }
  }
}

(Rahul Nama) #4

Hi @dadoonet

As I understand it, settings is for additional configuration of analyzers.

Also, if we need to apply tokenizers, we should define them along with the analyzers in the settings. Am I right?

And in settings I can see Index_Analyzer and Search_Analyzer. Can you please explain the difference between the two? Going by the names, one is applied at index time and the other at search time. Is search_analyzer used to analyze the query?

Requirement: we should be able to apply analyzers and tokenizers to improve relevancy.

Thanks


(David Pilato) #5

If you want to define a custom analyzer, do that in the index settings.
If you want to apply an analyzer to a field, do that in the mapping.
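
As a rough illustration of the two steps together, here is a sketch that defines a custom analyzer (including its tokenizer and token filters) in settings and then applies it to a field in the mapping. The names `my_custom_index` and `my_custom_analyzer` are hypothetical, and the `_doc` level in the mapping follows the mapping style used earlier in this thread; newer versions drop that level.

```console
# Hypothetical index and analyzer names, for illustration only
PUT my_custom_index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_custom_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": ["lowercase", "stop"]
        }
      }
    }
  },
  "mappings": {
    "_doc": {
      "properties": {
        "title": {
          "type": "text",
          "analyzer": "my_custom_analyzer"
        }
      }
    }
  }
}
```

The tokenizer is chosen inside the analyzer definition, so it lives in settings as well; the mapping only refers to the analyzer by name.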

By default, the field's analyzer is used at both index time and search time, but you can optionally specify a different analyzer to apply at search time with search_analyzer.
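
A small sketch of what that can look like in a mapping. The index name `my_search_index` is hypothetical, and the choice of `standard` at index time with `stop` at search time is purely illustrative, not a recommendation.

```console
# Hypothetical index name; analyzer pairing chosen only to show the syntax
PUT my_search_index
{
  "mappings": {
    "_doc": {
      "properties": {
        "title": {
          "type": "text",
          "analyzer": "standard",
          "search_analyzer": "stop"
        }
      }
    }
  }
}
```

With this mapping, documents are analyzed with `standard` when indexed, while the text of a query against `title` is analyzed with `stop`. If `search_analyzer` is left out, `analyzer` is used for both.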


(Rahul Nama) #6

got it @dadoonet

Thanks much for your time.

So, I think it's not necessary for us to process or enrich the user query, as we are already applying analyzers and tokenizers on the target fields.

For instance, consider this search term (user query): I have an issue with the internet, how to solve?

Do we need to remove stop words from the user query as well to increase relevancy? That would convert the search term above to: I have issue with internet

Does Elasticsearch have any features to support query processing (optimizing the user query to increase relevancy)?


(David Pilato) #7

The default settings in Elasticsearch are pretty good. The BM25 algorithm takes care of stop words out of the box.

Just give it a try and you'll see how good it is and if it needs some tuning or not.
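
One way to try it, sketched with the placeholder index and field from earlier in the thread (`my_index`, `title`) and the example query from the previous post. A `match` query runs the query text through the field's search-time analyzer, so the raw user input can be sent as is, and `"explain": true` adds a per-hit score breakdown that shows how much each term contributes.

```console
# Assumes my_index exists with an analyzed "title" text field and some documents
GET my_index/_search
{
  "explain": true,
  "query": {
    "match": {
      "title": "I have an issue with the internet, how to solve?"
    }
  }
}
```

Comparing the explanations for a common word and a rare word in the same query is a quick check on whether extra stop word handling is worth adding.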


(Rahul Nama) #8

Sure @dadoonet

I read about BM25 in the documentation but never tried it. Will give it a try.

Thank you very much :slight_smile:

