Default analyzers in Elasticsearch

srinivas · January 21, 2016, 1:49pm

I have added the following to my yml file:
index :
    analysis :
        analyzer :
            default :
                tokenizer : keyword

But when I look at my index metadata in the head plugin on 2.1.1, I can't find the index_analyzer and search_analyzer settings. In the previous version, ES 1.1.1, I could see these two fields in the index metadata.

I am not sure whether these analyzers are actually applied to the index at indexing and search time. Is there any way to test the custom analyzers defined on an index?

danielmitterdorfer · January 26, 2016, 7:56am

Hi Srinivas,

while it may be possible to specify these settings in the yml file, we advise you to define them in the index settings instead.

Suppose you have an index called sample_index. You can create it with a default analyzer that uses the keyword tokenizer as follows:

PUT /sample_index
{
   "settings": {
      "analysis": {
         "analyzer": {
            "default": {
               "tokenizer": "keyword"
            }
         }
      }
   }
}
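If you prefer to send this request from a script, here is a minimal Python sketch using only the standard library. It assumes a node reachable at the hypothetical address http://localhost:9200, and the helper name create_index_request is illustrative only. It builds the same PUT request as above but does not send it.

```python
import json
import urllib.request

# Assumption: hypothetical address of a local node.
ES_URL = "http://localhost:9200"

# Same body as the PUT /sample_index request: an analyzer named "default"
# is used by this index for fields that do not specify their own analyzer.
settings = {
    "settings": {
        "analysis": {
            "analyzer": {
                "default": {
                    "tokenizer": "keyword"
                }
            }
        }
    }
}

def create_index_request(index_name, body):
    """Build (but do not send) a PUT request that creates the index."""
    return urllib.request.Request(
        url=f"{ES_URL}/{index_name}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )

req = create_index_request("sample_index", settings)
print(req.get_method(), req.full_url)
# Against a running node you would send it with:
# with urllib.request.urlopen(req) as resp:
#     print(resp.read().decode("utf-8"))
```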

You can test analyzers with the analyze API. Here I analyze the string "This is a test" without specifying an index, so the standard analyzer is used (the "+" sign is just the URL encoding of a space character, which is needed because I pass the text as a URL parameter):

GET /_analyze?text=This+is+a+test

which produces:

{
   "tokens": [
      {
         "token": "this",
         "start_offset": 0,
         "end_offset": 4,
         "type": "<ALPHANUM>",
         "position": 0
      },
      {
         "token": "is",
         "start_offset": 5,
         "end_offset": 7,
         "type": "<ALPHANUM>",
         "position": 1
      },
      {
         "token": "a",
         "start_offset": 8,
         "end_offset": 9,
         "type": "<ALPHANUM>",
         "position": 2
      },
      {
         "token": "test",
         "start_offset": 10,
         "end_offset": 14,
         "type": "<ALPHANUM>",
         "position": 3
      }
   ]
}

You can see that the text is split into separate words at the spaces and that each token is lowercased ("This" became "this"), which is what the standard analyzer does.
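To make this output concrete, here is a rough Python approximation of what the standard analyzer did to this string. It is not Lucene's implementation: the real StandardTokenizer follows the Unicode word-break rules, so it behaves differently on punctuation and many non-Latin scripts. The function name approx_standard_analyze is hypothetical.

```python
import re

def approx_standard_analyze(text):
    """Rough stand-in for the standard analyzer: split into runs of word
    characters, lowercase each one, and record offsets and positions.
    Not Lucene's Unicode word-break algorithm."""
    tokens = []
    for position, match in enumerate(re.finditer(r"\w+", text)):
        tokens.append({
            "token": match.group().lower(),
            "start_offset": match.start(),
            "end_offset": match.end(),
            "position": position,
        })
    return tokens

for t in approx_standard_analyze("This is a test"):
    print(t)
```

The offsets it prints match the analyze API response above for this simple input.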

If you analyze this text on sample_index (note the index name added as the first part of the path!):

GET /sample_index/_analyze?text=This+is+a+test

it produces:

{
   "tokens": [
      {
         "token": "This is a test",
         "start_offset": 0,
         "end_offset": 14,
         "type": "word",
         "position": 0
      }
   ]
}

which is due to the keyword tokenizer that I defined for that index's default analyzer: the whole string is kept as a single token.
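The comparison can be reproduced with a small Python sketch. It assumes the hypothetical node address http://localhost:9200, and the helper names are illustrative. analyze_url builds both requests (urlencode turns each space into "+", and the index name goes first in the path), while keyword_tokenize models the single token returned for sample_index.

```python
from urllib.parse import urlencode

ES_URL = "http://localhost:9200"  # hypothetical node address

def analyze_url(text, index=None):
    """Build the _analyze URL: urlencode encodes spaces as '+', and an
    index name, if given, becomes the first part of the path."""
    path = f"/{index}/_analyze" if index else "/_analyze"
    return f"{ES_URL}{path}?{urlencode({'text': text})}"

def keyword_tokenize(text):
    """The keyword tokenizer emits the entire input as one token. The
    analyzer defined above has no token filters, so nothing is lowercased."""
    return [{"token": text, "start_offset": 0, "end_offset": len(text), "position": 0}]

print(analyze_url("This is a test"))
print(analyze_url("This is a test", index="sample_index"))
print(keyword_tokenize("This is a test"))
```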

So, in summary: don't specify the analyzer in the yml file (reasons: see link at the top); put it in the index settings instead. If you create lots of indices, you can always use index templates.
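As a sketch of the index template approach in the 2.x format, the template below applies the same keyword default analyzer to every index whose name matches a pattern. The template name keyword_default, the pattern sample_* and the node address are assumptions for illustration. Keep in mind that a template only affects indices created after it has been registered.

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"  # hypothetical node address

def keyword_default_template(pattern):
    """Template body in the 2.x format: the "template" field holds the
    index name pattern (later versions renamed it to "index_patterns")."""
    return {
        "template": pattern,
        "settings": {
            "analysis": {
                "analyzer": {
                    "default": {"tokenizer": "keyword"}
                }
            }
        },
    }

def put_template_request(name, body):
    """Build (but do not send) a PUT /_template/<name> request."""
    return urllib.request.Request(
        url=f"{ES_URL}/_template/{name}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )

req = put_template_request("keyword_default", keyword_default_template("sample_*"))
print(req.get_method(), req.full_url)
```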

Daniel
