Default analyzers in Elasticsearch

I have added the following in my yml file.
index:
  analysis:
    analyzer:
      default:
        tokenizer: keyword
But when I view my index metadata in the head plugin, I cannot find the index_analyzer and search_analyzer fields in 2.1.1. I was able to see these two fields in the index metadata in the previous ES version, 1.1.1.

I am not sure whether these analyzers are actually being applied while indexing and searching. Is there any way to test the custom analyzers created on an index?

Hi Srinivas,

While it may be possible to specify these settings in the yml file, we advise you to define them in the index settings.

Suppose you have an index called sample_index. Then you can create a default analyzer that uses the keyword tokenizer as follows:

PUT /sample_index
{
   "settings": {
      "analysis": {
         "analyzer": {
            "default": {
               "tokenizer": "keyword"
            }
         }
      }
   }
}
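To verify that the settings were actually applied to the index, you can retrieve them with the get settings API; the response should contain the analysis section you defined above:

GET /sample_index/_settings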

You can test analyzers with the analyze API. Here I analyze the string "This is a test" without specifying an index, so the cluster-wide default analyzer (the standard analyzer) is used (the "+" sign is just the URL encoding of a space character, which is needed because the text is passed as a URL parameter):

GET /_analyze?text=This+is+a+test

which produces:

{
   "tokens": [
      {
         "token": "this",
         "start_offset": 0,
         "end_offset": 4,
         "type": "<ALPHANUM>",
         "position": 0
      },
      {
         "token": "is",
         "start_offset": 5,
         "end_offset": 7,
         "type": "<ALPHANUM>",
         "position": 1
      },
      {
         "token": "a",
         "start_offset": 8,
         "end_offset": 9,
         "type": "<ALPHANUM>",
         "position": 2
      },
      {
         "token": "test",
         "start_offset": 10,
         "end_offset": 14,
         "type": "<ALPHANUM>",
         "position": 3
      }
   ]
}

You can clearly see that the standard analyzer splits the text on whitespace and lowercases the tokens.

If you analyze the same text on sample_index (note the index name added as the first part of the path):

GET /sample_index/_analyze?text=This+is+a+test

it produces:

{
   "tokens": [
      {
         "token": "This is a test",
         "start_offset": 0,
         "end_offset": 14,
         "type": "word",
         "position": 0
      }
   ]
}

which is due to the keyword tokenizer that I have defined on that index.
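The same API also lets you test any named analyzer explicitly via the analyzer parameter. For example, if you had defined a custom analyzer called my_analyzer on the index (the name here is just a placeholder for whatever you named yours), you could test it with:

GET /sample_index/_analyze?analyzer=my_analyzer&text=This+is+a+test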

So, in summary: don't specify the analyzer in the yml file (reasons: see link at the top), but put it in the index settings. If you create lots of indices, you can use index templates to apply these settings automatically.
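As a sketch, a template along these lines (the template name and index pattern are placeholders, and the exact syntax may differ between ES versions) applies the keyword tokenizer as the default analyzer to every newly created index whose name matches the pattern:

PUT /_template/keyword_default
{
   "template": "sample_*",
   "settings": {
      "analysis": {
         "analyzer": {
            "default": {
               "tokenizer": "keyword"
            }
         }
      }
   }
}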

Daniel