Default analyzers in Elasticsearch


(srinivas m) #1

I have added the following in my yml file.
index :
  analysis :
    analyzer :
      default :
        tokenizer : keyword

But when I look at my index metadata in the head plugin, I cannot find index_analyzer and search_analyzer in 2.1.1. I was able to see these two fields in the metadata of an index in the previous version, ES 1.1.1.

I am not sure whether these analyzers are applied to the index during indexing and searching. Is there any way to test the custom analyzers created on an index?


(Daniel Mitterdorfer) #2

Hi Srinivas,

While it may be possible to specify these settings in the yml file, we advise you to define them in the index settings.

Suppose you want an index called sample_index. Then you can create it with a default analyzer that uses the keyword tokenizer as follows:

PUT /sample_index
{
   "settings": {
      "analysis": {
         "analyzer": {
            "default": {
               "tokenizer": "keyword"
            }
         }
      }
   }
}

You can test analyzers with the analyze API. Here I analyze the string "This is a test" with the default analyzer for all indices (the "+" sign is just the URL encoding of a space character which is needed because I specify the text as a URL parameter):

GET /_analyze?text=This+is+a+test

which produces:

{
   "tokens": [
      {
         "token": "this",
         "start_offset": 0,
         "end_offset": 4,
         "type": "<ALPHANUM>",
         "position": 0
      },
      {
         "token": "is",
         "start_offset": 5,
         "end_offset": 7,
         "type": "<ALPHANUM>",
         "position": 1
      },
      {
         "token": "a",
         "start_offset": 8,
         "end_offset": 9,
         "type": "<ALPHANUM>",
         "position": 2
      },
      {
         "token": "test",
         "start_offset": 10,
         "end_offset": 14,
         "type": "<ALPHANUM>",
         "position": 3
      }
   ]
}

You can clearly see that the text is split into one lowercased token per word.
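
The "+" encoding mentioned above does not have to be done by hand. As a minimal sketch, assuming a node reachable at http://localhost:9200 (a hypothetical address) and only the Python standard library, the request URL could be built like this. analyze_url is an illustrative helper, not part of any Elasticsearch client:

```python
from urllib.parse import urlencode

# Assumption: a local node on the default HTTP port.
BASE_URL = "http://localhost:9200"


def analyze_url(text, index=None, base_url=BASE_URL):
    """Build the URL for an _analyze request, optionally scoped to an index."""
    path = "/_analyze" if index is None else "/" + index + "/_analyze"
    # urlencode() applies quote_plus(), so spaces are encoded as "+".
    return base_url + path + "?" + urlencode({"text": text})


print(analyze_url("This is a test"))
print(analyze_url("This is a test", index="sample_index"))
```

The first call prints http://localhost:9200/_analyze?text=This+is+a+test and the second one adds /sample_index in front of /_analyze.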

If you analyze this text on sample_index (note the index name added as the first part of the path!):

GET /sample_index/_analyze?text=This+is+a+test

it produces:

{
   "tokens": [
      {
         "token": "This is a test",
         "start_offset": 0,
         "end_offset": 14,
         "type": "word",
         "position": 0
      }
   ]
}

which is due to the keyword tokenizer that I have defined on that index.
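
If you want to check this automatically rather than by eye, something like the following sketch could work. It only inspects the JSON that the analyze API returned. The helper names token_texts and is_single_token are made up for illustration, and the response is copied from the output above:

```python
def token_texts(analyze_response):
    """Return the token strings from an _analyze response, in position order."""
    tokens = sorted(analyze_response["tokens"], key=lambda t: t["position"])
    return [t["token"] for t in tokens]


def is_single_token(analyze_response, original_text):
    """True if the analyzer emitted the whole input as exactly one token."""
    return token_texts(analyze_response) == [original_text]


# Response copied from the sample_index request above.
keyword_response = {
    "tokens": [
        {"token": "This is a test", "start_offset": 0, "end_offset": 14,
         "type": "word", "position": 0}
    ]
}

print(is_single_token(keyword_response, "This is a test"))
```

A response with several tokens, such as the one from the request without an index name, would make is_single_token return False.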

So, in summary, don't specify the analyzer in the yml file (reasons: see link at the top) but put it in the index settings. If you create lots of indices, you can always use index templates.
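
For the index template route, a rough sketch of the request body is below. It assumes the 2.x template format, in which the "template" key holds the index name pattern. The pattern sample_* and the helper name default_keyword_template are hypothetical:

```python
import json


def default_keyword_template(pattern):
    """Body for an index template that sets the default analyzer.

    Assumption: the 2.x template format, where the "template" key holds
    the index name pattern.
    """
    return {
        "template": pattern,
        "settings": {
            "analysis": {
                "analyzer": {
                    "default": {"tokenizer": "keyword"}
                }
            }
        },
    }


# Hypothetical pattern: every index whose name starts with "sample_".
body = default_keyword_template("sample_*")
print(json.dumps(body, indent=2))
```

The printed JSON could then be sent as the body of a PUT request to /_template/ followed by a template name of your choice. Indices created afterwards whose names match the pattern should pick up these settings.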

Daniel

