Hi Srinivas,
While it may be possible to specify these settings in the yml file, we advise you to define them in the index settings.
Suppose you have an index called sample_index. Then you can create an analyzer with the keyword tokenizer as follows:
PUT /sample_index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "default": {
          "tokenizer": "keyword"
        }
      }
    }
  }
}
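If you want to double-check that the analyzer has been picked up, you can retrieve the index settings (just a sanity check; the exact shape of the response differs slightly between versions):
GET /sample_index/_settings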
You can test analyzers with the analyze API. Here I analyze the string "This is a test" with the default analyzer for all indices (the "+" sign is just the URL encoding of a space character, which is needed because I specify the text as a URL parameter):
GET /_analyze?text=This+is+a+test
which produces:
{
  "tokens": [
    {
      "token": "this",
      "start_offset": 0,
      "end_offset": 4,
      "type": "<ALPHANUM>",
      "position": 0
    },
    {
      "token": "is",
      "start_offset": 5,
      "end_offset": 7,
      "type": "<ALPHANUM>",
      "position": 1
    },
    {
      "token": "a",
      "start_offset": 8,
      "end_offset": 9,
      "type": "<ALPHANUM>",
      "position": 2
    },
    {
      "token": "test",
      "start_offset": 10,
      "end_offset": 14,
      "type": "<ALPHANUM>",
      "position": 3
    }
  ]
}
You can see that the text is split into separate tokens on the whitespace characters (and that the tokens are lowercased by the default analyzer).
If you analyze the same text on sample_index (note the index name as the first part of the path!):
GET /sample_index/_analyze?text=This+is+a+test
it produces:
{
  "tokens": [
    {
      "token": "This is a test",
      "start_offset": 0,
      "end_offset": 14,
      "type": "word",
      "position": 0
    }
  ]
}
which is due to the keyword tokenizer that I have defined on that index.
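One caveat: depending on your Elasticsearch version, passing the text as a URL parameter may no longer be supported; newer versions expect the text in the request body instead, e.g.:
GET /sample_index/_analyze
{
  "text": "This is a test"
}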
So, in summary, don't specify the analyzer in the yml file (reasons: see the link at the top), but put it in the index settings. If you create lots of indices, you can use index templates to apply the same settings to every new index automatically.
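A rough sketch of such a template (the exact syntax depends on your version: older releases use "template" instead of "index_patterns", and the newest ones also offer a separate _index_template API, so check the docs for your version; the template name and index pattern below are just placeholders):
PUT /_template/keyword_analyzer_template
{
  "index_patterns": ["sample_*"],
  "settings": {
    "analysis": {
      "analyzer": {
        "default": {
          "tokenizer": "keyword"
        }
      }
    }
  }
}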
Daniel