Configuring the standard tokenizer in Elasticsearch


(Suraj) #1

Hi,

I am using the default (standard) tokenizer for my index in Elasticsearch and adding documents to it. However, the standard tokenizer does not split words that contain a "." (dot). For example:

POST _analyze
{
  "tokenizer": "standard",
  "text": "pink.jpg"
}

This gives me the following response:

{
  "tokens": [
    {
      "token": "pink.jpg",
      "start_offset": 0,
      "end_offset": 8,
      "type": "<ALPHANUM>",
      "position": 0
    }
  ]
}

The response above shows the whole word as a single term. Can it be split into two terms on the "." (dot) character with the standard tokenizer? Is there a setting in the standard tokenizer for this?


(James Addison) #2

Use a different tokenizer. Look at the letter tokenizer (https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-letter-tokenizer.html) and the char_group tokenizer (https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-chargroup-tokenizer.html).

The latter might be more appropriate.
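
As a rough sketch of the char_group approach, the requests below define a custom analyzer that splits on whitespace and on "." and then test it against the text from the question. The index name (my-index), tokenizer name (dot_split_tokenizer), and analyzer name (dot_split_analyzer) are made up for illustration, and the lowercase filter is an assumption added to approximate what the standard analyzer does after tokenizing.

```
# Hypothetical index, tokenizer, and analyzer names, for illustration only.
PUT my-index
{
  "settings": {
    "analysis": {
      "tokenizer": {
        "dot_split_tokenizer": {
          "type": "char_group",
          "tokenize_on_chars": ["whitespace", "."]
        }
      },
      "analyzer": {
        "dot_split_analyzer": {
          "type": "custom",
          "tokenizer": "dot_split_tokenizer",
          "filter": ["lowercase"]
        }
      }
    }
  }
}

# Check the result: this should return two tokens, "pink" and "jpg".
POST my-index/_analyze
{
  "analyzer": "dot_split_analyzer",
  "text": "pink.jpg"
}
```

The analyzer only takes effect on fields whose mapping refers to it, so it would still need to be set as the analyzer on the relevant text fields. Note that char_group splits only on the characters listed, so other punctuation stays inside tokens unless it is added to tokenize_on_chars.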

