Configuring the standard tokenizer elasticsearch

surajdalvi · October 2, 2018, 5:20pm

Hi,

I am using the default (standard) tokenizer for my Elasticsearch index and adding documents to it, but the standard tokenizer does not split words that contain a "." (dot). For example:

POST _analyze
{
  "tokenizer": "standard",
  "text": "pink.jpg"
}

This gives me the following response:

{
  "tokens": [
    {
      "token": "pink.jpg",
      "start_offset": 0,
      "end_offset": 8,
      "type": "<ALPHANUM>",
      "position": 0
    }
  ]
}

The response above shows the whole word as a single term. Can it be split into two terms at the "." (dot) using the standard tokenizer? Is there a setting in the standard tokenizer for this?

jaddison · October 2, 2018, 8:10pm

Use a different tokenizer. Look at the letter tokenizer (https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-letter-tokenizer.html) and the char_group tokenizer (https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-chargroup-tokenizer.html).

The latter might be more appropriate.
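
As far as I know, the only setting the standard tokenizer accepts is max_token_length, so it cannot be told to break on a dot. A minimal sketch of the char_group approach is below, assuming Elasticsearch 6.4 or later (where char_group was added). The names my_index, dot_tokenizer and dot_analyzer are made up for illustration, and the lowercase filter is an optional extra, not something the question asked for.

```console
# Try the tokenizer inline first: split on whitespace and on "."
POST _analyze
{
  "tokenizer": {
    "type": "char_group",
    "tokenize_on_chars": ["whitespace", "."]
  },
  "text": "pink.jpg"
}

# Same tokenizer registered in index settings.
# Index, tokenizer and analyzer names are hypothetical.
PUT my_index
{
  "settings": {
    "analysis": {
      "tokenizer": {
        "dot_tokenizer": {
          "type": "char_group",
          "tokenize_on_chars": ["whitespace", "."]
        }
      },
      "analyzer": {
        "dot_analyzer": {
          "type": "custom",
          "tokenizer": "dot_tokenizer",
          "filter": ["lowercase"]
        }
      }
    }
  }
}
```

The first request should return pink and jpg as separate tokens. Keep in mind that char_group splits only on the characters you list, so other punctuation that the standard tokenizer would normally drop stays inside the tokens unless you add it to tokenize_on_chars (the "punctuation" group covers most of it).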
