Analyzer to keep integers/numeric inputs

ebsaral · October 5, 2017, 11:47am

How can I create an analyzer to keep integers?

I am using the simple analyzer to get tokens, but hello1 becomes [hello]. I want to keep the 1 as well, so that the token comes out as [hello1].

Is there any way of doing this?
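For readers who want to reproduce the symptom without a cluster, here is a rough Python approximation of what the simple analyzer does (split at every non-letter character, then lowercase). It is only a sketch: the function name `simple_like_tokens` is made up for illustration, and Python's `str.isalpha` is assumed to be close enough to the letter test Elasticsearch uses.

```python
from itertools import groupby


def simple_like_tokens(text):
    """Approximate the simple analyzer: keep runs of letters, lowercase them.

    Digits are not letters, so they act as separators and are dropped.
    """
    return [
        "".join(run).lower()
        for is_letter, run in groupby(text, key=str.isalpha)
        if is_letter
    ]


print(simple_like_tokens("hello1"))           # ['hello']
print(simple_like_tokens("Test Contact123"))  # ['test', 'contact']
```

The digit is lost because it is treated as a token boundary, not because of any later filter.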

dadoonet · October 5, 2017, 12:09pm

Does the standard analyzer help for this?

ebsaral · October 5, 2017, 12:21pm

It works with the standard analyzer when using English. But with Japanese, standard does not work for me, since it turns every Japanese character into its own token. That is why I went with simple, but now I have run into this alphanumeric token issue.

dadoonet · October 5, 2017, 1:50pm

Did you try the letter Tokenizer? https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-letter-tokenizer.html

Not sure it works though for this use case.

Maybe @johtani has better ideas?

ebsaral · October 5, 2017, 2:02pm

The letter tokenizer does not help either. I need hello123 -> hello123, but it gives hello123 -> hello.

I tried the pattern analyzer with flags UNICODE_CASE|UNICODE_CHARACTER_CLASS and pattern \\W|\\w+, but it does this: Test Contact123 -> [test contact123], where I want [test, contact123].

I was never good at regex and can't find a way to implement this.
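One possible direction, offered as a sketch and not as a confirmed answer from this thread: the pattern analyzer expects its regex to match the separators between tokens, not the tokens themselves, so a pattern such as \W+ (runs of non-word characters) may be closer to what is wanted. The settings body below uses a made-up analyzer name, `alnum_pattern`, and the helper `split_on_separators` only imitates the splitting with Python's `re` module, whose Unicode handling is assumed to be similar to, but not identical with, Java's under UNICODE_CHARACTER_CLASS.

```python
import re

# Hypothetical index settings; "alnum_pattern" is an illustrative name.
# The regex describes token separators, so letters and digits stay together.
settings = {
    "analysis": {
        "analyzer": {
            "alnum_pattern": {
                "type": "pattern",
                "pattern": r"\W+",  # serialized to JSON as "\\W+"
                "flags": "UNICODE_CASE|UNICODE_CHARACTER_CLASS",
                "lowercase": True,
            }
        }
    }
}


def split_on_separators(text, pattern=r"\W+"):
    """Imitate the analyzer locally: lowercase, split on the separator regex,
    and drop the empty strings produced by leading or trailing separators."""
    return [token for token in re.split(pattern, text.lower()) if token]


print(split_on_separators("Test Contact123"))  # ['test', 'contact123']
print(split_on_separators("hello1"))           # ['hello1']
```

With this kind of splitting, a run of Japanese characters that contains no separators stays together as a single token, which may or may not be what is wanted for search. Running the real analyzer through the `_analyze` API on a test index is the way to confirm the actual output.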

system · November 2, 2017, 2:03pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.