Analyzer to keep integers/numeric inputs


(Emin Buğra Saral) #1

How can I create an analyzer to keep integers?

I am using the simple analyzer to get tokens, but hello1 becomes [hello]. I want to keep the 1 as well, so that the token comes out as [hello1].
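To illustrate the behavior described above, here is a minimal Python sketch of what the simple analyzer does conceptually: it splits on anything that is not a letter and lowercases the result. This is only an approximation with Python's `re` module, not Lucene's actual code, and the function name `simple_like_tokens` is made up for the example.

```python
import re
from typing import List

# Runs of letters only: word characters minus digits and underscore.
# This approximates "split on non-letters"; it is not Lucene's implementation.
LETTER_RUN = re.compile(r"[^\W\d_]+")


def simple_like_tokens(text: str) -> List[str]:
    """Hypothetical helper: letter-only tokens, lowercased."""
    return [run.lower() for run in LETTER_RUN.findall(text)]


print(simple_like_tokens("hello1"))           # ['hello']
print(simple_like_tokens("Test Contact123"))  # ['test', 'contact']
```

Because digits count as separators here, the trailing 1 is never part of any token, which matches the [hello] result.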

Any way of doing this?


(David Pilato) #2

Does the standard analyzer help with this?


(Emin Buğra Saral) #3

It normally works with the standard analyzer when the text is English. With Japanese, though, standard does not work for me, since it turns every Japanese character into its own token. That is why I chose simple, but now I have run into this alphanumeric token issue.


(David Pilato) #4

Did you try the letter tokenizer? https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-letter-tokenizer.html

Not sure it works for this use case, though.

Maybe @johtani has better ideas?


(Emin Buğra Saral) #5

The letter tokenizer does not help either: instead of keeping hello123 as hello123, it turns hello123 into hello.

I tried the pattern analyzer with the flags UNICODE_CASE|UNICODE_CHARACTER_CLASS and the pattern \\W|\\w+, but it does this: Test Contact123 -> [test contact123], whereas I want [test, contact123].
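One direction that might be worth trying, sketched under assumptions: in the pattern analyzer the regex describes the separators between tokens, not the tokens themselves, so a pattern of just \W+ (the documented default) with UNICODE_CHARACTER_CLASS should treat letters and digits of any script as token characters. The settings below are written as a Python dict, the analyzer name `alnum_keep_digits` is hypothetical, and the tokenizing function only approximates the behavior with Python's `re` rather than Java's regex engine.

```python
import re
from typing import List

# Hypothetical index settings; "alnum_keep_digits" is a made-up analyzer name.
# The pattern lists only the separator (runs of non-word characters).
INDEX_SETTINGS = {
    "settings": {
        "analysis": {
            "analyzer": {
                "alnum_keep_digits": {
                    "type": "pattern",
                    "pattern": r"\W+",
                    "flags": "UNICODE_CASE|UNICODE_CHARACTER_CLASS",
                    "lowercase": True,
                }
            }
        }
    }
}

# Python's \W is Unicode-aware for str patterns, which roughly mirrors
# what UNICODE_CHARACTER_CLASS does on the Java side.
SEPARATOR = re.compile(r"\W+")


def pattern_like_tokens(text: str) -> List[str]:
    """Hypothetical helper: split on non-word runs, drop empties, lowercase."""
    return [part.lower() for part in SEPARATOR.split(text) if part]


print(pattern_like_tokens("Test Contact123"))  # ['test', 'contact123']
print(pattern_like_tokens("hello123"))         # ['hello123']
```

Note that with this approach a run of Japanese text without spaces or punctuation would stay a single token, so whether it is good enough depends on how the Japanese input is delimited.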

I was never good at regex and can't find a way to implement this.

