Analyzer to keep integers/numeric inputs

(Emin Buğra Saral) #1

How can I create an analyzer to keep integers?

I am using the simple analyzer to get tokens, but hello1 becomes [hello]. I want to keep the 1 as well, so that it is tokenized as [hello1].

Any way of doing this?
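A rough Python emulation of why this happens, assuming the simple analyzer behaves like Lucene's lowercase tokenizer: it breaks text at every character that is not a letter and lowercases the pieces. Here `str.isalpha()` stands in for Java's `Character.isLetter`, and the function names (`split_runs`, `simple_like`, `alnum_like`) are hypothetical, for illustration only.

```python
def split_runs(text, keep):
    """Return the maximal runs of characters for which keep(ch) is true."""
    tokens, current = [], []
    for ch in text:
        if keep(ch):
            current.append(ch)
        elif current:
            # A non-kept character ends the current token.
            tokens.append("".join(current))
            current = []
    if current:
        tokens.append("".join(current))
    return tokens


def simple_like(text):
    # Approximates the simple analyzer: letters only, lowercased.
    return [t.lower() for t in split_runs(text, str.isalpha)]


def alnum_like(text):
    # Hypothetical variant that also keeps digits inside a token.
    return [t.lower() for t in split_runs(text, str.isalnum)]


print(simple_like("hello1"))           # ['hello']
print(alnum_like("hello1"))            # ['hello1']
print(alnum_like("Test Contact123"))   # ['test', 'contact123']
```

The digit 1 is dropped only because it counts as a separator; keeping letters and digits together is the behaviour the pattern-based attempt later in the thread is reaching for.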

(David Pilato) #2

Does the standard analyzer help here?

(Emin Buğra Saral) #3

The standard analyzer works for English text. With Japanese, however, it does not work for me, because it turns every Japanese character into a separate token. That is why I chose the simple analyzer, but now I have run into this alphanumeric token issue.

(David Pilato) #4

Did you try the letter tokenizer?

I'm not sure it works for this use case, though.

Maybe @johtani has better ideas?

(Emin Buğra Saral) #5

The letter tokenizer does not help either: instead of keeping hello123 as hello123, it turns it into hello.

I tried the pattern analyzer with the flags UNICODE_CASE|UNICODE_CHARACTER_CLASS and the pattern \\W|\\w+, but it does this: Test Contact123 -> [test contact123], whereas I want [test, contact123].

I was never good at regex and can't figure out how to implement this.
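In Elasticsearch's pattern analyzer, the regular expression is meant to match the separators between tokens, not the tokens themselves, so an alternative like \\w+ also marks every run of word characters as a separator. A minimal Python sketch of the separator idea, assuming the analyzer's regex is applied roughly like `re.split` followed by lowercasing; Python 3's `\w` on str is Unicode-aware, which is only an approximation of Java's UNICODE_CHARACTER_CLASS. The name `pattern_split` is hypothetical.

```python
import re

# Separator pattern: one or more non-word characters. For str patterns,
# Python 3 treats letters and digits from any script as word characters.
SEPARATOR = re.compile(r"\W+")


def pattern_split(text):
    """Lowercase, split on separators, and drop empty strings at the edges."""
    return [t for t in SEPARATOR.split(text.lower()) if t]


print(pattern_split("Test Contact123"))  # ['test', 'contact123']
print(pattern_split("hello1"))           # ['hello1']
print(pattern_split("こんにちは世界"))     # ['こんにちは世界']
```

The last call shows the catch for the Japanese case: Japanese is written without spaces, so a separator-based pattern leaves a whole run of Japanese text as a single token. Word segmentation for Japanese usually needs a morphological analyzer, such as the one provided by Elasticsearch's kuromoji analysis plugin.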
