How can I create an analyzer to keep integers?
I am using the simple analyzer to get tokens, but hello1 becomes [hello]. I want to keep the 1 as well, so that it is tokenized as [hello1].
Any way of doing this?
Does the standard analyzer help with this?
With English text, the standard analyzer works as expected. With Japanese, though, standard does not work for me, since it turns every Japanese character into its own token. That is why I went with simple, but now I have run into this alphanumeric token issue.
Did you try the letter Tokenizer? https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-letter-tokenizer.html
Not sure it works though for this use case.
Maybe @johtani has better ideas?
The letter tokenizer does not help either: instead of hello123 -> hello123, it gives hello123 -> hello.
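To see why the digits disappear, here is a rough Python emulation of the behaviour described above. It is only a sketch and not Elasticsearch code: the function names are made up for illustration, and the regex is an approximation of "split on anything that is not a letter".

```python
import re

# Approximation (assumption): a "letter" is any Unicode word character
# that is neither a digit nor an underscore.
_LETTER_RUN = re.compile(r"[^\W\d_]+")


def letter_tokens(text):
    """Return runs of letters; digits and punctuation act as separators."""
    return _LETTER_RUN.findall(text)


def simple_analyzer_tokens(text):
    """Hypothetical stand-in for the simple analyzer: letter runs, lowercased."""
    return [token.lower() for token in letter_tokens(text)]


if __name__ == "__main__":
    print(simple_analyzer_tokens("hello123"))         # ['hello']
    print(simple_analyzer_tokens("Test Contact123"))  # ['test', 'contact']
```

Because every non-letter is treated as a separator, the trailing digits can never end up inside a token with this approach.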
I tried the pattern analyzer with flags UNICODE_CASE|UNICODE_CHARACTER_CLASS and pattern \\W|\\w+, but it does this: Test Contact123 -> [test contact123], whereas I want [test, contact123].
I was never good at regex, and I can't find a way to implement this.
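One thing that may help here: the pattern analyzer treats its pattern as the separator between tokens, not as the token itself, and its default pattern is \W+. The sketch below emulates that in plain Python so the effect can be checked without a cluster. The helper name and the analyzer name alnum_analyzer are hypothetical, and the settings dict has not been tested against a real index, so treat it as a starting point only.

```python
import re


def pattern_tokens(text, separator=r"\W+", lowercase=True):
    """Split text on a separator regex, drop empty pieces, optionally lowercase.

    In Python 3, \\W on str patterns is Unicode-aware, so letters and digits
    of any script count as word characters.
    """
    tokens = [piece for piece in re.split(separator, text) if piece]
    return [t.lower() for t in tokens] if lowercase else tokens


# Assumed index settings for a pattern analyzer; "alnum_analyzer" is a
# made-up name. The raw string r"\W+" serializes to "\\W+" in JSON.
ALNUM_ANALYZER_SETTINGS = {
    "settings": {
        "analysis": {
            "analyzer": {
                "alnum_analyzer": {
                    "type": "pattern",
                    "pattern": r"\W+",
                    "flags": "UNICODE_CASE|UNICODE_CHARACTER_CLASS",
                    "lowercase": True,
                }
            }
        }
    }
}

if __name__ == "__main__":
    print(pattern_tokens("Test Contact123"))       # ['test', 'contact123']
    print(pattern_tokens("hello1"))                # ['hello1']
    print(pattern_tokens("こんにちは123 hello1"))  # ['こんにちは123', 'hello1']
```

Note that splitting on \W+ keeps an unbroken run of Japanese characters as a single token; it does no word segmentation, so whether that is acceptable depends on how the field is searched.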