Configuring icu_tokenizer to keep hashtags in tokens

(Robert Fišer) #1

We're using icu_tokenizer to analyze text that may be in many languages. The problem is that the text contains hashtags like #dog, #cat, etc., and icu_tokenizer strips the '#' character from tokens. As a result we're not able to find documents that contain exactly '#cat'.
Is there a simple way to make `_analyze` with text '#cat' produce two tokens: ['#cat', 'cat']?
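One common workaround (a sketch, not tested against your setup — the names `hashtag_marker`, `restore_hash`, `split_hashtag`, and `hashtag_icu` are made up for this example, and it assumes the `analysis-icu` plugin is installed): use a `mapping` char filter to replace '#' with an underscore-delimited placeholder before tokenization, since ICU word breaking keeps underscores joined to adjacent letters. Then restore the '#' with a `pattern_replace` token filter and emit the bare word as well with a `pattern_capture` filter using `preserve_original`:

```json
PUT /hashtag-demo
{
  "settings": {
    "analysis": {
      "char_filter": {
        "hashtag_marker": {
          "type": "mapping",
          "mappings": ["# => __HASHTAG__"]
        }
      },
      "filter": {
        "restore_hash": {
          "type": "pattern_replace",
          "pattern": "__HASHTAG__",
          "replacement": "#"
        },
        "split_hashtag": {
          "type": "pattern_capture",
          "preserve_original": true,
          "patterns": ["#(\\w+)"]
        }
      },
      "analyzer": {
        "hashtag_icu": {
          "type": "custom",
          "char_filter": ["hashtag_marker"],
          "tokenizer": "icu_tokenizer",
          "filter": ["restore_hash", "split_hashtag", "lowercase"]
        }
      }
    }
  }
}
```

If the placeholder survives tokenization as intended, `_analyze` with `"analyzer": "hashtag_icu", "text": "#cat"` should yield both `#cat` and `cat`, while other languages still get normal ICU word breaking. Verify the behavior with `_analyze` on representative text before adopting it.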

(Robert Fišer) #2

Any idea?

(system) #3

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.