Configuring icu_tokenizer to keep hashtag in token

Robert_Fiser1 · April 25, 2018, 3:38pm

Hi,
we're using icu_tokenizer to analyze text that may be in many languages. The problem is that the text contains hashtags like #dog, #cat, etc., and icu_tokenizer removes the '#' characters from tokens. So we're not able to find documents that contain exactly '#cat'.
Is there a simple way to make _analyze with text:'#cat' produce two tokens: ['#cat', 'cat']?
Robert
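
To make the requested output concrete, here is a small Python sketch of the token stream being asked for. It is only an illustration of the desired behaviour, not Elasticsearch or ICU code: `hashtag_tokens` is a hypothetical helper, and the regex-based word splitting is an assumption that stands in for real language-aware tokenization.

```python
import re

# Hypothetical helper, not an Elasticsearch or ICU API. It only shows the
# token stream the question asks for: each hashtag yields the original
# '#word' token followed by the bare 'word' token.
_WORD = re.compile(r"#?\w+")


def hashtag_tokens(text):
    tokens = []
    for match in _WORD.finditer(text):
        word = match.group(0)
        tokens.append(word)
        if word.startswith("#"):
            # Also emit the hashtag without its leading '#'.
            tokens.append(word[1:])
    return tokens


print(hashtag_tokens("#cat"))
```

With this shape of output, a search for '#cat' would match only documents that contain the hashtag, while a search for 'cat' would match both the hashtag and the plain word.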

Robert_Fiser1 · May 16, 2018, 12:15pm

Any idea?

system · June 13, 2018, 12:15pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
