Tokenizer underscore support

sharon.c · August 13, 2018, 3:14pm

I am using Elasticsearch 5.1 to index web logs. Because we use underscores to separate the parameters in our URLs, we need the Elasticsearch tokenizer to split on underscores too. But the default Elasticsearch tokenizer does not split on underscores.

I still need the default tokenizer's functionality, and I do not want to overwrite the default Elasticsearch tokenizer. Could anyone tell me how to add underscore support to the default Elasticsearch tokenizer?

Ivan · August 13, 2018, 6:05pm

Unfortunately, the standard tokenizer is rather complex, so it would be difficult to replicate its behavior with the pattern tokenizer plus your modifications.

But you should know your data better than the default algorithms do. You can create a pattern tokenizer that matches your data. For it to be the default, you would need to define a default analyzer that uses that tokenizer.
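As a rough sketch of what that could look like (not a drop-in replacement for the standard tokenizer): the Python below builds index settings with a pattern tokenizer that splits on non-word characters and underscores, and registers it under an analyzer named `default`. The tokenizer name `underscore_split`, the pattern `[\W_]+`, and the `lowercase` filter are my assumptions for illustration; adjust them to your data. The `preview_tokens` helper only approximates the result locally with Python's regex engine, so it will not match Lucene exactly.

```python
import json
import re

# Assumed pattern: split on runs of non-word characters or underscores.
# (The pattern tokenizer's own default, \W+, keeps underscores inside tokens.)
SPLIT_PATTERN = r"[\W_]+"

# Index settings body. "underscore_split" is a made-up name; an analyzer
# named "default" is applied to text fields that do not set their own analyzer.
INDEX_SETTINGS = {
    "settings": {
        "analysis": {
            "tokenizer": {
                "underscore_split": {
                    "type": "pattern",
                    "pattern": SPLIT_PATTERN,
                }
            },
            "analyzer": {
                "default": {
                    "type": "custom",
                    "tokenizer": "underscore_split",
                    # Assumption: lowercase tokens, as the standard analyzer does.
                    "filter": ["lowercase"],
                }
            },
        }
    }
}


def preview_tokens(text):
    """Approximate the analyzer output locally (Python regex, not Lucene)."""
    return [tok.lower() for tok in re.split(SPLIT_PATTERN, text) if tok]


if __name__ == "__main__":
    # JSON body to send when creating the index.
    print(json.dumps(INDEX_SETTINGS, indent=2))
    print(preview_tokens("/search?user_id=7&Lang_en"))
```

The printed JSON would be used as the request body when creating the index. Before relying on it, it is worth running a few real log lines through the `_analyze` API on that index to check that the tokens are what you expect.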
