Tokenizer underscore support

I am using Elasticsearch 5.1 to index web logs. Because we use underscores to separate the parameters in our URLs, we need the Elasticsearch tokenizer to split on underscores too. However, the default tokenizer in Elasticsearch does not split on underscores.

I still need the rest of the default tokenizer's functionality, and I do not want to overwrite the default Elasticsearch tokenizer. Could anyone tell me how to add underscore support to the default Elasticsearch tokenizer?

Unfortunately, the standard tokenizer is rather complex, so it would be difficult to replicate its behavior with the pattern tokenizer plus your modifications.

But you should know your data better than the default algorithms do. You can create a pattern tokenizer that matches your data. For it to be the default, you would need to define a default analyzer that uses that tokenizer.
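As a rough sketch of that suggestion, the Python snippet below builds index settings with a pattern tokenizer and an analyzer named "default". The tokenizer name `underscore_pattern`, the regex `[\W_]+`, and the `lowercase` filter are my own assumptions for illustration, not something from your setup, and `preview_tokens` is only a local approximation of the splitting, not the real analysis chain. It will not reproduce everything the standard tokenizer does.

```python
import json
import re

# Hypothetical name for the custom tokenizer; any name works.
TOKENIZER_NAME = "underscore_pattern"

# Assumed pattern: split on runs of non-word characters or underscores.
SPLIT_PATTERN = r"[\W_]+"


def build_index_settings():
    """Return index settings that register the tokenizer and use it by default."""
    return {
        "settings": {
            "analysis": {
                "tokenizer": {
                    TOKENIZER_NAME: {"type": "pattern", "pattern": SPLIT_PATTERN}
                },
                "analyzer": {
                    # An analyzer named "default" applies to text fields
                    # that do not specify an analyzer of their own.
                    "default": {
                        "type": "custom",
                        "tokenizer": TOKENIZER_NAME,
                        "filter": ["lowercase"],
                    }
                },
            }
        }
    }


def preview_tokens(text):
    """Approximate the split locally with Python's regex engine (ASCII \\W)."""
    parts = re.split(SPLIT_PATTERN, text, flags=re.ASCII)
    # Drop empty strings left by leading or trailing separators, then lowercase.
    return [part.lower() for part in parts if part]


if __name__ == "__main__":
    print(json.dumps(build_index_settings(), indent=2))
    print(preview_tokens("/search?user_id=42&sort_by=Name"))
```

The printed JSON would be the request body when creating the index. After that, the _analyze API is the reliable way to check what tokens Elasticsearch actually produces for a sample URL.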
