Tokenizer in Elasticsearch

The issue is that I have to tokenize the data into tokens based on spaces, while at the same time not splitting on special characters. Right now the regex I have is:

   (\w*[-*#+=;:\/,~_ ]*\w+)

With this, when I process the string

1-CHECK ON BLOCKS BELOW IF MARKET CORRECTION ARE LOADED: PCORP:BLOCK=ANCTRLG&V5PTCLG;   AF55722  BRTBMWA-3289 (AF55722) in block ANCTRLG (Product ID: CAAZ 107 4493 R1A10 )  AF55736  BRTBMWA-3290 (AF55726)in block V5PTCLG  (Product ID: CAAZ 107 4260 R2A08 )  IF MARKET CORRECTIONS ARE LOADED THEN V5 INTERFACE PROPERTY MUST BE DEFINED AS FOLLOW : MUXFIM : ACC-OFF (Accelerate Alligment is not active) WLL    : ACC-ON  (Accelerate Alligment is active ) :  EXAPC:V5ID=v5id,PROP=ACC-OFF; 

What it does is tokenize the string on spaces, but at the same time it also splits on special characters. For example,

    :  EXAPC:V5ID=v5id

is tokenized into `:  EXAPC`, `:V5ID`, and `=v5id`, whereas I want it to split into `:` and `EXAPC:V5ID=v5id`.
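
To make the behaviour concrete, here is a minimal sketch of what I believe is happening, assuming the pattern behaves roughly like Python's `re.findall` (my real setup is an Elasticsearch analyzer, so this is only an approximation):

    import re

    # The pattern I am currently using.
    PATTERN = r"(\w*[-*#+=;:\/,~_ ]*\w+)"

    sample = ":  EXAPC:V5ID=v5id"

    # The character class swallows the leading ':' and the spaces, so the
    # colon ends up glued to the following word instead of standing alone.
    print(re.findall(PATTERN, sample))
    # [':  EXAPC', ':V5ID', '=v5id']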

I want to avoid this. Any ideas on how to do it? Any help will be appreciated.
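
For comparison, a plain split on whitespace gives exactly the tokens I am after; what I have not figured out is how to express "split only on spaces, keep runs of special characters intact" in the analyzer configuration. A sketch under the same assumptions as above:

    sample = ":  EXAPC:V5ID=v5id"

    # Splitting purely on whitespace keeps EXAPC:V5ID=v5id together and
    # leaves the lone ':' as its own token -- this is the output I want.
    print(sample.split())
    # [':', 'EXAPC:V5ID=v5id']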
