PatternReplaceCharFilter and Punctuation Characters


(EsLearner) #1

Hi,
I am currently using the following mapping:

    default_index :
        type : custom
        tokenizer : whitespace
        filter : [ word_delimiter, lowercase ]
    filter :
        word_delimiter :
            type : word_delimiter
            preserve_original : true
            split_on_numerics : true
            stem_english_possessive : false

My current mapping produces the tokens "abc", "def" and "abc!def" for the text abc!def.

I would also like to be able to search for the punctuation characters !"#$%&'()*+,-./:;<=>?@[]^_`{|}~.
That is, the expected tokens for the text abc!def are "abc", "!", "def" and "abc!def". The idea is to replace each punctuation character with whitespace, the punctuation character, whitespace, so that it is preserved as a separate token in the index. To achieve this I am trying a char_filter as follows:

    char_filter :
        my_char_filter :
            type : pattern_replace
            pattern : "(?<=\p{Punct})"
            replacement : " Space OriginalPunctuationChar Space "

The pattern is able to match the punctuation character, but how do I get the matched character into the replacement? Or is there any other way to achieve the same result?
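
For illustration, here is a minimal sketch of the "whitespace, punctuation character, whitespace" idea using a capture group instead of a lookbehind. It is written in Python only so it can be run standalone; it is not the Elasticsearch filter itself. Python's `re` module has no `\p{Punct}`, so `string.punctuation` is used as a stand-in for the same ASCII characters, and the helper names `space_out_punctuation` and `whitespace_tokens` are made up for this example. The pattern_replace char filter is documented as accepting capture-group references such as `$1` in `replacement`, so the analogous setting would presumably be pattern "(\\p{Punct})" with replacement " $1 ", but that is untested here.

```python
import re
import string
from typing import List

# Stand-in for Java's \p{Punct}: a character class built from the 32 ASCII
# punctuation characters, wrapped in a capture group.
PUNCT = re.compile("([" + re.escape(string.punctuation) + "])")


def space_out_punctuation(text: str) -> str:
    """Surround every ASCII punctuation character with single spaces."""
    # \1 refers to the captured punctuation character
    # (Java-style replacement strings write this as $1).
    return PUNCT.sub(r" \1 ", text)


def whitespace_tokens(text: str) -> List[str]:
    """Roughly mimic a whitespace tokenizer applied after the char filter."""
    return space_out_punctuation(text).split()


if __name__ == "__main__":
    print(space_out_punctuation("abc!def"))  # abc ! def
    print(whitespace_tokens("abc!def"))      # ['abc', '!', 'def']
```

Note that a char filter rewrites the text before the tokenizer sees it, so this approach on its own yields "abc", "!" and "def" but no longer the combined token "abc!def".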

