PatternReplaceCharFilter and Punctuation Characters

learnes · February 16, 2017, 4:30am

Hi,
I am currently using the following mapping:
default_index :
  type : custom
  tokenizer : whitespace
  filter : [ word_delimiter, lowercase]
filter :
  word_delimiter :
    type : word_delimiter
    preserve_original : true
    split_on_numerics : true
    stem_english_possessive : false

My current mapping produces the tokens "abc", "def" and "abc!def" for the text abc!def.

I would also like to be able to search for the punctuation characters !"#$%&'()*+,-./:;<=>?@[]^_`{|}~ as well.
That is, the expected tokens for the text abc!def are "abc", "!", "def" and "abc!def". The idea is to replace each punctuation character with whitespace, the punctuation character, and whitespace again, so that it is preserved as a separate token in the index. To achieve this, I am trying a char_filter as follows:
char_filter :
  my_char_filter :
    type : pattern_replace
    pattern : "(?<=\p{Punct})"
    replacement : " Space OriginalPunctuationChar Space "
The pattern is able to match the punctuation character, but how do I get the matched character into the replacement? Or is there any other suggestion to achieve the same result?
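
One possible approach, sketched under a few assumptions rather than as a confirmed answer: pattern_replace uses Java regular expressions, so wrapping the character class in a capturing group and referring to it as $1 in the replacement should re-emit the matched character. The lookbehind (?<=\p{Punct}) matches only the empty position after the punctuation character, so it leaves nothing to capture. The name my_char_filter is taken from the question. The single quotes are a precaution, so that the YAML parser does not try to interpret \p as an escape sequence.

```yaml
char_filter :
  my_char_filter :
    type : pattern_replace
    # Capture the punctuation character itself, not the position after it
    pattern : '(\p{Punct})'
    # $1 is the captured character, padded with one space on each side
    replacement : ' $1 '
```

With the whitespace tokenizer, abc!def would then reach the tokenizer as "abc ! def" and should yield "abc", "!" and "def". The combined token "abc!def" would no longer be produced by this analyzer, because the char filter runs before tokenization. Keeping it probably requires a second field analyzed without the char filter. The char filter also has to be listed under char_filter in the default_index analyzer to take effect.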

system · March 16, 2017, 4:31am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
