Regex pattern_replace

elink12 · January 7, 2020, 12:22pm

Hi friends, I have a problem. I don’t understand why it works like this.

#My filter pattern
'my_pattern_replace' => [
    "type" => "pattern_replace",
    "pattern" => "([0-9.,-]+)\s?(car\b|cars\b|cars\w+)",
    "replacement" => "$1car"
],

#Test against:

i have 2 cars and 2cars

#String replacement result analyze:

William_Brafford · January 7, 2020, 4:44pm

Hello and welcome to the forums!

Your example uses a pattern_replace, which is a token filter. It operates on individual tokens in the text, rather than on the string as a whole. By the time the string "i have 2 cars and 2cars" gets to the filter, it has been transformed into a stream of tokens: ['i', 'have', '2', 'cars', 'and', '2cars']. The regex is applied to each of those tokens separately: '2' and 'cars' don't match on their own, but the '2cars' token does.
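To make the difference concrete, here is a rough Python sketch of the two behaviours. It is only an approximation: a whitespace split stands in for the tokenizer (an assumption, since the real tokenizer depends on the analyzer), and Python's re module stands in for the Java regex engine that Elasticsearch uses. The pattern and replacement are the ones from the question.

```python
import re

# Same regex as in the original filter definition.
PATTERN = re.compile(r"([0-9.,-]+)\s?(car\b|cars\b|cars\w+)")
REPLACEMENT = r"\g<1>car"  # Python spelling of "$1car"

text = "i have 2 cars and 2cars"

# Token filter view: the regex only ever sees one token at a time.
# str.split() is a stand-in for the tokenizer (assumption).
tokens = text.split()
as_token_filter = [PATTERN.sub(REPLACEMENT, token) for token in tokens]

# Char filter view: the regex sees the whole string before tokenization,
# so "2 cars" can match across the space.
as_char_filter = PATTERN.sub(REPLACEMENT, text).split()

print(as_token_filter)  # ['i', 'have', '2', 'cars', 'and', '2car']
print(as_char_filter)   # ['i', 'have', '2car', 'and', '2car']
```

In the per-token version only '2cars' is rewritten, which matches the behaviour described above; the whole-string version also rewrites "2 cars".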

If you wanted "2 cars" to be its own token, that's something that would be controlled by a "tokenizer" rather than a "token filter." For more details, see anatomy of an analyzer.

-William

elink12 · January 7, 2020, 6:03pm

I found this https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-pattern-replace-charfilter.html

It works.
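For anyone following along, a char filter version of the original definition might look roughly like the sketch below, written as a Python dict that serializes to the index settings JSON. The names cars_char_filter and cars_analyzer are hypothetical, and the standard tokenizer plus the lowercase and snowball filters are assumptions, not the poster's actual setup.

```python
import json

# Hypothetical names: "cars_char_filter" and "cars_analyzer" are illustrative only.
settings = {
    "analysis": {
        "char_filter": {
            "cars_char_filter": {
                "type": "pattern_replace",
                # Same pattern and replacement as in the original token filter.
                "pattern": r"([0-9.,-]+)\s?(car\b|cars\b|cars\w+)",
                "replacement": "$1car",
            }
        },
        "analyzer": {
            "cars_analyzer": {
                "type": "custom",
                # Char filters run on the raw text, before the tokenizer.
                "char_filter": ["cars_char_filter"],
                "tokenizer": "standard",              # assumption
                "filter": ["lowercase", "snowball"],  # assumption
            }
        },
    }
}

# Body that could be sent when creating an index.
body = json.dumps({"settings": settings}, indent=2)
print(body)
```

Because the char filter sees the text before it is split into tokens, the pattern can match across the space in "2 cars".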

elink12 · January 7, 2020, 6:13pm

But I have another problem: the snowball filter is applied after the char_filter, so my regular expression does not always work. Are there any solutions?

William_Brafford · January 8, 2020, 3:18pm

There is a lot that you can do with custom analyzers, but I'm not sure if I understand your use case fully. What is the problem you are trying to solve with this regular expression?

-William

