Regex pattern_replace

Hi friends, I have a problem. I don’t understand why this works the way it does.

#My filter pattern
'my_pattern_replace' => [
    "type" => "pattern_replace",
    "pattern" => "([0-9.,-]+)\s?(car\b|cars\b|cars\w+)",
    "replacement" => "$1car"
],

#Test string:

i have 2 cars and 2cars

#Result of analyzing the string:

Hello and welcome to the forums!

Your example uses the pattern_replace token filter. A token filter operates on individual tokens in the text, rather than on the string as a whole. By the time the string "i have 2 cars and 2cars" gets to the filter, it has already been transformed into a stream of tokens: ['i', 'have', '2', 'cars', 'and', '2cars']. The regex is applied to each of those tokens separately. The tokens '2' and 'cars' don't match on their own, but the '2cars' token does, so only that one is rewritten.
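To make the difference concrete, here is a rough simulation in plain Python rather than Elasticsearch itself. It assumes a whitespace split as a stand-in for the real tokenizer, and it translates the Java-style replacement "$1car" into Python's r"\g<1>car". The function names are made up for illustration. For this particular pattern and input, Python's regex engine should behave the same way as the Java one Elasticsearch uses.

```python
import re

# Regex from the filter definition above.
PATTERN = re.compile(r"([0-9.,-]+)\s?(car\b|cars\b|cars\w+)")
# Python spelling of the "$1car" replacement.
REPLACEMENT = r"\g<1>car"


def tokenize(text):
    """Stand-in tokenizer (an assumption): split on whitespace."""
    return text.split()


def replace_per_token(tokens):
    """Apply the regex to each token separately, as a token filter does."""
    return [PATTERN.sub(REPLACEMENT, token) for token in tokens]


def replace_whole_string(text):
    """Apply the regex once to the full string, before any tokenizing."""
    return PATTERN.sub(REPLACEMENT, text)


text = "i have 2 cars and 2cars"

# Token filter behaviour: '2' and 'cars' are separate tokens, so no match there.
per_token = replace_per_token(tokenize(text))
print(per_token)      # ['i', 'have', '2', 'cars', 'and', '2car']

# Whole-string behaviour: the regex can see "2 cars" across the space.
whole_string = tokenize(replace_whole_string(text))
print(whole_string)   # ['i', 'have', '2car', 'and', '2car']
```

The first result is what you get when the regex only ever sees one token at a time; the second is what you would get if the same regex ran over the unsplit text.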

If you wanted 2 cars to be its own token, that's something that would be controlled by a "tokenizer" rather than a "token filter." For more details, see anatomy of an analyzer.

-William

I found this https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-pattern-replace-charfilter.html

It works.

But I have another problem: the snowball filter is applied after the char_filter, so my regular expression does not always work. Are there any solutions?

There is a lot that you can do with custom analyzers, but I'm not sure if I understand your use case fully. What is the problem you are trying to solve with this regular expression?

-William
