Token Filter: catenate_numbers - spaces included?

(Daniel Oakley) #1

According to the documentation here: https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-word-delimiter-tokenfilter.html

catenate_numbers
If true causes maximum runs of number parts to be catenated: "500-42" ⇒ "50042". Defaults to false.

I would like to know if there is a way to treat \s+ (one or more whitespace characters) the same as '-', so that both are collapsed when number parts are catenated.

Phone numbers are often written like 07 9833-4266 and it would be good if that could be collapsed to a single string 0798334266.
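
To make the behaviour I'm after concrete, here is a minimal sketch in plain Python. It is only an illustration of the desired result outside Elasticsearch; the function name collapse_number_run is made up for this example and is not an option of the word delimiter filter.

```python
import re


def collapse_number_run(text: str) -> str:
    """Remove runs of whitespace and hyphens that sit between two digits.

    Hypothetical helper: it mimics the collapsing I would like from
    catenate_numbers, extended to cover spaces as well as '-'.
    """
    # Lookbehind/lookahead ensure only separators flanked by digits are dropped.
    return re.sub(r"(?<=\d)[\s\-]+(?=\d)", "", text)


print(collapse_number_run("07 9833-4266"))  # prints 0798334266
```

The same function leaves separators between words untouched, since they are not surrounded by digits.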

Is there a way?