doaks
(Daniel Oakley)
May 16, 2019, 9:04am
1
According to the docs here: Word delimiter token filter | Elasticsearch Guide [8.11] | Elastic
catenate_numbers: "If true causes maximum runs of number parts to be catenated: "500-42" ⇒ "50042". Defaults to false."
I would like to know if there is a way to include \s+ (any number of spaces) with '-' as characters to collapse when catenating number strings.
Phone numbers are often written like 07 9833-4266 and it would be good if that could be collapsed to a single string 0798334266.
Is there a way?
Hi @doaks,
It seems catenate_all would work for you, since it catenates every subword part, not only runs separated by '-'. Please find an example below:
GET _analyze
{
  "text": [
    "07 9833-4266",
    "+1 (407) 284-1234"
  ],
  "tokenizer": "keyword",
  "filter": [
    {
      "type": "word_delimiter",
      "catenate_all": true,
      "split_on_case_change": false,
      "split_on_numerics": false,
      "generate_word_parts": false,
      "generate_number_parts": false
    }
  ]
}
The result is:
{
  "tokens" : [
    {
      "token" : "0798334266",
      "start_offset" : 0,
      "end_offset" : 12,
      "type" : "word",
      "position" : 0
    },
    {
      "token" : "14072841234",
      "start_offset" : 14,
      "end_offset" : 30,
      "type" : "word",
      "position" : 1
    }
  ]
}
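To apply the same behaviour at index time rather than only in an _analyze call, the filter could be registered in a custom analyzer and attached to a field. The request below is only a sketch: the index name my-index, the filter name phone_catenate, the analyzer name phone_number, and the field name phone are all made up for illustration, and the typeless mappings block assumes Elasticsearch 7.0 or later.

```console
# Hypothetical names: my-index, phone_catenate, phone_number, phone.
# Assumes typeless mappings (Elasticsearch 7.0+).
PUT my-index
{
  "settings": {
    "analysis": {
      "filter": {
        "phone_catenate": {
          "type": "word_delimiter",
          "catenate_all": true,
          "split_on_case_change": false,
          "split_on_numerics": false,
          "generate_word_parts": false,
          "generate_number_parts": false
        }
      },
      "analyzer": {
        "phone_number": {
          "type": "custom",
          "tokenizer": "keyword",
          "filter": ["phone_catenate"]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "phone": {
        "type": "text",
        "analyzer": "phone_number"
      }
    }
  }
}
```

Because the keyword tokenizer emits the whole value as one token, this setup only suits fields that hold a single phone number and nothing else. A match query on that field runs the same analyzer over the search text by default, so a number typed with different spacing or punctuation should collapse to the same digit string.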
I hope it helps!