Hi,
I have a requirement to support special characters in search, so for both indexing and search I am trying to create a custom analyzer that retains the original token as is. I have set the tokenizer to whitespace and generate_number_parts to false, but when I check the analyzer output, it's not working as expected.
PUT /testing
{
  "settings": {
    "index": {
      "analysis": {
        "filter": {
          "word_delimiter_v3_filter": {
            "type": "word_delimiter",
            "generate_number_parts ": false,
            "split_on_numerics": false,
            "preserve_original": true
          }
        },
        "analyzer": {
          "searchword_v3_analyzer": {
            "type": "custom",
            "tokenizer": "whitespace",
            "filter": [
              "lowercase",
              "word_delimiter_v3_filter"
            ]
          }
        }
      }
    }
  },
  "mappings": {
    "testmap": {
      "properties": {
        "fullname": {
          "type": "text",
          "analyzer": "searchword_v3_analyzer",
          "search_analyzer": "searchword_v3_analyzer"
        }
      }
    }
  }
}
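For reference, the filter definition as actually stored by the index can be read back with the standard index settings API, to confirm the index picked it up:

GET testing/_settings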
Using the above analyzer, if I enter "2-10" as the search text, I want it to be searched as is.
GET testing/_analyze
{
  "analyzer": "searchword_v3_analyzer",
  "text": "2-10"
}
But the analyzer output below still splits "2" and "10" into separate tokens, even though I am using the whitespace tokenizer and have set both generate_number_parts and split_on_numerics to false. I am running Elasticsearch 5.5.1 on Windows Server 2012.
{
  "tokens": [
    {
      "token": "2-10",
      "start_offset": 0,
      "end_offset": 4,
      "type": "word",
      "position": 0
    },
    {
      "token": "2",
      "start_offset": 0,
      "end_offset": 1,
      "type": "word",
      "position": 0
    },
    {
      "token": "10",
      "start_offset": 2,
      "end_offset": 4,
      "type": "word",
      "position": 1
    }
  ]
}
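For comparison, here is the output I was expecting. This is just my reading of the documented behavior (preserve_original keeps the original token, and generate_number_parts set to false suppresses the numeric sub-tokens), so the single preserved token should be all that comes back:

{
  "tokens": [
    {
      "token": "2-10",
      "start_offset": 0,
      "end_offset": 4,
      "type": "word",
      "position": 0
    }
  ]
}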
Thanks
askids