Hello,
I am trying what I thought would be an easy fix in Elasticsearch, but I'm stumbling on sorting a simple name field that includes a civility (Mr, Mrs, Miss, etc.):
Here is some sample (fake) data:
[
{
"name":"Mr and Mrs Smith John"
},
{
"name":"Mr Doe"
},
{
"name":"Miss Black and sons"
},
{
"name":"Mrs White, Snow"
},
{
"name":"Mrs White, Mary"
}
]
The task I've been assigned is to sort on name regardless of civility. I thought I could do this with a text field and a custom analyzer: the standard tokenizer plus a custom stop filter listing the various civilities.
The trouble is that applying the stop filter returns several tokens, not one.
POST /_analyze
{
"tokenizer": "standard",
"filter": [
{
"type": "stop",
"ignore_case": true,
"stopwords": [
"mr",
"mrs",
"miss",
"and"
]
}
],
"text": "Mr and Mrs Smith John"
}
produces, as expected:
{
"tokens": [
{
"token": "Smith",
"start_offset": 11,
"end_offset": 16,
"type": "<ALPHANUM>",
"position": 3
},
{
"token": "John",
"start_offset": 17,
"end_offset": 21,
"type": "<ALPHANUM>",
"position": 4
}
]
}
However, when I then sort on this field, the document is sorted under John, not Smith.
I have been looking for another filter that combines all the tokens into one, but without any luck so far.
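My understanding of why this happens, sketched in plain Python rather than Elasticsearch (the regex split only approximates the standard tokenizer, and `CIVILITY` / `analyze` are illustrative names of my own): after the stop filter the field holds two values, and for an ascending sort on a multi-valued field Elasticsearch's default sort `mode` is `min`, so the smallest token wins.

```python
import re

# Stopwords from the _analyze request above, matched case-insensitively
# (mirrors "ignore_case": true).
CIVILITY = {"mr", "mrs", "miss", "and"}

def analyze(name: str) -> list[str]:
    # Rough stand-in for the standard tokenizer: split on runs of
    # non-alphanumeric characters (the real tokenizer uses Unicode
    # word-break rules, so this is only an approximation).
    tokens = [t for t in re.split(r"[^0-9A-Za-z]+", name) if t]
    # Stop filter: drop civility words, keep the rest in their original order.
    return [t for t in tokens if t.lower() not in CIVILITY]

tokens = analyze("Mr and Mrs Smith John")
print(tokens)       # ['Smith', 'John']
# Ascending sort on a multi-valued field uses the smallest value ("min" mode).
print(min(tokens))  # John
```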
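To make the target concrete, here is a minimal Python sketch of the single sort key I'd like to end up with: drop the civility words, keep the remaining words in their original order, and join them into one lowercase string. `sort_key` is a hypothetical helper for illustration only; it is not an Elasticsearch feature.

```python
import re

# Civility words to ignore when sorting (same list as the stop filter above).
CIVILITY = {"mr", "mrs", "miss", "and"}

def sort_key(name: str) -> str:
    """Hypothetical helper: remove civility words and join the remaining
    words, in order, into ONE lowercase string (the single token I want)."""
    words = [w for w in re.split(r"[^0-9A-Za-z]+", name) if w]
    return " ".join(w.lower() for w in words if w.lower() not in CIVILITY)

names = [
    "Mr and Mrs Smith John",
    "Mr Doe",
    "Miss Black and sons",
    "Mrs White, Snow",
    "Mrs White, Mary",
]

for name in sorted(names, key=sort_key):
    print(sort_key(name), "<-", name)
```

With the sample data this orders Miss Black, Mr Doe, the Smiths, then Mrs White, Mary before Mrs White, Snow. Whether an Elasticsearch analyzer can emit such a key is exactly what I'm asking; the sketch only shows the intended behaviour.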
I'd appreciate any help or pointers on how to achieve that!
Thanks