Filtering data before search

Victor_Dusautois · November 4, 2021, 12:51pm

Hello,
I am trying to do what I thought would be an easy fix in Elasticsearch, but I'm stumbling on sorting by a simple name field that contains a civility (a title such as Mr, Mrs or Miss):
Here is some sample (fake) data:

[
{
    "name":"Mr and Mrs Smith John"
},
{
    "name":"Mr Doe"
},
{
    "name":"Miss Black and sons"
},
{
    "name":"Mrs White, Snow"
},
{
    "name":"Mrs White, Mary"
}
]

The purpose of the task I've been assigned is to sort by name regardless of civility. I thought I could do this with a text field using a custom analyzer: the standard tokenizer plus a custom stop filter listing the various civilities.
The trouble is that applying a stop filter returns several tokens, not one.

POST /_analyze
{
    "tokenizer": "standard",
    "filter": [
        {
            "type": "stop",
            "ignore_case": true,
            "stopwords": [
                "mr",
                "mrs",
                "miss",
                "and"
            ]
        }
    ],
    "text": "Mr and Mrs Smith John"
}

which (correctly) produces:

{
    "tokens": [
        {
            "token": "Smith",
            "start_offset": 11,
            "end_offset": 16,
            "type": "<ALPHANUM>",
            "position": 3
        },
        {
            "token": "John",
            "start_offset": 17,
            "end_offset": 21,
            "type": "<ALPHANUM>",
            "position": 4
        }
    ]
}
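Note that Smith sits at position 3, not 0: the stop filter drops Mr, and, Mrs but their positions are still consumed. A rough Python sketch of this token stream reproduces the tokens, offsets and positions above. It assumes a `\w+` regex is a good enough stand-in for the standard tokenizer on this ASCII sample (the real tokenizer follows Unicode word-boundary rules); `analyze` is a hypothetical helper, not an Elasticsearch API.

```python
import re

# Assumption: splitting on word characters approximates the standard
# tokenizer for this ASCII sample text.
TOKEN_RE = re.compile(r"\w+")

# Same stopword list as the _analyze request, compared case-insensitively
# (mirrors "ignore_case": true).
STOPWORDS = {"mr", "mrs", "miss", "and"}


def analyze(text):
    """Return (token, start_offset, end_offset, position) tuples.

    Positions are assigned before stopwords are removed, so dropped
    words leave gaps in the position sequence.
    """
    tokens = []
    for position, match in enumerate(TOKEN_RE.finditer(text)):
        word = match.group()
        if word.lower() in STOPWORDS:
            continue  # removed, but its position number is still used up
        tokens.append((word, match.start(), match.end(), position))
    return tokens


for tok in analyze("Mr and Mrs Smith John"):
    print(tok)
```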

When I then apply a sort, this document is sorted under John, not Smith.
I have been looking for another filter to combine all the tokens into one, but without any luck so far.
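Why the document lands under John: the field still holds two terms, and for an ascending sort on a multi-valued field Elasticsearch picks the lowest value by default (the `min` sort mode), so "John" wins over "Smith". The sketch below contrasts that with one possible alternative, offered as an assumption rather than a confirmed fix: strip leading civilities from the whole string and keep the remainder as a single lowercased key. In Elasticsearch this would presumably correspond to a `keyword` field whose normalizer combines a `pattern_replace` char filter with the `lowercase` filter; that mapping is untested here, and the regex and function names are hypothetical.

```python
import re

# Stopword list from the post, compared case-insensitively.
STOPWORDS = {"mr", "mrs", "miss", "and"}

# Hypothetical pattern: one or more leading civilities, optionally followed
# by "and" (as in "Mr and Mrs"). The \b makes "mr" fail on "Mrs", so the
# "mrs" alternative is used instead.
LEADING_CIVILITY = re.compile(
    r"^\s*(?:(?:mr|mrs|miss)\b\s*(?:and\s+)?)+", re.IGNORECASE
)


def min_token_key(name):
    """Current behaviour: several tokens remain after the stop filter, and an
    ascending sort on a multi-valued field uses the smallest one ("min" mode)."""
    tokens = [w for w in re.findall(r"\w+", name) if w.lower() not in STOPWORDS]
    return min(tokens, default="")


def single_key(name):
    """Alternative: remove leading civilities from the whole string and keep
    what is left as ONE lowercased sort key."""
    return LEADING_CIVILITY.sub("", name).strip().lower()


names = [
    "Mr and Mrs Smith John",
    "Mr Doe",
    "Miss Black and sons",
    "Mrs White, Snow",
    "Mrs White, Mary",
]

for name in names:
    print(f"{name!r:26} min token: {min_token_key(name)!r:9} single key: {single_key(name)!r}")
```

With this sample data both keys happen to give the same overall order, but only the single key files the first document under "smith john". A hypothetical name such as "Mr Zed Adams" shows the difference: its min token is "Adams", while its single key is "zed adams".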

I'd appreciate any help or pointer on how to achieve that!
Thanks

dadoonet · November 4, 2021, 1:45pm

Could you provide a full recreation script as described in About the Elasticsearch category. It will help to better understand what you are doing. Please, try to keep the example as simple as possible.

A full reproduction script is something anyone can copy and paste in Kibana dev console, click on the run button to reproduce your use case. It will help readers to understand, reproduce and if needed fix your problem. It will also most likely help to get a faster answer.

system · December 2, 2021, 1:46pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
