Sorting on the first non-stop word in Elasticsearch


(ScottM-2) #1

Apologies if this has been asked before, but various searches didn't
turn up an answer (that I was able to understand).

What I would like to do is have a document field that can be used to
sort on the first non-stop word in the string. The use case is similar
to having a list of musical artist names:

The Shins
The Beatles
Everything but the Girl
Wilco

When sorted this list should be:

The Beatles
Everything but the Girl
The Shins
Wilco

If I use a non-analyzed field the sorting will be:

Everything but the Girl
The Beatles
The Shins
Wilco

If I use an analyzed field (default analyzer) then I can't sort on it.

Does that make sense? I can't find an analyzer in Elasticsearch that
would allow me to do this.


(Jan Fiedler) #2

You need an analyzed field to get rid of the stop words (e.g. a whitespace
tokenizer and a stop word filter). Sorting works on analyzed fields; you
just have to make sure the analyzer generates a single token for your field.
This is exactly the problem in your case: you will end up with multiple
tokens after the stop word removal. I am not aware of a token filter that
would let you merge tokens (after the stop word filter). You may get lucky
with a token filter that creates collator keys for locale-specific sorting,
like the one in the ICU plugin. However, I have not tested that (I only use
it with a single token out of the keyword tokenizer), so please do not take
my word for it.
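
One way to sidestep the multiple-token problem is to make sure the sort
value is a single token from the start. The sketch below is not from the
thread and has not been tested against a cluster. It shows two variants of
the same idea: computing a sort key in the client before indexing (to be
stored in a separate non-analyzed field), and an analyzer definition that
strips leading stop words with a char filter ahead of the keyword tokenizer.
The stop-word list and the names sort_key, strip_leading_stop_words and
artist_sort are made up for illustration, and the settings assume an
Elasticsearch version that ships the pattern_replace char filter.

```python
import re

# Hypothetical stop-word list; extend it to suit your data.
LEADING_STOP_WORDS = ("the", "a", "an")

# Matches one or more stop words at the very start of the string,
# each followed by whitespace.
_LEADING = re.compile(
    r"^(?:(?:%s)\s+)+" % "|".join(LEADING_STOP_WORDS), re.IGNORECASE
)


def sort_key(name):
    """Return a lowercase key with leading stop words removed."""
    return _LEADING.sub("", name.strip()).lower()


# Assumed index settings (illustrative, untested): the char filter runs
# before tokenization, so the keyword tokenizer still emits exactly one
# token, which keeps the field sortable.
INDEX_SETTINGS = {
    "analysis": {
        "char_filter": {
            "strip_leading_stop_words": {
                "type": "pattern_replace",
                "pattern": "(?i)^(?:(?:the|a|an)\\s+)+",
                "replacement": "",
            }
        },
        "analyzer": {
            "artist_sort": {
                "type": "custom",
                "char_filter": ["strip_leading_stop_words"],
                "tokenizer": "keyword",
                "filter": ["lowercase"],
            }
        },
    }
}

artists = ["The Shins", "The Beatles", "Everything but the Girl", "Wilco"]
ordered = sorted(artists, key=sort_key)
print(ordered)
```

Only stop words at the start of the string are removed, so the "the" inside
"Everything but the Girl" is left alone. If you go the client-side route,
index the output of sort_key in its own field and sort on that field,
keeping the original name for display.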
