Sorting on the first non-stop word in Elasticsearch

ScottM_2 · September 13, 2011, 11:09pm

Apologies if this has been asked before, but various searches didn't
turn up an answer (that I was able to understand).

What I would like to be able to do is to have a document field that
can be used to sort on the first non-stop word in the string. The
use case is similar to having a list of musical artist names:

The Shins
The Beatles
Everything but the Girl
Wilco

When sorted this list should be:

The Beatles
Everything but the Girl
The Shins
Wilco
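Outside Elasticsearch, this ordering can be reproduced with a sort key that skips leading stop words. Below is a minimal Python sketch. It only illustrates the ordering and is not an Elasticsearch analyzer. `STOP_WORDS` is a small hypothetical list and `first_non_stop_key` is a hypothetical helper:

```python
# Hypothetical stop-word list for illustration only; a real analyzer
# would use its own (usually much longer) list.
STOP_WORDS = {"the", "a", "an", "but", "and", "of"}


def first_non_stop_key(name: str) -> str:
    """Return the lowercased name starting at its first non-stop word."""
    words = name.lower().split()
    # Skip leading stop words only; later ones ("but the") are kept.
    # The len(words) - 1 bound always keeps the last word.
    i = 0
    while i < len(words) - 1 and words[i] in STOP_WORDS:
        i += 1
    return " ".join(words[i:])


artists = ["The Shins", "The Beatles", "Everything but the Girl", "Wilco"]
for name in sorted(artists, key=first_non_stop_key):
    print(name)
```

The keys are `beatles`, `everything but the girl`, `shins` and `wilco`, which gives the order listed above. The `len(words) - 1` bound keeps the last word when a name consists only of stop words (the band The The, for example), so the key is never empty.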

If I use a non-analyzed field, the sorting will be:

Everything but the Girl
The Beatles
The Shins
Wilco
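A non-analyzed field is indexed as a single, unmodified term. Sorting on it therefore behaves much like an ordinary comparison of the whole string. This short Python illustration reproduces the order above, because the leading "The" decides where two of the names land:

```python
artists = ["The Shins", "The Beatles", "Everything but the Girl", "Wilco"]

# Plain, case-sensitive code-point comparison of the full string,
# leading "The" included.
for name in sorted(artists):
    print(name)
```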

If I use an analyzed field (default analyzer) then I can't sort on it.

Does that make sense? I can't find an analyzer in Elasticsearch that
would allow me to do this.

Jan_Fiedler · September 14, 2011, 8:03am

You need an analyzed field to get rid of the stop words (e.g. a whitespace
tokenizer and a stop word filter). Sorting works on analyzed fields - you
just have to make sure the analyzer generates a single token for your field.
This is exactly the problem in your case - you will end up with multiple
tokens after the stop word removal. I am not aware of a token filter that
would let you merge tokens (after the stop word filter). You may get lucky
with a token filter that creates collation keys for locale-specific sorting
(like the ICU plugin). However, I have not tested that (I only use it with
a single token out of the keyword tokenizer), so please do not take my word
for it.
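The single-token requirement can be made concrete with a rough Python imitation of the chain described above: a whitespace tokenizer followed by a stop word filter. A lowercase step is added as an assumption, because stop words are usually matched against lowercased text. `STOP_WORDS`, `analyze` and `merge_tokens` are hypothetical names. `merge_tokens` stands in for the token-merging step the reply could not find in Elasticsearch. It is plain Python, not an existing filter:

```python
# Hypothetical stop-word list; Elasticsearch's stop filter has its own list.
STOP_WORDS = {"the", "a", "an", "but", "and", "of"}


def analyze(text: str) -> list[str]:
    """Rough imitation of whitespace tokenizer + lowercase + stop word filter."""
    tokens = text.split()                               # whitespace tokenizer
    tokens = [t.lower() for t in tokens]                # lowercase (assumed step)
    return [t for t in tokens if t not in STOP_WORDS]   # stop word filter


def merge_tokens(tokens: list[str]) -> str:
    """Hypothetical merge step: join the remaining tokens into one sort token."""
    return " ".join(tokens)


for name in ["The Beatles", "Everything but the Girl"]:
    tokens = analyze(name)
    print(name, tokens, len(tokens), merge_tokens(tokens))
```

"The Beatles" comes out as a single token, but "Everything but the Girl" comes out as two (`everything`, `girl`), which is the multi-token problem described above. This imitation also removes every stop word, not just the leading ones, so the merged key is `everything girl`. For these names the order is the same, but it is not strictly a sort on the first non-stop word.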
