Case-insensitive sort


(Phil Hagelberg-2) #1

I'm trying to get some fields to perform case-insensitive sort.

{"index":
  {"analysis":
    {"analyzer":
      {"text":
        {"tokenizer": "standard",
         "filter": ["standard", "lowercase"]},
       "sortable":
        {"tokenizer": "keyword",
         "filter": ["lowercase"]}}}}}

If I set my field to {"index": "not_analyzed", "analyzer":
"sortable"}, then it sorts case-sensitively. But if I drop
the "index" setting and use the default ("analyzed"), then it
correctly sorts case-insensitively. This is bewildering to me because
all the other Lucene-based systems I've worked with have warned that if a
field is analyzed, it can't be used for sorting at all.

So how is Elasticsearch able to get around this limitation? And why
does sorting break when I set the field to "not_analyzed"?

thanks,
Phil


(Phil Hagelberg-2) #2

So I think I might actually have figured it out: since I'm telling it
to use an analyzer which performs no tokenization, it's still able to
perform the sorting. But telling it "no analysis" also means "no
filtering", which means my lowercasing isn't applied. In other words,
it's not analysis itself that interferes with sorting; it's tokenization.

Is that correct?

-Phil
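
The reasoning in the post above can be illustrated with a toy model. This is only a sketch in plain Python, not Lucene or Elasticsearch code: the function names are hypothetical, the whitespace split is a crude stand-in for the "standard" tokenizer, and raising an error for multi-term values is this sketch's own way of showing that such a field has no single term to sort by.

```python
# Toy model of the analysis chains discussed in the thread (not Lucene code).

def keyword_tokenizer(text):
    """Emit the whole field value as one token, like the "keyword" tokenizer."""
    return [text]

def whitespace_tokenizer(text):
    """Crude stand-in for the "standard" tokenizer: split on whitespace."""
    return text.split()

def lowercase_filter(tokens):
    """Lowercase every token, like the "lowercase" filter."""
    return [token.lower() for token in tokens]

def sortable_analyzer(text):
    """keyword tokenizer + lowercase filter: one lowercased term per value."""
    return lowercase_filter(keyword_tokenizer(text))

def text_analyzer(text):
    """Tokenizing analyzer: may produce several terms per value."""
    return lowercase_filter(whitespace_tokenizer(text))

def not_analyzed(text):
    """No analysis at all: the raw value is the only term, filters are skipped."""
    return [text]

def sort_by_single_term(values, analyze):
    """Sort values by their indexed term; refuse values with several terms."""
    def key(value):
        terms = analyze(value)
        if len(terms) != 1:
            raise ValueError(f"{value!r} has {len(terms)} terms, no single sort key")
        return terms[0]
    return sorted(values, key=key)

names = ["banana", "Apple", "cherry", "apple pie", "Banana split"]

case_sensitive = sort_by_single_term(names, not_analyzed)
case_insensitive = sort_by_single_term(names, sortable_analyzer)

print(case_sensitive)    # uppercase-initial values sort first
print(case_insensitive)  # ordering ignores case
```

Calling sort_by_single_term(names, text_analyzer) raises the ValueError in this model, because "apple pie" is split into two terms. That is the situation the usual "don't sort on analyzed fields" warning is about.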


(Shay Banon) #3

Yea, when you set not_analyzed, it won't apply an analyzer to the field,
regardless of which one you configure on it. In this case, you can set it to
analyzed and keep the sortable analyzer.
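
A minimal sketch of what that advice might look like as settings plus a field mapping, built as Python dicts and printed as JSON. The analyzer definition is taken from the first post; the type name "doc", the field name "title", the "type": "string" entry, and the overall mapping layout are assumptions for illustration and are not confirmed anywhere in the thread.

```python
import json

# Analyzer definition copied from the first post in the thread.
settings = {
    "index": {
        "analysis": {
            "analyzer": {
                "sortable": {
                    "tokenizer": "keyword",
                    "filter": ["lowercase"],
                }
            }
        }
    }
}

# Hypothetical mapping: "doc" and "title" are made-up names, and the
# "type": "string" entry is an assumption about the mapping format.
mapping = {
    "doc": {
        "properties": {
            "title": {
                "type": "string",
                "index": "analyzed",    # not "not_analyzed", so the analyzer runs
                "analyzer": "sortable",  # single lowercased term per value
            }
        }
    }
}

print(json.dumps(settings, indent=2))
print(json.dumps(mapping, indent=2))
```

The only change from the failing setup is the "index" value: with "analyzed" the configured analyzer is applied, so the lowercase filter takes effect.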

(In reply to Phil Hagelberg's message of Wed, May 26, 2010 at 6:13 AM.)

