Case Insensitive Sort

Hello,

I've been working through implementing a case insensitive sort on a string
field. My initial problem was the inability to sort on an analysed field
as discussed here:

http://blog.wiercinski.net/2011/uncategorized/elasticsearch-sorting-on-string-types-with-more-than-one-value-per-doc-or-more-than-one-token-per-field/

With the sort field in question now defined as a multi_field, I'm in
business:

"last_name": {
  "type": "multi_field",
  "fields": {
    "last_name": {
      "type": "string",
      "boost": 2,
      "analyzer": "name"
    },
    "untouched": {
      "type": "string",
      "index": "not_analyzed",
      "include_in_all": false
    }
  }
}

However, when sorting on last_name.untouched, the sort applied is case
sensitive. I seem to be missing a solution that seems to be hinted
at here:

http://elasticsearch-users.115913.n3.nabble.com/multi-field-and-sort-td3548822.html
http://elasticsearch-users.115913.n3.nabble.com/Case-insensitive-sort-td843856.html

Is the approach taken with the multi_field wrong, or is there another layer
to solving this problem that I am missing?

Thanks!

Mike

Hi Mike,

The approach in the second link seems to be the obvious solution to the
issue. Using a keyword tokenizer will create only one token for the
entire field, adhering to the "one-token-per-field" requirement. Other
tokenizers will create one token per word, or more when synonyms are
applied, and may drop stopwords. Applying a lowercase filter will
provide case insensitivity, keeping in mind that the same analyzer must
be used for both indexing and querying.

Ivan
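
A minimal sketch of the approach described above, written as a Python
dict that serializes to the JSON body for index creation. The analyzer
name "lowercase_keyword", the sub-field name "sortable", and the type
name "person" are hypothetical and do not come from the thread. The
"name" analyzer from the original mapping would also have to be defined
in the real settings; it is left out here.

```python
import json

# Hypothetical names, chosen only for illustration.
ANALYZER_NAME = "lowercase_keyword"
SORT_SUBFIELD = "sortable"

index_body = {
    "settings": {
        "analysis": {
            "analyzer": {
                ANALYZER_NAME: {
                    "type": "custom",
                    "tokenizer": "keyword",   # whole field value becomes one token
                    "filter": ["lowercase"],  # fold case so "smith" == "Smith"
                }
            }
        }
    },
    "mappings": {
        "person": {  # hypothetical type name
            "properties": {
                "last_name": {
                    "type": "multi_field",
                    "fields": {
                        # Unchanged from the original post; the "name"
                        # analyzer is assumed to be defined elsewhere.
                        "last_name": {"type": "string", "boost": 2, "analyzer": "name"},
                        "untouched": {
                            "type": "string",
                            "index": "not_analyzed",
                            "include_in_all": False,
                        },
                        # Extra sub-field used only for sorting.
                        SORT_SUBFIELD: {
                            "type": "string",
                            "analyzer": ANALYZER_NAME,
                            "include_in_all": False,
                        },
                    },
                }
            }
        }
    },
}

# Sort on the lowercased single-token sub-field instead of "untouched".
sort_clause = [{"last_name." + SORT_SUBFIELD: {"order": "asc"}}]

print(json.dumps(index_body, indent=2))
print(json.dumps({"sort": sort_clause}))
```

With a mapping along these lines, last_name.untouched still holds the
original casing, while the sort runs against the lowercased sub-field.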

Hi Ivan,

Just to make sure I have this right: to fulfill the sorting requirements,
any tokenizer used on a string field needs to produce exactly one token,
and the keyword tokenizer meets this requirement. Besides the keyword
tokenizer, am I correct in my understanding that there are no other
tokenizers that meet this requirement?

Thanks!

Mike

Correct. To my knowledge, only the keyword tokenizer can guarantee
one token per field value. Of course, other tokenizers can also end up
producing a single token, such as the whitespace tokenizer applied to
values with no whitespace (state abbreviations, zip codes, etc.).

--
Ivan
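
To make the distinction concrete, here is a small Python simulation. It
is not Elasticsearch code: keyword_tokens and whitespace_tokens are
hypothetical stand-ins that only mimic how the two tokenizers split a
value, and the sample values are made up.

```python
def keyword_tokens(value):
    """Mimic the keyword tokenizer: the whole input is a single token."""
    return [value]


def whitespace_tokens(value):
    """Mimic the whitespace tokenizer: split on runs of whitespace."""
    return value.split()


# Made-up sample values: two without whitespace, two with.
samples = ["NY", "90210", "Van Buren", "de la Cruz"]

# Map each value to (keyword token count, whitespace token count).
token_counts = {
    s: (len(keyword_tokens(s)), len(whitespace_tokens(s))) for s in samples
}

for value, (kw, ws) in token_counts.items():
    print(f"{value!r}: keyword={kw} whitespace={ws}")
```

Only the keyword-style split yields one token for every value; the
whitespace-style split does so only when the value happens to contain
no whitespace.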

I had a hard time dealing with this as well, as I needed to display
properly capitalized words but sort them in a case-insensitive fashion.
One solution that worked for me was to have the fields indexed as
"not_analyzed", then use this in my query:

{
  "query": {...},
  "sort": [
    {
      "_script": {
        "script": "doc['field_name_to_sort_by'].value.toLowerCase()",
        "type": "string",
        "order": "asc"
      }
    }
  ]
}
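
The effect of lowercasing the sort key can be illustrated outside
Elasticsearch with a short Python sketch. The names are made up, and
Python's code point ordering is only an analogy for how a not_analyzed
string field orders ASCII values (uppercase before lowercase).

```python
# Made-up sample values with mixed casing.
names = ["adams", "Baker", "clark", "Davis"]

# Plain sort compares code points, so uppercase letters come first.
case_sensitive = sorted(names)

# Sorting on a lowercased key mirrors the toLowerCase() script:
# the order ignores case, but the values keep their original casing.
case_insensitive = sorted(names, key=str.lower)

print(case_sensitive)    # ['Baker', 'Davis', 'adams', 'clark']
print(case_insensitive)  # ['adams', 'Baker', 'clark', 'Davis']
```

Note that a script-based sort computes the key for each document at
query time, whereas a lowercased sub-field does that work once at index
time.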

vkareh's solution worked for me.

On Friday, August 10, 2012 12:44:31 PM UTC-5, vkareh wrote:

I had a hard time dealing with this as well, as I needed to display
properly capitalized words but sort them in a case-insensitive fashion.
One solution that worked for me was to have the fields indexed as
"not_analyzed", then use this in my query:

{
  "query": {...},
  "sort": [
    {
      "_script": {
        "script": "doc['field_name_to_sort_by'].value.toLowerCase()",
        "type": "string",
        "order": "asc"
      }
    }
  ]
}

Thank you very much... It's working...