Support for case insensitive sorts with doc values

Hugues_Malphettes · October 7, 2014, 7:19am

Hi everyone,

Case-insensitive sorting is elegantly supported by using a custom analyzer [1].
Doc values are documented as a great fit for sorting [2] because they save heap
memory.

However, doc values are not supported for analyzed strings at the moment.

Are we planning to support doc values for analyzers that emit a single
token per string?
Is it worth it to have the ES client do the lower-casing and collation
itself?

Thanks!
Hugues

[1] http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/sorting-collations.html#case-insensitive-sorting
[2] http://www.elasticsearch.org/blog/elasticsearch-1-4-0-beta-released/

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/4095e5b7-1fb4-477a-b27f-3e4519ab9000%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

jpountz · October 7, 2014, 9:59am

Hi Hugues,

For now the workaround would indeed be to do the work on the client side
before the data gets into Elasticsearch (or potentially to use the _source
transform [1] feature).

[1]

--
Adrien Grand

Hugues_Malphettes · October 7, 2014, 10:11am

Thanks Adrien,

I'll give the source transform a shot then.

If you consider that it makes sense to support this, would it be helpful to
file an enhancement request on GitHub?
Give us a hint if you think it can be done by an occasional contributor.

Cheers,
Hugues

Hugues_Malphettes · November 14, 2014, 1:25am

Hi Adrien and everyone,

I took a shot at extending the 'string' type to add another analyzer:

https://gist.github.com/hmalphettes/b402d72230e9009f960c

DVStringFieldMapper.java

/**
 * License: same than Elasticsearch: ASL-2.0.
 * Source: copy and paste bits and pieces from the original Elasticsearch StringFieldMapper.
 */
package org.elasticsearch.docvalues.exporter;

import static org.elasticsearch.index.mapper.core.TypeParsers.parseField;
import static org.elasticsearch.index.mapper.core.TypeParsers.parseMultiField;

import java.io.IOException;

[preview truncated; see the gist linked above for the full file]

When the parameter "index_docvalues_analyzer" is present in the mapping
definition, the mapper generates a token stream and stores the first token as a
SortedSetDocValuesField.

It works for me. Would it be interesting to make this part of the standard
StringFieldMapper?

Cheers!
Hugues

GeetNair · February 4, 2015, 4:17pm

Hi Hugues,

So you have extended the "string" type to add another analyzer.

Is there any way to use a script or transform on the source and then sort on the result? If yes, could you please share it?

As mentioned by Adrien, is there a workaround on the client side, before the data gets into Elasticsearch, using some native or Groovy script?

Here is our situation:

1. One option is to lower-case the field values for sorting, but that may not work for all locales.
2. Then there is the ICU plugin, which helps us achieve this to some extent.
3. However, the problem now is that we are trying to use the doc_values = true option in the mapping, and it cannot be used for string fields that have an analyzer.
4. So if we need to use the ICU plugin, we cannot use the doc_values option.
5. Another way is to use the ICU plugin as a library, i.e. we call some API in that plugin that converts our field into the required format for sorting.

So is there a way to call some API or transform the input using a script?

Regards,
Angie

Geetanjali_Paygude · February 6, 2015, 4:17am

Hi Hugues,

So you have extended the "string" type to add a custom analyzer.

I am referring to this thread:
http://elasticsearch-users.115913.n3.nabble.com/Support-for-case-insensitive-sorts-with-doc-values-tt4064487.html

Is there any way to use a script or transform on the source and then sort on
the result? If yes, could you please share it?

As mentioned by Adrien, is there a workaround on the client side, before the
data gets into Elasticsearch, using some native or Groovy script?

Here is our situation:

1. We have the ICU plugin, which helps us achieve custom sorting to some
   extent.
2. However, the problem now is that we are trying to use the doc_values =
   true option in the mapping, and it cannot be used for string fields that
   have an analyzer.
3. So if we need to use the ICU plugin, we cannot use the doc_values option.
4. Another way is to use the ICU plugin as a library, i.e. we call some API in
   that plugin that converts our field into the required format for sorting.

So is there a way to call some API or transform the input using a script?

Or, if I use your analyzer in a native script, how do I invoke it from the
mappings? Please provide a usage example.

Thanks,
Angie

Hugues_Malphettes · February 6, 2015, 6:10am

Hi Angie,

On Friday, 6 February 2015 12:17:47 UTC+8, Geetanjali Paygude wrote:

> Is there any way to use script/transform the source and then apply sort on
> it? If yes will you please share the same.
>
> As mentioned by Adrien, is there work-around on client-side before the
> data gets into elasticsearch using some native / groovy script?

I believe Adrien suggested this procedure:

1. Create a second field specifically to store the value as a
   docvalue/not_analyzed string.
2. On the client side, analyze the string yourself.
3. Add the new value as a separate field in the document you index.
4. "Profit": use that new field for sorting and other queries.

A variation of this consists of delegating the generation of the second
field's value to a _source transform:

1. Create the same second field: docvalues-not_analyzed.
2. Define a source transform for the affected type of document.
3. In the script of the source transform, apply the transformation you need.
4. "Profit".

With this variation you save some bandwidth, the _source of your document never
shows the second value, and the impact on your client code is limited to the
queries. On the other hand, ES does more work, and the transformation you can
do in the script might be limited.
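
The first procedure above (doing the analysis on the client side) can be sketched roughly as follows. This is only an illustration under stated assumptions: the field name "LastName" is borrowed from later in the thread, "LastName_sort" is a made-up name for the second field, and plain lower-casing stands in for whatever analysis you actually need. The second field would still have to be mapped as a not_analyzed string with doc values enabled, as described in the steps.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

/** Client-side preparation of a case-insensitive sort field (illustrative sketch). */
final class SortFieldExample {

    // "LastName" appears in the thread; "LastName_sort" is a hypothetical name.
    static final String FIELD = "LastName";
    static final String SORT_FIELD = "LastName_sort";

    /** Lower-cases with a fixed locale so the key does not depend on the JVM default. */
    static String sortKey(String value) {
        return value.toLowerCase(Locale.ROOT);
    }

    /** Returns a copy of the document with the extra sort field added. */
    static Map<String, Object> withSortField(Map<String, Object> doc) {
        Map<String, Object> copy = new LinkedHashMap<>(doc);
        Object value = doc.get(FIELD);
        if (value instanceof String) {
            copy.put(SORT_FIELD, sortKey((String) value));
        }
        return copy;
    }

    public static void main(String[] args) {
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put(FIELD, "AAP");
        // The resulting map is what would be serialized to JSON and indexed.
        System.out.println(withSortField(doc));
    }
}
```

Queries would then sort on the second field and keep displaying the original one. As noted earlier in the thread, simple lower-casing may not be adequate for every locale.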

> All I want is following
>
> 1. We have ICU plugin which helps us achieve custom sorting to some
>    extent.
> 2. However, the problem now is that we are trying to use the doc_values =
>    true option in mapping but this cannot be used for string fields having
>    analyzer.
> 3. So if we need to use ICU plugin then we cannot use doc_value option.
> 4. Other way is to use the ICU plugin as a library i.e. we call some API in
>    that plugin which converts our field into required format for sorting.
>
> So is there a way to call some API or transform input using script ?

I suspect it might be difficult to invoke the ICU transformation via a
Groovy script.
You could make it work with a native script written in Java.

> OR If I use your analyzer in a native script, how to invoke the same from
> mappings. Please provide usage example

My code snippet is in fact a new mapping type, not an analyzer.
It is more or less a fork of the original string mapping as defined inside
Elasticsearch.

I have packaged this new mapping type in a plugin here:
GitHub - hmalphettes/elasticsearch-docvalues-string: sortable elasticsearch strings indexed as docvalues

It is a work in progress. Help is welcome if it is useful for you.

I hope this helps.
Let us know,
Hugues

Geetanjali_Paygude · February 10, 2015, 8:18am

Thanks Hugues,

This is helpful! However, I was trying to write a native Java script for
sorting values using ICU collation.
Please find the JAR attached. The script runs without errors, but it gives an
incorrect sort order.

Please find the requests below:

PUT /custom1_index
{
  "my_type": {
    "properties": {
      "LastName": {
        "type": "string",
        "index": "not_analyzed"
      }
    }
  }
}

PUT /custom1_index/my_type/1
{ "LastName": "AAP" }

PUT /custom1_index/my_type/2
{ "LastName": "zara" }

PUT /custom1_index/my_type/3
{ "LastName": "beta" }

GET /custom1_index/_search
{
  "script_fields": {
    "sort": {
      "script": "ICUSortingScriptFilter",
      "lang": "native",
      "params": {
        "field": "LastName"
      }
    },
    "type": "string"
  }
}

Please let me know if any correction is required in this script.

Regards,
Angie

Geetanjali_Paygude · February 10, 2015, 10:40am

And it fails for the following request:

GET /custom10_index/_search
{
  "query": {
    "match_all": {}
  },
  "sort": {
    "_script": {
      "script": "ICUSortingScriptFilter",
      "params": {
        "field": "LastName"
      },
      "lang": "native",
      "type": "string"
    }
  }
}

Result:

{
  "took": 92,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 4,
    "failed": 1,
    "failures": [
      {
        "index": "custom10_index",
        "shard": 2,
        "status": 500,
        "reason": "RemoteTransportException[[Sise-Neg][inet[/10.211.242.237:9301]][indices:data/read/search[phase/query]]]; nested: QueryPhaseExecutionException[[custom10_index][2]: query[ConstantScore(*:*)],from[0],size[10],sort[<custom:\"_script\": org.elasticsearch.search.sort.ScriptSortParser$1@89da6f0>]: Query Failed [Failed to execute main query]]; nested: UnsupportedOperationException; "
      }
    ]
  },
  "hits": {
    "total": 0,
    "max_score": null,
    "hits": []
  }
}

Please let me know if there is a way to sort using a custom native Java
script in Elasticsearch.

Hugues_Malphettes · February 11, 2015, 3:27am

Hi Angie,

You are trying something different from what we discussed so far.
In all cases, you need to make ES store your analyzed strings as doc values.
One can use a client-side transform, an ES-source-transform or a custom ES
mapping to do that.

At the moment the strings are stored as not-analyzed.
Then you are searching and using a native script to custom sort them.
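
One way to make the stored value itself carry the collation order, so that no sort script is needed at query time, might look like the sketch below. It is an assumption-laden illustration, not the code from the attached JAR: it uses the JDK's own java.text.Collator as a stand-in for ICU (ICU4J exposes a similar Collator class), and the class and method names are made up. The idea is to compute a collation key on the client side and index its hex encoding in a not_analyzed field with doc values.

```java
import java.text.Collator;
import java.util.Locale;

/** Turns a string into a sortable key via a collator (illustrative sketch). */
final class CollationKeyExample {

    /**
     * Hex-encodes the collation key bytes. Two hex characters per byte keep the
     * unsigned byte order, so plain string comparison of the results follows
     * the collator's order.
     */
    static String sortableKey(Collator collator, String value) {
        byte[] bytes = collator.getCollationKey(value).toByteArray();
        StringBuilder hex = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            hex.append(Character.forDigit((b >> 4) & 0xF, 16));
            hex.append(Character.forDigit(b & 0xF, 16));
        }
        return hex.toString();
    }

    public static void main(String[] args) {
        // Locale and strength are assumptions; pick whatever your data needs.
        Collator collator = Collator.getInstance(Locale.ENGLISH);
        collator.setStrength(Collator.SECONDARY); // case is a tertiary difference here
        for (String name : new String[] {"AAP", "zara", "beta"}) {
            System.out.println(name + " -> " + sortableKey(collator, name));
        }
    }
}
```

Sorting on the indexed key then gives the collator's order (for example "beta" before "Zara", which plain byte order would reverse), while the original field stays available for display.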

Also, as a note, the JAR you attached is binary only: you probably
want to point to a GitHub repo with the sources and clarify the provenance
and license. At the moment I can't use it.

I hope this helps.
Hugues
