Support for case insensitive sorts with doc values

Hi everyone,

Case-insensitive sorting is elegantly supported by using a custom analyzer [1].
Doc values are documented as a great fit for sorting [2] because they save heap
memory.

However, doc values are not supported for analyzed strings at the moment.
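
For reference, the analyzer-based approach from [1] can be sketched as an index body. This is only a minimal sketch, written as a Python dict so that it can be checked and serialized; the analyzer name, the type name and the field names are illustrative assumptions, and the syntax is the ES 1.x one.

```python
import json

# Index body for a case-insensitive sort field (ES 1.x syntax).
# "case_insensitive_sort", "my_type", "name" and "lower_case_sort" are
# illustrative names, not taken from this thread.
index_body = {
    "settings": {
        "analysis": {
            "analyzer": {
                "case_insensitive_sort": {
                    # The keyword tokenizer emits the whole string as a single
                    # token; the lowercase filter then normalizes its case.
                    "tokenizer": "keyword",
                    "filter": ["lowercase"],
                }
            }
        }
    },
    "mappings": {
        "my_type": {
            "properties": {
                "name": {
                    "type": "string",
                    "fields": {
                        # Analyzed sub-field used only for sorting. This is the
                        # kind of field that cannot have doc values enabled.
                        "lower_case_sort": {
                            "type": "string",
                            "analyzer": "case_insensitive_sort",
                        }
                    },
                }
            }
        }
    },
}

print(json.dumps(index_body, indent=2))
```

A search would then sort on `name.lower_case_sort`, which is loaded through in-heap field data rather than doc values.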

Are we planning to support doc values for analyzers that emit a single
token per string?
Is it worth it to have the ES client do the lower-casing and collation
itself?

Thanks!
Hugues

[1] http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/sorting-collations.html#case-insensitive-sorting
[2] http://www.elasticsearch.org/blog/elasticsearch-1-4-0-beta-released/

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/4095e5b7-1fb4-477a-b27f-3e4519ab9000%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Hi Hugues,

For now the workaround would indeed be to do the work on the client side
before the data gets into Elasticsearch (or potentially to use the _source
transform [1] feature).

[1]

On Tue, Oct 7, 2014 at 9:19 AM, Hugues Malphettes hmalphettes@gmail.com
wrote:

> [...]

--
Adrien Grand

Thanks Adrien,

I'll give the source transform a shot then.

If you consider that it makes sense to support this, would it be helpful to
file an enhancement request on GitHub?
Give us a hint if you think it can be done by an occasional contributor ;)

Cheers,
Hugues

On Tuesday, 7 October 2014 17:59:29 UTC+8, Adrien Grand wrote:

> [...]

Hi Adrien and everyone,

I took a shot at extending the 'string' type to add another analyzer:
Extended StringFieldMapper to have a docvalues on an analyzed string · GitHub

The parameter "index_docvalues_analyzer" when present on the mapping
definition will generate a Token Stream and the first token is stored as a
SortedSetDocValuesField.
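
To make the "first token" rule concrete, here is a conceptual sketch in plain Python. It is not the code of the extended mapper: the analyzers are stand-in functions and `docvalue_for` is a hypothetical helper that only mimics the rule described above.

```python
def lowercase_keyword_analyzer(text):
    """Stand-in for an analyzer that emits one lower-cased token per string."""
    return [text.lower()]


def lowercase_whitespace_analyzer(text):
    """Stand-in for an analyzer that emits several tokens per string."""
    return text.lower().split()


def docvalue_for(text, analyzer):
    """Return the first token the analyzer produces, or None if there is none.

    This mirrors the rule described above: only the first token of the
    token stream would be kept as the sortable value.
    """
    tokens = analyzer(text)
    return tokens[0] if tokens else None


print(docvalue_for("Malphettes", lowercase_keyword_analyzer))      # malphettes
print(docvalue_for("Hello World", lowercase_whitespace_analyzer))  # hello
```

With a single-token analyzer nothing is lost; with a multi-token analyzer everything after the first token is ignored for sorting.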

It works for me. Would it be interesting to make this part of the standard
StringFieldMapper?

Cheers!
Hugues

On Tuesday, 7 October 2014 18:11:18 UTC+8, Hugues Malphettes wrote:

> [...]

Hi Hugues,

So you have extended the "string" type to add another analyzer.

Is there any way to use a script or a transform on the source and then sort on the result? If so, could you please share it?

As Adrien mentioned, is there a workaround on the client side, before the data gets into Elasticsearch, using some native or Groovy script?

Here is my situation:

  1. One option is to lower-case the field values for sorting, but that may not work for all locales.
  2. The ICU plugin helps us achieve this to some extent.
  3. However, we are trying to use the doc_values = true option in the mapping, and it cannot be used on string fields that have an analyzer.
  4. So if we need the ICU plugin, we cannot use the doc_values option.
  5. Another way is to use the ICU plugin as a library, i.e. call some API in that plugin that converts our field into the required format for sorting.

So is there a way to call such an API, or to transform the input using a script?

Regards,
Angie

Hi Hugues,

So you have extended the "string" type to add a custom analyzer.

I am referring to this thread:
http://elasticsearch-users.115913.n3.nabble.com/Support-for-case-insensitive-sorts-with-doc-values-tt4064487.html

Is there any way to use a script or a transform on the source and then sort
on the result? If so, could you please share it?

As Adrien mentioned, is there a workaround on the client side, before the data
gets into Elasticsearch, using some native or Groovy script?

Here is my situation:

  1. The ICU plugin helps us achieve custom sorting to some extent.
  2. However, we are trying to use the doc_values = true option in the
    mapping, and it cannot be used on string fields that have an analyzer.
  3. So if we need the ICU plugin, we cannot use the doc_values option.
  4. Another way is to use the ICU plugin as a library, i.e. call some API in
    that plugin that converts our field into the required format for sorting.

So is there a way to call such an API, or to transform the input using a
script?

Or, if I use your analyzer in a native script, how do I invoke it from the
mappings? Please provide a usage example.

Thanks,
Angie

On Friday, 14 November 2014 06:55:36 UTC+5:30, Hugues Malphettes wrote:

> [...]

Hi Angie,

On Friday, 6 February 2015 12:17:47 UTC+8, Geetanjali Paygude wrote:

> Hi Hugues,
>
> So you have extended "String" type to add custom analyzer.
>
> I am referring to this thread
> http://elasticsearch-users.115913.n3.nabble.com/Support-for-case-insensitive-sorts-with-doc-values-tt4064487.html
>
> Is there any way to use script/transform the source and then apply sort on
> it? If yes will you please share the same.
>
> As mentioned by Adrien, is there work-around on client-side before the
> data gets into elasticsearch using some native / groovy script?

I believe Adrien suggested this procedure:

  • create a second field specifically to store the value as a
    docvalue/not_analyzed string
  • on the client-side analyze the string yourself
  • add the new value as a separate field in the document you index
  • "profit": use that new field for sorting and other queries
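
The client-side procedure above can be sketched as follows. This is a minimal sketch under stated assumptions: the extra field name `LastName_sort` is hypothetical, plain lower-casing stands in for whatever analysis or collation you really need, and the mapping uses ES 1.x syntax. No Elasticsearch client is involved; the dicts are what you would send as request bodies.

```python
import json


def sort_key(value):
    """Client-side "analysis": here simply lower-casing the string.

    A locale-aware collation key could be substituted; plain lower-casing is
    an assumption made to keep the sketch dependency-free.
    """
    return value.lower()


# Mapping for the extra sort field (ES 1.x syntax).
# "LastName_sort" is a hypothetical field name.
mapping = {
    "my_type": {
        "properties": {
            "LastName": {"type": "string"},
            "LastName_sort": {
                "type": "string",
                "index": "not_analyzed",
                "doc_values": True,
            },
        }
    }
}


def with_sort_field(doc):
    """Return a copy of the document with the pre-computed sort value added."""
    enriched = dict(doc)
    enriched["LastName_sort"] = sort_key(doc["LastName"])
    return enriched


doc = with_sort_field({"LastName": "Zara"})
print(json.dumps(doc))

# Searches then sort on the new field instead of the original one.
search_body = {"query": {"match_all": {}}, "sort": [{"LastName_sort": "asc"}]}
print(json.dumps(search_body))
```

The cost is that every indexing client has to apply the same `sort_key` function, and the extra value is visible in the _source.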

A variation of this consists of delegating the generation of the second
field's value to a _source transform.

  • create the same second field: docvalues-not_analyzed
  • define a source transform for the affected type of document
  • in the script of the source transform apply the transformation you need
  • "profit"

You save some bandwidth, the _source of your document never shows the second
value, and the impact on your client code is limited to the queries.
On the other hand, ES does more work, and the transform you can do in the
script might be limited.
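
As a rough illustration of the _source transform variation, here is a type mapping written as a Python dict. It assumes the ES 1.x mapping "transform" feature with a Groovy script; the one-line script and the field name `LastName_sort` are assumptions for illustration, and plain lower-casing again stands in for the real normalization.

```python
import json

# Type mapping with a _source transform (ES 1.x mapping "transform" feature).
# The Groovy one-liner and the field name "LastName_sort" are assumptions;
# adapt the script to the normalization you actually need.
mapping = {
    "my_type": {
        "transform": {
            "lang": "groovy",
            "script": (
                "ctx._source['LastName_sort'] = "
                "ctx._source['LastName']?.toLowerCase()"
            ),
        },
        "properties": {
            "LastName": {"type": "string"},
            # Second field: not analyzed, stored as doc values, used for sorting.
            "LastName_sort": {
                "type": "string",
                "index": "not_analyzed",
                "doc_values": True,
            },
        },
    }
}

print(json.dumps(mapping, indent=2))
```

The transform only affects what gets indexed, so clients keep sending documents with `LastName` alone and only the queries need to know about the second field.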

> All I want is following
>
>   1. We have ICU plugin which helps us achieve custom sorting to some
>     extent.
>   2. However, the problem now is that we are trying to use the doc_values =
>     true option in mapping but this cannot be used for string fields having
>     analyzer.
>   3. So if we need to use ICU plugin then we cannot use doc_value option.
>   4. Other way is to use the ICU plugin as a library i.e. we call some API in
>     that plugin which converts our field into required format for sorting.
>
> So is there a way to call some API or transform input using script ?

I suspect it might be difficult to invoke the ICU transformation via a
groovy script.
You could make it work with a native script written in java.

> OR If I use your analyzer in a native script, how to invoke the same from
> mappings. Please provide usage example

My code snippet is in fact a new mapping type; not an analyzer.
It is more or less a fork of the original string mapping as defined inside
Elasticsearch.

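I have packaged this new mapping type in a plugin here:
GitHub - hmalphettes/elasticsearch-docvalues-string: sortable elasticsearch strings indexed as docvalues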

It is a work in progress. Help is welcome if it is useful for you.

I hope this helps.
Let us know,
Hugues

Thanks Hugues,

This is helpful! However, I was trying to write a native Java script that
sorts values using ICU collation.
Please find the JAR attached. The script runs without errors, but it gives an
incorrect sort order.

The requests are as follows:

PUT /custom1_index
{
  "mappings": {
    "my_type": {
      "properties": {
        "LastName": {
          "type": "string",
          "index": "not_analyzed"
        }
      }
    }
  }
}

PUT /custom1_index/my_type/1
{ "LastName": "AAP" }

PUT /custom1_index/my_type/2
{ "LastName": "zara" }

PUT /custom1_index/my_type/3
{ "LastName": "beta" }

GET /custom1_index/_search
{
  "script_fields": {
    "sort": {
      "script": "ICUSortingScriptFilter",
      "lang": "native",
      "params": {
        "field": "LastName"
      }
    }
  }
}

Please let me know if any correction is required in this script.

Regards,
Angie

On Friday, 6 February 2015 11:40:53 UTC+5:30, Hugues Malphettes wrote:

> [...]

And it fails for the following request:

GET /custom10_index/_search
{
  "query": {
    "match_all": {}
  },
  "sort": {
    "_script": {
      "script": "ICUSortingScriptFilter",
      "params": {
        "field": "LastName"
      },
      "lang": "native",
      "type": "string"
    }
  }
}

Result:
{
  "took": 92,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 4,
    "failed": 1,
    "failures": [
      {
        "index": "custom10_index",
        "shard": 2,
        "status": 500,
        "reason": "RemoteTransportException[[Sise-Neg][inet[/10.211.242.237:9301]][indices:data/read/search[phase/query]]]; nested: QueryPhaseExecutionException[[custom10_index][2]: query[ConstantScore(*:*)],from[0],size[10],sort[<custom:\"_script\": org.elasticsearch.search.sort.ScriptSortParser$1@89da6f0>]: Query Failed [Failed to execute main query]]; nested: UnsupportedOperationException; "
      }
    ]
  },
  "hits": {
    "total": 0,
    "max_score": null,
    "hits": []
  }
}

Please let me know if there is a way to sort using a custom native Java
script in Elasticsearch.

On Tuesday, 10 February 2015 13:48:23 UTC+5:30, Geetanjali Paygude wrote:

> [...]

Hi Angie,

You are trying something different from what we have discussed so far.
In all cases, you need to make ES store your analyzed strings as doc values.
One can use a client-side transform, an ES _source transform or a custom ES
mapping to do that.

At the moment your strings are stored as not_analyzed.
Then you are searching and using a native script to custom-sort them.
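
To illustrate why the stored value matters, here is a small sketch in plain Python (no Elasticsearch involved). It compares the order you get from the raw strings with the order you get from a pre-computed lower-cased key. "AAP", "zara" and "beta" are the sample values from the requests earlier in the thread; "Zulu" is a hypothetical extra value added to make the difference visible, and lower-casing is only a stand-in for a real ICU collation key.

```python
# Sample last names: three from the thread plus the hypothetical "Zulu".
names = ["AAP", "zara", "beta", "Zulu"]

# Order of the raw, not_analyzed values: upper-case letters sort before
# lower-case ones, so "Zulu" lands before "beta".
raw_order = sorted(names)

# Order when each value is sorted by a normalized (lower-cased) key, which
# is what an indexed, pre-analyzed sort field would provide.
normalized_order = sorted(names, key=str.lower)

print(raw_order)         # ['AAP', 'Zulu', 'beta', 'zara']
print(normalized_order)  # ['AAP', 'beta', 'zara', 'Zulu']
```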

Also, as a note, the JAR you attached is binary only: you probably want to
point to a GitHub repo with the sources and clarify the provenance and
license. At the moment I can't use it.

I hope this helps.
Hugues
