How to sort Norwegian special characters with the ICU plugin?

I have a problem sorting my search results correctly using Norwegian collation.

I've installed the
https://github.com/elasticsearch/elasticsearch-analysis-icu plugin, and
I've created my index with the following properties.

Java:

    .startObject("analysis")
        .startObject("analyzer")
            .startObject("collation")
                .field("tokenizer", "keyword")
                .field("filter", "norwegianCollator")
            .endObject()
        .endObject()
        .startObject("filter")
            .startObject("norwegianCollator")
                .field("type", "icu_collation")
                .field("language", "norwegian")
            .endObject()
        .endObject()
    .endObject()

JSON:

    {
      "index": {
        "analysis": {
          "analyzer": {
            "collation": {
              "tokenizer": "keyword",
              "filter": "norwegianCollator"
            }
          },
          "filter": {
            "norwegianCollator": {
              "type": "icu_collation",
              "language": "nb"
            }
          }
        }
      }
    }

Elasticsearch Head shows this as:

settings: {

  • index.analysis.analyzer.default.filter: norwegianCollator
  • index.analysis.filter.norwegianCollator.type: icu_collation
  • index.analysis.analyzer.collation.tokenizer: keyword
  • index.analysis.filter.norwegianCollator.language: norwegian

}

Is this correctly configured?

For search I'm doing this:

    QueryBuilder qb = QueryBuilders.matchQuery(
            "customer.partner",
            "1"
    );
    SearchResponse response =
            this.esClient.prepareSearch(this.client.getName())
            .setTypes("customer")
            .setSearchType(SearchType.DFS_QUERY_THEN_FETCH)
            .setQuery(qb)
            .setFrom(0).setSize(size)
            .addSort("name", SortOrder.DESC)
            .execute()
            .actionGet();

But the result I get is:

DEBUG 15:21:00 search.ESSearch - {enabled=true, name=ølen fiskelag, partner=1}
DEBUG 15:21:00 search.ESSearch - {enabled=true, name=æ vil ha dæ, partner=1}
DEBUG 15:21:00 search.ESSearch - {enabled=true, name=åges plateselskap as, partner=1}
DEBUG 15:21:00 search.ESSearch - {enabled=true, name=yara international, partner=1}

When it should be:

DEBUG 15:21:00 search.ESSearch - {enabled=true, name=åges plateselskap as, partner=1}
DEBUG 15:21:00 search.ESSearch - {enabled=true, name=ølen fiskelag, partner=1}
DEBUG 15:21:00 search.ESSearch - {enabled=true, name=æ vil ha dæ, partner=1}
DEBUG 15:21:00 search.ESSearch - {enabled=true, name=yara international, partner=1}

What am I doing wrong? What could be missing in my configuration?

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

Hi,

I just had a quick look at your code, and I am not sure you are providing a
valid language code. As far as I understand, you need to provide a code
from the ISO 639-2 Language Code List (Library of Congress). For Norwegian
there are a couple of options: no, nb, nn. I don't know what the difference is.
Can you try it with "no", for example?

Also note that it might be more helpful if you can provide examples using
curl (see the Elasticsearch site for details).

Regards,
Lukas

On Tue, Jul 23, 2013 at 3:36 PM, Olav Grønås Gjerde olavgg@gmail.com wrote:

I have a problem sorting my searches correctly by using norwegian
collation.

They are valid: 'nb' is Bokmål and 'nn' is Nynorsk. Both are official
Norwegian languages.

You should try the language code "nb" instead of "norwegian" in the Java
example.

Jörg
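
For readers unsure which of the strings discussed here are real language codes, a quick check with the JDK alone may help. This is only a sketch: it uses java.util.Locale and nothing from Elasticsearch or the ICU plugin, and the class name LanguageCodeCheck and the method isIsoLanguageCode are made up for illustration. It asks whether a string is one of the two-letter ISO 639 codes known to the JDK, which "nb", "nn" and "no" are and "norwegian" is not.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical helper, not part of Elasticsearch or the ICU plugin.
class LanguageCodeCheck {

    // True if the string is one of the two-letter ISO 639 language codes known to the JDK.
    static boolean isIsoLanguageCode(String code) {
        return Arrays.asList(Locale.getISOLanguages()).contains(code);
    }

    public static void main(String[] args) {
        List<String> candidates = Arrays.asList("nb", "nn", "no", "norwegian");
        for (String code : candidates) {
            System.out.println(code + " -> two-letter ISO 639 code: " + isIsoLanguageCode(code));
        }
    }
}
```

The check only says which strings are ISO codes; it does not show what the icu_collation filter does when it is given a value that is not one.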

On Tue, Jul 23, 2013 at 3:57 PM, Lukáš Vlček lukas.vlcek@gmail.com wrote:

just quickly looked at your code and I am not sure you are providing a
valid language code. [...] Can you try it with "no" for example?

I've tried nb, no and nn; the letters æ, ø and å still sort in the wrong
order.
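
A note on the symptom, for anyone debugging something similar: the order reported in the original post (ø, æ, å, y) is exactly descending Unicode code point order, since ø is U+00F8, æ is U+00E6, å is U+00E5 and y is U+0079. That suggests the collation filter may not be applied to the sorted field at all, whichever language code is used. The sketch below reproduces both orders with the JDK only. It is not the ICU plugin: it uses java.text.RuleBasedCollator with the Norwegian rule string from the RuleBasedCollator Javadoc, and the class and method names are made up for illustration. The source is kept ASCII-only, so the special letters appear as \u escapes.

```java
import java.text.ParseException;
import java.text.RuleBasedCollator;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

// Hypothetical demo class; it uses the JDK only, not Elasticsearch or ICU.
class NorwegianOrderDemo {

    // Rule string taken from the Norwegian example in the RuleBasedCollator
    // Javadoc: a..z first, then ae (U+00E6), o with stroke (U+00F8),
    // a with ring (U+00E5).
    static final String RULES =
        "< a, A < b, B < c, C < d, D < e, E < f, F < g, G < h, H < i, I"
        + "< j, J < k, K < l, L < m, M < n, N < o, O < p, P < q, Q < r, R"
        + "< s, S < t, T < u, U < v, V < w, W < x, X < y, Y < z, Z"
        + "< \u00E6, \u00C6"       // ae, AE
        + "< \u00F8, \u00D8"       // o with stroke, O with stroke
        + "< \u00E5 = a\u030A,"    // a with ring, also as a + combining ring
        + "  \u00C5 = A\u030A;"    // A with ring, also as A + combining ring
        + "  aa, AA";              // the digraph "aa" is ranked with a-ring

    // Descending sort by plain String comparison (UTF-16 code unit order).
    static List<String> sortByCodePointDesc(List<String> names) {
        List<String> copy = new ArrayList<>(names);
        copy.sort(Comparator.<String>reverseOrder());
        return copy;
    }

    // Descending sort using the Norwegian rules above.
    static List<String> sortNorwegianDesc(List<String> names) {
        try {
            List<String> copy = new ArrayList<>(names);
            copy.sort(Collections.reverseOrder(new RuleBasedCollator(RULES)));
            return copy;
        } catch (ParseException e) {
            // Only reachable if RULES is edited into something invalid.
            throw new IllegalStateException("invalid collation rules", e);
        }
    }

    public static void main(String[] args) {
        // The four names from the original post, with escapes for the special letters.
        List<String> names = Arrays.asList(
            "yara international",
            "\u00E5ges plateselskap as",
            "\u00E6 vil ha d\u00E6",
            "\u00F8len fiskelag");
        System.out.println("code point order: " + sortByCodePointDesc(names));
        System.out.println("Norwegian order:  " + sortNorwegianDesc(names));
    }
}
```

With these four names, the first method gives the order from the "result I get" log and the second gives the order from the "should be" log.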

On Tuesday, 23 July 2013 at 18:25:10 UTC+2, Jörg Prante wrote:

They are valid, 'nb' is Bokmål, and 'nn' is Nynorsk, both are official
norwegian languages.

You should try language code "nb" instead of "norwegian" in the Java
example.

Jörg

It works here; see this curl example: "Norwegian Bokmål sort with Elasticsearch" (on GitHub).

Is the issue your Java code?

Jörg

On Tue, Jul 23, 2013 at 10:29 PM, Olav Grønås Gjerde olavgg@gmail.com wrote:

I've tried nb, no and nn, sorting of the æ, ø ,å letters are still in
wrong order.

Thank you! That full example solved it! I had not added the
"analyzer": "bokmalAnalyzer" part to the field mapping in my index. That
wasn't the only thing, though: I also had to remove the
.field("index", "not_analyzed") setting from the field ;)

But now it works exactly as it should! Again, thank you for your help! :)
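
For later readers, the two changes described in this message can be sketched as a create-index request body. This is a reconstruction under assumptions, not a copy of the linked example: it assumes the mapping syntax of the Elasticsearch versions current in 2013 (a "string" field type), reuses the names customer, name, norwegianCollator and bokmalAnalyzer from this thread, and builds the JSON with plain JDK string handling. The class IndexBodySketch and the method createIndexBody are hypothetical.

```java
// Hypothetical sketch; it builds the JSON body only and does not talk to Elasticsearch.
class IndexBodySketch {

    // Settings plus mapping for a "name" field that is analyzed with the
    // collation analyzer. Note that the field has no "index" setting of
    // not_analyzed, which would bypass the analyzer.
    static String createIndexBody() {
        return String.join("\n",
            "{",
            "  \"settings\": {",
            "    \"index\": {",
            "      \"analysis\": {",
            "        \"analyzer\": {",
            "          \"bokmalAnalyzer\": {",
            "            \"tokenizer\": \"keyword\",",
            "            \"filter\": [\"norwegianCollator\"]",
            "          }",
            "        },",
            "        \"filter\": {",
            "          \"norwegianCollator\": {",
            "            \"type\": \"icu_collation\",",
            "            \"language\": \"nb\"",
            "          }",
            "        }",
            "      }",
            "    }",
            "  },",
            "  \"mappings\": {",
            "    \"customer\": {",
            "      \"properties\": {",
            "        \"name\": {",
            "          \"type\": \"string\",",
            "          \"analyzer\": \"bokmalAnalyzer\"",
            "        }",
            "      }",
            "    }",
            "  }",
            "}");
    }

    public static void main(String[] args) {
        // Print the body so it can be pasted into a curl call.
        System.out.println(createIndexBody());
    }
}
```

The mapping half is the part that was missing: the name field has to reference the analyzer and stay analyzed, otherwise the collation filter defined in the settings is never applied to the values being sorted.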

On Wednesday, 24 July 2013 at 09:23:49 UTC+2, Jörg Prante wrote:

It works here, see this curl example
Norwegian Bokmål sort with Elasticsearch · GitHub

Is the issue your Java code?

Jörg
