Problem with icu_normalizer and acute accent

Hi!

I'm having an issue with icu_normalizer and acute accents, and I don't know whether it's something I'm missing, a bug, a misconfiguration, or something else. I'm very new to ES.

I have an analyzer that uses a char_filter of type icu_normalizer with name nfkc.

When I test the analyzer, the acute accent is merged with the following character. As a result, I can find the documents with the original text in a match query, but I cannot filter them by it.

I tested it with ES 1.7 on Debian and 5.4.1 on OS X, both with Java 1.8.
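For reference, this is roughly how I check the analyzer output (a sketch using the Python client against the 5.4.1 node; the index and analyzer names are the ones from the settings further down):

# Sketch: run the problematic string through the custom analyzer
# and print the resulting token(s). Assumes a local node.
from elasticsearch import Elasticsearch

es = Elasticsearch()

result = es.indices.analyze(index="documents",
                            body={"analyzer": "exact_analyzer", "text": "Li´ege"})
for token in result["tokens"]:
    print(repr(token["token"]))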

A document with an attribute containing an acute accent is created; when I run a match query, it is returned:

{
        "query": {
                "match": {
                        "photo.location.exact": {
                                "query": "Li´ege"
                        }
                }
        }
}

But when I try to filter by the original string, it's not returned:

{
    "size": 1000,
    "from": 0,
    "query": {
        "filtered": {
            "query": {
                "bool": {
                    "minimum_should_match": 1,
                    "must": [
                        {
                            "terms": {
                                "photo.location.exact": [
                                    "NY",
                                    "París",
                                    "Li´ege"
                                ]
                            }
                        }
                    ]
                }
            }
        }
    }
}

If I replace Li´ege with the text returned by the analyzer, the document is returned.

My index:

{
  "documents": {
    "aliases": {},
    "mappings": {},
    "settings": {
      "index": {
        "number_of_shards": "5",
        "provided_name": "documents",
        "creation_date": "1497468479974",
        "analysis": {
          "filter": {
            "truncate_field": {
              "length": "1000",
              "type": "truncate"
            }
          },
          "analyzer": {
            "exact_analyzer": {
              "filter": [
                "truncate_field"
              ],
              "char_filter": "nfkc_normalizer",
              "type": "custom",
              "tokenizer": "keyword"
            }
          },
          "char_filter": {
            "nfkc_normalizer": {
              "name": "nfkc",
              "type": "icu_normalizer"
            }
          }
        },
        "number_of_replicas": "1",
        "uuid": "eeDA54FiQJ2FGq3tSVRfnQ",
        "version": {
          "created": "5040199"
        }
      }
    }
  }
}

I wrote a Python script which creates the index, populates it with 3 documents and runs the searches; it can be downloaded from here.
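In essence it does something like this (a trimmed-down sketch: only the analysis settings from above, no explicit mapping, and the photo type and document layout are just placeholders):

# Requires the analysis-icu plugin on the node and a local ES instance.
from elasticsearch import Elasticsearch

es = Elasticsearch()

# Same analysis settings as in the index above (trimmed to the relevant part).
analysis = {
    "analysis": {
        "char_filter": {
            "nfkc_normalizer": {"type": "icu_normalizer", "name": "nfkc"}
        },
        "filter": {
            "truncate_field": {"type": "truncate", "length": "1000"}
        },
        "analyzer": {
            "exact_analyzer": {
                "type": "custom",
                "tokenizer": "keyword",
                "char_filter": "nfkc_normalizer",
                "filter": ["truncate_field"]
            }
        }
    }
}
es.indices.create(index="documents", body={"settings": analysis})

# Three documents; the type name and field layout here are illustrative,
# the mapping that wires photo.location.exact to exact_analyzer is omitted.
for doc_id, location in enumerate(["NY", "París", "Li´ege"]):
    es.index(index="documents", doc_type="photo", id=doc_id,
             body={"photo": {"location": location}})
es.indices.refresh(index="documents")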

What am I doing wrong?

Thanks!!!

One more thing: if I use nfc instead of nfkc, everything works as expected. Why?
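For what it's worth, the difference between the two normalization forms shows up in plain Python as well, assuming the accent character in my strings is U+00B4 (ACUTE ACCENT):

import unicodedata

s = "Li\u00b4ege"

# NFC leaves the stand-alone acute accent untouched.
print([hex(ord(c)) for c in unicodedata.normalize("NFC", s)])

# NFKC applies the compatibility decomposition of U+00B4, so the string changes.
print([hex(ord(c)) for c in unicodedata.normalize("NFKC", s)])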

Any clue?
