Hi!
I'm having an issue with icu_normalizer and acute accents, and I don't know if it's something I'm not seeing, a bug, a misconfiguration or what. I'm very new to ES.
I have an analyzer using a char_filter of type icu_normalizer with name nfkc.
Testing the analyzer, the acute accents are merged with the following character. Then I can search the documents with the original text, but not filter them.
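To make the effect on the accent concrete, here is a minimal sketch using Python's standard unicodedata module. It assumes the icu_normalizer char_filter in nfkc mode produces the same result as standard Unicode NFKC for this string, which I have not verified against the plugin itself. Under NFKC the standalone acute accent (U+00B4) is replaced by a space followed by a combining acute accent (U+0301), so the normalized string is no longer equal to the original one:

```python
import unicodedata

# The string used in the queries, containing U+00B4 ACUTE ACCENT.
original = "Li\u00b4ege"

# Standard NFKC normalization (assumed to match the ICU char_filter here).
normalized = unicodedata.normalize("NFKC", original)


def codepoints(text):
    """Return the code points of text as a list of 'U+XXXX' strings."""
    return ["U+%04X" % ord(ch) for ch in text]


print(codepoints(original))
print(codepoints(normalized))
print(original == normalized)
```

Comparing the two code point lists shows exactly which characters the normalization changed.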
I tested it with ES 1.7 on Debian and 5.4.1 on OS X, both with Java 1.8.
I create a document with an attribute containing an acute accent. When I search for it with a match query, it is returned:
{
  "query": {
    "match": {
      "photo.location.exact": {
        "query": "Li´ege"
      }
    }
  }
}
But when I try to filter by the original string, it's not returned:
{
  "size": 1000,
  "from": 0,
  "query": {
    "filtered": {
      "query": {
        "bool": {
          "minimum_should_match": 1,
          "must": [
            {
              "terms": {
                "photo.location.exact": [
                  "NY",
                  "París",
                  "Li´ege"
                ]
              }
            }
          ]
        }
      }
    }
  }
}
If I replace Li´ege with the text returned by analyzing it, the document is returned.
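As an illustration of that substitution, here is a rough sketch that normalizes the values on the client before building the terms query. The helper name build_terms_query is hypothetical, the surrounding filtered/bool wrapper is left out for brevity, and it again assumes that Python's NFKC matches what the nfkc_normalizer char_filter produces (the truncate filter is not reproduced):

```python
import json
import unicodedata


def build_terms_query(field, values):
    """Build a terms query body whose values are NFKC-normalized first.

    Hypothetical helper: it mimics the analyzer's char_filter on the client,
    assuming standard NFKC is equivalent to the ICU normalizer here.
    """
    normalized = [unicodedata.normalize("NFKC", value) for value in values]
    return {"query": {"terms": {field: normalized}}}


body = build_terms_query(
    "photo.location.exact",
    ["NY", "Par\u00eds", "Li\u00b4ege"],
)
print(json.dumps(body, ensure_ascii=False, indent=2))
```

This only reproduces by hand what I did manually with the analyzed text; it does not explain why the terms query behaves differently from the match query.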
My index:
{
  "documents": {
    "aliases": {},
    "mappings": {},
    "settings": {
      "index": {
        "number_of_shards": "5",
        "provided_name": "documents",
        "creation_date": "1497468479974",
        "analysis": {
          "filter": {
            "truncate_field": {
              "length": "1000",
              "type": "truncate"
            }
          },
          "analyzer": {
            "exact_analyzer": {
              "filter": [
                "truncate_field"
              ],
              "char_filter": "nfkc_normalizer",
              "type": "custom",
              "tokenizer": "keyword"
            }
          },
          "char_filter": {
            "nfkc_normalizer": {
              "name": "nfkc",
              "type": "icu_normalizer"
            }
          }
        },
        "number_of_replicas": "1",
        "uuid": "eeDA54FiQJ2FGq3tSVRfnQ",
        "version": {
          "created": "5040199"
        }
      }
    }
  }
}
I wrote a Python script that creates the index, populates it with 3 documents and runs the searches; it can be downloaded from here.
What am I doing wrong?
Thanks!!!