Hi,
I have an ES instance with the following mapping & config:
{
  "settings": {
    "number_of_shards": 5,
    "number_of_replicas": 1,
    "analysis": {
      "analyzer": {
        "custom_analyzer": {
          "tokenizer": "standard",
          "filter": [
            "custom_asciifolding",
            "lowercase"
          ]
        }
      },
      "filter": {
        "custom_asciifolding": {
          "type": "asciifolding",
          "preserve_original": true
        }
      },
      "normalizer": {
        "custom_normalizer": {
          "type": "custom",
          "char_filter": [],
          "filter": [
            "custom_asciifolding"
          ]
        }
      }
    }
  },
  "mappings": {
    "_doc": {
      "properties": {
        "artist_id": {
          "type": "integer"
        },
        "artist_genre": {
          "type": "keyword",
          "normalizer": "custom_normalizer"
        },
        "artist_name": {
          "type": "text",
          "analyzer": "custom_analyzer",
          "fields": {
            "raw": {
              "normalizer": "custom_normalizer",
              "type": "keyword"
            }
          }
        },
        "artist_type": {
          "type": "keyword",
          "normalizer": "custom_normalizer"
        },
        "associated_alias": {
          "type": "nested",
          "properties": {
            "alias_type": {
              "type": "keyword",
              "normalizer": "custom_normalizer"
            },
            "artist_id": {
              "type": "integer"
            },
            "artist_name": {
              "type": "text",
              "analyzer": "custom_analyzer"
            }
          }
        },
        "associated_artists": {
          "type": "nested",
          "properties": {
            "_id": {
              "type": "keyword"
            },
            "artist_id": {
              "type": "integer"
            },
            "artist_name": {
              "type": "text",
              "analyzer": "custom_analyzer"
            },
            "sequence_number": {
              "type": "integer"
            }
          }
        },
        "is_active": {
          "type": "boolean"
        },
        "record_provider_name": {
          "type": "keyword",
          "normalizer": "custom_normalizer"
        },
        "record_providers": {
          "type": "nested",
          "properties": {
            "name": {
              "type": "keyword",
              "normalizer": "custom_normalizer"
            },
            "count": {
              "type": "long"
            }
          }
        }
      }
    }
  }
}
While updating an existing document, I get the following error:
elasticsearch.exceptions.RequestError: RequestError(400, 'mapper_parsing_exception', "failed to parse field [artist_name.raw] of type [keyword] in document with id 'uQRyF3sBPez8u38O8yMm'. Preview of field's value: 'Relajación'")
While creating a new document, I get a similar error:
"error": {
  "type": "mapper_parsing_exception",
  "reason": "failed to parse field [artist_name.raw] of type [keyword] in document with id '8a9d479a-b665-4b49-97de-e4efcb7be446'. Preview of field's value: 'Andrea Miller, Alejandro Fernández Lecce'",
  "caused_by": {
    "type": "illegal_state_exception",
    "reason": "The normalization token stream is expected to produce exactly 1 token, but got 2+ for analyzer analyzer name[custom_normalizer], analyzer [org.elasticsearch.index.analysis.CustomAnalyzer@7c67f588], analysisMode [ALL] and input \"Andrea Miller, Alejandro Fernández Lecce\""
  }
},
Please note that in both cases the field value contains non-ASCII characters, such as the á in Fernández and the ó in Relajación.
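One possible reading of the caused_by message, offered as an unconfirmed assumption: a normalizer must emit exactly one token, while an asciifolding filter with preserve_original set to true emits both the folded and the original form whenever they differ. The Python sketch below only simulates that behaviour and does not call Elasticsearch. The helper names ascii_fold and normalizer_tokens are hypothetical, and the folding is a rough stand-in (Unicode decomposition plus accent stripping), not the real filter.

```python
import unicodedata
from typing import List


def ascii_fold(text: str) -> str:
    """Rough stand-in for ASCII folding: decompose, then drop combining marks."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def normalizer_tokens(text: str, preserve_original: bool) -> List[str]:
    """Simulate the tokens a keyword normalizer would see after folding.

    Assumption: with preserve_original enabled, the original form is kept
    alongside the folded one whenever folding changed the text.
    """
    folded = ascii_fold(text)
    if preserve_original and folded != text:
        return [folded, text]
    return [folded]


for value in ["Relajación", "Andrea Miller, Alejandro Fernández Lecce", "Rock"]:
    for preserve in (True, False):
        tokens = normalizer_tokens(value, preserve)
        print(f"preserve_original={preserve}: {value!r} yields {len(tokens)} token(s)")
```

If this reading is right, only values containing foldable characters would trip the "exactly 1 token" check, which would match the two failing documents. It would also suggest that a separate asciifolding filter without preserve_original is needed for the normalizer, while the analyzer can keep the current one.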