Bad charset encoding in field names

I have an issue with Elasticsearch after upgrading to 5.6. I am generating documents with Logstash, and I create dynamic fields using native Ruby code in a Logstash ruby filter:

filter {
  if [attributes] =~ /[^\s\\]+/ {
    kv {
      source      => "attributes"
      field_split => ";"
      value_split => ":"
      target      => "attributes"
    }
    ruby {
      code => "
        # copy each parsed attributes key/value pair to a top-level field
        event.get('attributes').each do |cusField|
          event.set(cusField[0], cusField[1].split(','))
        end
      "
    }
  }
}

Some of these dynamic fields have keys containing diacritics, and there is a difference between the encoding of the field content and the encoding of the field name/key. Everything should be UTF-8, but the keys still end up with bad characters.

Could this be an issue with Logstash/Ruby, or possibly with the Elasticsearch settings or document mapping?
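One thing that might help narrow it down: a temporary debug step in the ruby filter could record each key's Ruby encoding on the event before it is sent to Elasticsearch, to see whether the key strings are already broken on the Logstash side. This is only a sketch; attr_key_debug is an arbitrary field name I made up for this purpose:

ruby {
  code => "
    # record the declared encoding and validity of every dynamic key
    debug = {}
    event.get('attributes').each do |cusField|
      debug[cusField[0]] = cusField[0].encoding.to_s + ', valid: ' + cusField[0].valid_encoding?.to_s
    end
    event.set('attr_key_debug', debug)
  "
}

If the keys already show a non-UTF-8 encoding (or valid_encoding? is false) at this point, the problem would be on the Logstash/kv side rather than in Elasticsearch.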

In the following example document, I used the same value for the field name and for the field content to make the difference obvious:

{
  "_index": "articles",
  "_type": "article",
  "_id": "vmv50726e9dba1a4465805945679f4f",
  "_version": 2,
  "_score": null,
  "_source": {
    "artnum": "52129",

    "Jednolůžko": [
      "Jednolůžko"
    ],
    "Mix barev/Pestré": [
      "Mix barev/Pestré"
    ],
    "Ostatní": [
      "Ostatní"
    ],
    "Äeská Republika": [
      "Česká Republika"
    ],

    "dmcprice": 0,
    "fieldset": [
      "Ano"
    ],
    "isactive": 0,
    "rating": 0,
    "parentid": "",
    "states": [
      "Novinka"
    ],
    "shortdesc": "...",
    "sort": 7959,
    "@timestamp": "2017-11-16T11:16:56.003Z",
    "istopproduct": 0,
    "productsize": [
      "140 x 200 cm"
    ]
  },
  "fields": {
    "@timestamp": [
      1510831016003
    ]
  },
  "stock": 0,
  "sort": [
    1510831016003...

I tried forcing a different charset encoding while creating the dynamic fields in the Ruby script, but the names are still garbled.
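For illustration, what I tried was roughly along these lines (a simplified sketch, not the exact code; the force_encoding call on the key is the relevant part):

ruby {
  code => "
    event.get('attributes').each do |cusField|
      # re-tag the key as UTF-8 before using it as a field name
      key = cusField[0].dup.force_encoding('UTF-8')
      event.set(key, cusField[1].split(','))
    end
  "
}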

Note that this only started happening after the upgrade from version 2.3 to 5.6; it was fine before.
