Invalid UTF-8

I've been using the ruby filter to decode hex to ASCII:

if ([field]) {
    mutate {
        gsub => [ "[field]", ":", "" ]
    }

    ruby {
        code => 'event.set("[field]", [event.get("[field]")].pack("H*"))'
    }
}
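For reference, here is what that filter does in plain Ruby: delete the colons, then pack("H*") turns each remaining pair of hex digits into one byte (a minimal standalone sketch, not Logstash-specific):

```ruby
# "48:69" -> "4869" -> "Hi": strip the colons, then decode hex pairs.
puts ["48:69".delete(":")].pack("H*")   # => Hi
```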

but I've started to have a problem with the encoding in recent versions of ELK. How do I fix this?

"caused_by"=>{"type"=>"json_parse_exception", "reason"=>"Invalid UTF-8 start byte 0xfe

etc

Without decoding there are no errors - but I need it decoded.

Can you show the content of [field] when the error occurs?

For example:

\x00\x00\x00\x84\xFESMB@\x00\x01\x00\x00\x00\x00\x00\x05\x00\x01\x00\x00\x00\x00\x00\x00\x00\x00\x00\x16\x00\x00\x00\x00\x00\x00\x00\xFF\xFE\x00\x00\x01\x00\x00\x00\xF0\xB5\x9B\x1F\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x009\x00\x00\x00\x01\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x9F\x01\x12\x00\x00\x00\x00\x00\a\x00\x00\x00\x01\x00\x00\x00\x00\x00\x00\x00x\x00\f\x00\x00\x00\x00\x00\x00\x00\x00\x00w\x00k\x00s\x00s\x00v\x00c\x00

The CyberChef tool tells me that this is valid UTF-8.

another value:

\u0000\u0000\u0000h\xFESMB@\u0000\u0001\u0000\u0000\u0000\u0000\u0000\u0010\u0000\u0001\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u00003\u0000\u0000\u0000\u0000\u0000\u0000\u0000\xFF\xFE\u0000\u0000\u0001\u0000\u0000\u0000_@\xE9>\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000)\u0000\u0001\u0005\u0018\u0000\u0000\u0000h\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\xAB\xB5\xBF~k\u0003\xF6\b\u0014:\xDDJqH\xA4y

Here, for example, there is no relevant text, but sometimes there is, and that is what I care about: I want these fields decoded, and I don't want events to be skipped because this field fails to parse (this did not happen with previous versions of ELK).

Hi, running this little Ruby snippet against your data returns false:

def is_valid_utf8(byte_data)
  byte_data.force_encoding('UTF-8').valid_encoding?
end

# Example usage
  byte_string = "\x00\x00\x00\x84\xFESMB@\x00\x01\x00\x00\x00\x00\x00\x05\x00\x01\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x16\x00\x00\x00\x00\x00\x00\x00\xFF\xFE\x00\x00\x01\x00\x00\x00\xF0\xB5\x9B\x1F\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x009\x00\x00\x00\x01\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x9F\x01\x12\x00\x00\x00\x00\x00\a\x00\x00\x00\x01\x00\x00\x00\x00\x00\x00\x00x\x00\f\x00\x00\x00\x00\x00\x00\x00\x00\x00w\x00k\x00s\x00s\x00v\x00c\x00"

puts is_valid_utf8(byte_string)
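That is expected: the bytes 0xFE and 0xFF can never occur anywhere in valid UTF-8, and an SMB2 message starts with the magic \xFE "SMB", so these payloads will always fail validation no matter what follows. A minimal check:

```ruby
# \xFE is not a legal UTF-8 byte in any position, so the SMB2 magic
# alone makes the whole string invalid.
puts "\xFESMB".b.force_encoding("UTF-8").valid_encoding?   # => false
```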

Can I somehow handle it and convert it to text as far as possible?

When I don't decode it as above, it appears as follows in Kibana:

00:00:00:58:fe:53:4d:42:40:00:01:00:00:00:00:00:06:00:01:00:00:00:00:00:00:00:00:00:b9:01:00:00:00:00:00:00:ff:fe:00:00:02:00:00:00:23:22:57:9d:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:18:00:01:00:00:00:00:00:ea:2b:93:09:9a:63:f4:56:d6:c0:18:11:b8:ad:cf:60

Maybe I'm decoding it the wrong way?

I'm not sure if it's exactly the result you want, but you can try calling force_encoding("UTF-8") at the end of the line. Note the inner double quotes, since the code option is already delimited by single quotes:

ruby {
        code => 'event.set("[field]", [event.get("[field]")].pack("H*").force_encoding("UTF-8"))'
    }
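One caveat: force_encoding only relabels the bytes as UTF-8; it does not remove or replace the invalid ones, so the string can still fail JSON serialization. A sketch of one way to actually drop them, using Ruby's String#scrub:

```ruby
# Decode the hex, label it UTF-8, then drop any invalid byte sequences;
# what remains is safe to serialize as JSON.
raw = ["0000fe41"].pack("H*")              # four bytes, 0xFE is invalid UTF-8
clean = raw.force_encoding("UTF-8").scrub("")
puts clean.valid_encoding?                 # => true
```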

Thanks, but I tried that before and it didn't work. I guess my problem is more basic, because the input data looks like this:

00:00:00:44:fe:53:4d:42:40:00:01:00:00:00:00:00:04:00:01:00:00:00:00:00:00:00:00:00:c6:01:00:00:00:00:00:00:ff:fe:00:00:02:00:00:00:23:22:57:9d:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:04:00:00:00

How can I decode it to text as far as possible? Printing "non-printable" chars is not necessary.
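Since the payload is mostly binary (and SMB strings such as "wkssvc" are UTF-16LE, i.e. every other byte is a NUL), one pragmatic sketch is to decode the hex and keep only the printable ASCII bytes. The helper name here is hypothetical, not part of any Logstash API:

```ruby
# Hypothetical helper: decode colon-separated hex, then keep only
# printable ASCII bytes (0x20..0x7E), dropping NULs and control bytes.
def hex_to_printable(hex)
  [hex.delete(":")].pack("H*").bytes.select { |b| b.between?(0x20, 0x7E) }.pack("C*")
end

puts hex_to_printable("77:00:6b:00:73:00:73:00:76:00:63:00")   # => wkssvc
```

Inside a ruby filter this logic would replace the bare pack("H*") call, and the resulting field can never contain invalid UTF-8.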