Invalid UTF-8

I've been using the ruby filter to decode hex to ASCII:

if ([field]) {
    mutate {
        gsub => [ "[field]", ":", "" ]
    }

    ruby {
        code => 'event.set("[field]", [event.get("[field]")].pack("H*"))'
    }
}

but I've started to have a problem with the encoding in recent versions of ELK. How do I fix this?

"caused_by"=>{"type"=>"json_parse_exception", "reason"=>"Invalid UTF-8 start byte 0xfe

etc

Without decoding there are no errors - but I need it decoded.
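To illustrate what I think is happening (a standalone sketch, outside Logstash — my assumption about what the serializer rejects): pack("H*") returns raw binary, and a byte such as 0xFE can never begin a UTF-8 sequence:

```ruby
# Standalone sketch (not the Logstash filter itself): decode a short
# hex sample and check whether the resulting bytes form valid UTF-8.
decoded = ["fe534d42"].pack("H*")   # the SMB2 magic bytes, "\xFESMB"
puts decoded.encoding                                 # ASCII-8BIT (raw binary)
puts decoded.force_encoding("UTF-8").valid_encoding?  # false
```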

Can you show the content of [field] when the error occurs?

For example:

\x00\x00\x00\x84\xFESMB@\x00\x01\x00\x00\x00\x00\x00\x05\x00\x01\x00\x00\x00\x00\x00\x00\x00\x00\x00\x16\x00\x00\x00\x00\x00\x00\x00\xFF\xFE\x00\x00\x01\x00\x00\x00\xF0\xB5\x9B\x1F\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x009\x00\x00\x00\x01\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x9F\x01\x12\x00\x00\x00\x00\x00\a\x00\x00\x00\x01\x00\x00\x00\x00\x00\x00\x00x\x00\f\x00\x00\x00\x00\x00\x00\x00\x00\x00w\x00k\x00s\x00s\x00v\x00c\x00

The CyberChef tool tells me this is valid UTF-8.

another value:

\u0000\u0000\u0000h\xFESMB@\u0000\u0001\u0000\u0000\u0000\u0000\u0000\u0010\u0000\u0001\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u00003\u0000\u0000\u0000\u0000\u0000\u0000\u0000\xFF\xFE\u0000\u0000\u0001\u0000\u0000\u0000_@\xE9>\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000)\u0000\u0001\u0005\u0018\u0000\u0000\u0000h\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\xAB\xB5\xBF~k\u0003\xF6\b\u0014:\xDDJqH\xA4y

Here, for example, there is no relevant text, but sometimes there is, and that is what I care about: I want these fields decoded so that events are not skipped because the field fails to parse (this did not happen with previous versions of ELK).

Hi, running this little Ruby snippet with your data returns false:

def is_valid_utf8(byte_data)
  byte_data.force_encoding('UTF-8').valid_encoding?
end

# Example usage
byte_string = "\x00\x00\x00\x84\xFESMB@\x00\x01\x00\x00\x00\x00\x00\x05\x00\x01\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x16\x00\x00\x00\x00\x00\x00\x00\xFF\xFE\x00\x00\x01\x00\x00\x00\xF0\xB5\x9B\x1F\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x009\x00\x00\x00\x01\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x9F\x01\x12\x00\x00\x00\x00\x00\a\x00\x00\x00\x01\x00\x00\x00\x00\x00\x00\x00x\x00\f\x00\x00\x00\x00\x00\x00\x00\x00\x00w\x00k\x00s\x00s\x00v\x00c\x00"

puts is_valid_utf8(byte_string)
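If dropping the bad bytes is acceptable, Ruby's String#scrub can sanitize such a string — a small sketch:

```ruby
# Sketch: scrub replaces (or drops) byte sequences that are invalid
# in the string's declared encoding.
raw = ["fe534d42"].pack("H*").force_encoding("UTF-8")  # "\xFESMB", labelled UTF-8
puts raw.valid_encoding?  # false
puts raw.scrub("")        # "SMB"  — the stray 0xFE byte is dropped
puts raw.scrub("?")       # "?SMB" — or replaced with a marker
```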

Can I somehow handle it so that it is converted to text as far as possible?

When I don't decode it as above, it appears as follows in Kibana:

00:00:00:58:fe:53:4d:42:40:00:01:00:00:00:00:00:06:00:01:00:00:00:00:00:00:00:00:00:b9:01:00:00:00:00:00:00:ff:fe:00:00:02:00:00:00:23:22:57:9d:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:18:00:01:00:00:00:00:00:ea:2b:93:09:9a:63:f4:56:d6:c0:18:11:b8:ad:cf:60

Maybe I'm decoding it the wrong way?

I'm not sure if it's exactly the result you want, but you can try adding force_encoding("UTF-8") at the end of the line (note the inner quotes must be double quotes, since the code => value is already single-quoted):

ruby {
    code => 'event.set("[field]", [event.get("[field]")].pack("H*").force_encoding("UTF-8"))'
}
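One caveat: force_encoding only relabels the bytes, it does not change them, so a string whose bytes are not valid UTF-8 stays invalid — a quick check:

```ruby
# Caveat sketch: force_encoding changes the encoding label, not the bytes.
s = ["fe53"].pack("H*").force_encoding("UTF-8")
puts s.valid_encoding?   # false — the 0xFE byte is untouched
```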

Thanks, but I tried that before and it didn't work. I guess my problem is more basic, because the input data looks like this:

00:00:00:44:fe:53:4d:42:40:00:01:00:00:00:00:00:04:00:01:00:00:00:00:00:00:00:00:00:c6:01:00:00:00:00:00:00:ff:fe:00:00:02:00:00:00:23:22:57:9d:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:04:00:00:00

How do I decode it to text as far as possible? Printing non-printable characters is not necessary.
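One possible approach (a standalone sketch on a shortened sample of your data, not tested against your pipeline): strip the colons, decode the hex pairs, drop invalid UTF-8 bytes with scrub, then drop the remaining non-printable characters:

```ruby
# Sketch: hex string -> raw bytes -> drop invalid UTF-8 -> keep printable text.
hex   = "00:00:00:44:fe:53:4d:42:40:00"          # shortened sample
bytes = [hex.delete(":")].pack("H*")
text  = bytes.force_encoding("UTF-8").scrub("")  # drop invalid bytes
puts text.gsub(/[^[:print:]]/, "")               # => DSMB@
```

Inside the ruby filter the same chain could go into the code => one-liner, e.g. `.pack("H*").force_encoding("UTF-8").scrub("").gsub(/[^[:print:]]/, "")` (an assumption on my side — adjust to your field layout).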

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.