Invalid UTF-8

I've been using the ruby filter to decode hex to ASCII:

if ([field]) {
    mutate {
        gsub => [ "[field]", ":", "" ]
    }

    ruby {
        code => 'event.set("[field]", [event.get("[field]")].pack("H*"))'
    }
}
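For reference, here is what that filter does in plain Ruby: delete the colons, then pack("H*") turns each remaining pair of hex digits into one byte (a minimal standalone sketch, not Logstash-specific):

```ruby
# "48:69" -> "4869" -> "Hi": strip the colons, then decode hex pairs.
puts ["48:69".delete(":")].pack("H*")   # => Hi
```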

but I've started to have a problem with the encoding in recent versions of ELK. How do I fix this?

"caused_by"=>{"type"=>"json_parse_exception", "reason"=>"Invalid UTF-8 start byte 0xfe

etc

Without decoding there are no errors - but I need it decoded.

Can you show the content of [field] when the error occurs?

For example:

\x00\x00\x00\x84\xFESMB@\x00\x01\x00\x00\x00\x00\x00\x05\x00\x01\x00\x00\x00\x00\x00\x00\x00\x00\x00\x16\x00\x00\x00\x00\x00\x00\x00\xFF\xFE\x00\x00\x01\x00\x00\x00\xF0\xB5\x9B\x1F\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x009\x00\x00\x00\x01\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x9F\x01\x12\x00\x00\x00\x00\x00\a\x00\x00\x00\x01\x00\x00\x00\x00\x00\x00\x00x\x00\f\x00\x00\x00\x00\x00\x00\x00\x00\x00w\x00k\x00s\x00s\x00v\x00c\x00

The CyberChef tool tells me that this is valid UTF-8.

another value:

\u0000\u0000\u0000h\xFESMB@\u0000\u0001\u0000\u0000\u0000\u0000\u0000\u0010\u0000\u0001\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u00003\u0000\u0000\u0000\u0000\u0000\u0000\u0000\xFF\xFE\u0000\u0000\u0001\u0000\u0000\u0000_@\xE9>\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000)\u0000\u0001\u0005\u0018\u0000\u0000\u0000h\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\xAB\xB5\xBF~k\u0003\xF6\b\u0014:\xDDJqH\xA4y

Here, for example, there is no relevant text, but sometimes there is, and that is what I care about: I want these fields decoded, and I don't want events to be skipped because this field fails to parse (this did not happen with previous versions of ELK).

Hi, running this little Ruby snippet against your data returns false:

def is_valid_utf8(byte_data)
  byte_data.force_encoding('UTF-8').valid_encoding?
end

# Example usage
  byte_string = "\x00\x00\x00\x84\xFESMB@\x00\x01\x00\x00\x00\x00\x00\x05\x00\x01\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x16\x00\x00\x00\x00\x00\x00\x00\xFF\xFE\x00\x00\x01\x00\x00\x00\xF0\xB5\x9B\x1F\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x009\x00\x00\x00\x01\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x9F\x01\x12\x00\x00\x00\x00\x00\a\x00\x00\x00\x01\x00\x00\x00\x00\x00\x00\x00x\x00\f\x00\x00\x00\x00\x00\x00\x00\x00\x00w\x00k\x00s\x00s\x00v\x00c\x00"

puts is_valid_utf8(byte_string)
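That is expected: the bytes 0xFE and 0xFF can never occur anywhere in valid UTF-8, and an SMB2 message starts with the magic \xFE "SMB", so these payloads will always fail validation no matter what follows. A minimal check:

```ruby
# \xFE is not a legal UTF-8 byte in any position, so the SMB2 magic
# alone makes the whole string invalid.
puts "\xFESMB".b.force_encoding("UTF-8").valid_encoding?   # => false
```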

Can I somehow handle it and convert it to text as far as possible?

When I don't decode it as above, it appears as follows in Kibana:

00:00:00:58:fe:53:4d:42:40:00:01:00:00:00:00:00:06:00:01:00:00:00:00:00:00:00:00:00:b9:01:00:00:00:00:00:00:ff:fe:00:00:02:00:00:00:23:22:57:9d:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:18:00:01:00:00:00:00:00:ea:2b:93:09:9a:63:f4:56:d6:c0:18:11:b8:ad:cf:60

Maybe I'm decoding it the wrong way?

I'm not sure if it's exactly the result you want, but you can try calling force_encoding("UTF-8") at the end of the line. Note the inner double quotes, since the code option is already delimited by single quotes:

ruby {
        code => 'event.set("[field]", [event.get("[field]")].pack("H*").force_encoding("UTF-8"))'
    }
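One caveat: force_encoding only relabels the bytes as UTF-8; it does not remove or replace the invalid ones, so the string can still fail JSON serialization. A sketch of one way to actually drop them, using Ruby's String#scrub:

```ruby
# Decode the hex, label it UTF-8, then drop any invalid byte sequences;
# what remains is safe to serialize as JSON.
raw = ["0000fe41"].pack("H*")              # four bytes, 0xFE is invalid UTF-8
clean = raw.force_encoding("UTF-8").scrub("")
puts clean.valid_encoding?                 # => true
```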

Thanks, but I tried that before and it didn't work. I guess my problem is more basic, because the input data looks like this:

00:00:00:44:fe:53:4d:42:40:00:01:00:00:00:00:00:04:00:01:00:00:00:00:00:00:00:00:00:c6:01:00:00:00:00:00:00:ff:fe:00:00:02:00:00:00:23:22:57:9d:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:04:00:00:00

How can I decode it to text as far as possible? Printing "non-printable" chars is not necessary.
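Since the payload is mostly binary (and SMB strings such as "wkssvc" are UTF-16LE, i.e. every other byte is a NUL), one pragmatic sketch is to decode the hex and keep only the printable ASCII bytes. The helper name here is hypothetical, not part of any Logstash API:

```ruby
# Hypothetical helper: decode colon-separated hex, then keep only
# printable ASCII bytes (0x20..0x7E), dropping NULs and control bytes.
def hex_to_printable(hex)
  [hex.delete(":")].pack("H*").bytes.select { |b| b.between?(0x20, 0x7E) }.pack("C*")
end

puts hex_to_printable("77:00:6b:00:73:00:73:00:76:00:63:00")   # => wkssvc
```

Inside a ruby filter this logic would replace the bare pack("H*") call, and the resulting field can never contain invalid UTF-8.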