Ruby logstash filter

tienld · December 3, 2022, 8:41am

I have a message that contains Unicode escape sequences.

I want to convert it to UTF-8 characters in my language (Vietnamese).

The input comes from a Filebeat filestream input.

I use Logstash to parse the message:

\u0043\u1ea3\u006d\u0020\u01a1\u006e\u0020\u0071\u0075\u00fd\u0020\u006b\u0068\u00e1\u0063\u0068

Expected result:

"Cảm ơn quý khách"

I wrote a simple Ruby script to test this, and it works:

require 'uri'
message = "\u0043\u1ea3\u006d\u0020\u01a1\u006e\u0020\u0071\u0075\u00fd\u0020\u006b\u0068\u00e1\u0063\u0068"
enc_uri = URI.decode_www_form_component(message)
p enc_uri

But when I put it into a ruby filter in Logstash and print the result with puts for testing, it does not work:

filter {
    ruby {
        init => "require 'uri'"
        code => "
        @enc_uri = enc_uri = URI.decode_www_form_component(event.get('message'))
        puts @enc_uri
        "
    }
}

Unexpected results:

## For this line I expected: `"Cảm ơn quý khách"`
\u0043\u1ea3\u006d\u0020\u01a1\u006e\u0020\u0071\u0075\u00fd\u0020\u006b\u0068\u00e1\u0063\u0068
{
       "message" => "\\u0043\\u1ea3\\u006d\\u0020\\u01a1\\u006e\\u0020\\u0071\\u0075\\u00fd\\u0020\\u006b\\u0068\\u00e1\\u0063\\u0068",
         "event" => {
        "original" => "\\u0043\\u1ea3\\u006d\\u0020\\u01a1\\u006e\\u0020\\u0071\\u0075\\u00fd\\u0020\\u006b\\u0068\\u00e1\\u0063\\u0068"
    },
           "ecs" => {
        "version" => "8.0.0"
    },
         "input" => {
        "type" => "filestream"
    },
         "agent" => {
                "type" => "filebeat",
        "ephemeral_id" => "69ccd3be-66c2-45ab-8ac8-e585698c7a0a",
                "name" => "2285d6af9a56",
             "version" => "8.5.2",
                  "id" => "a009634c-6ee6-487b-8d2b-87cf5c0cd7ec"
    },
          "host" => {
        "name" => "2285d6af9a56"
    },
      "@version" => "1",
           "log" => {
          "file" => {
            "path" => "/var/log/test/api.log"
        },
          "type" => "api",
        "offset" => 63342
    },
    "@timestamp" => 2022-12-03T08:33:53.754Z,
           "biz" => true,
          "tags" => [
        [0] "beats_input_codec_plain_applied"
    ]
}

Please help me understand why this happens and how to make it work.

Badger · December 3, 2022, 6:07pm

If you use a configuration that creates that [message] field with a json codec

    input { generator { count => 1 lines => [ '{ "message": "\u0043\u1ea3\u006d\u0020\u01a1\u006e\u0020\u0071\u0075\u00fd\u0020\u006b\u0068\u00e1\u0063\u0068" }' ] codec => json } }

then you will get

   "message" => "Cảm ơn quý khách",

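The same effect can be reproduced outside Logstash. This is only a sketch: it uses Ruby's standard `json` library as a stand-in for the json codec (an assumption, the codec has its own parser), but any JSON parser has to resolve `\uXXXX` escapes inside string values.

```ruby
require 'json'

# Single quotes keep the backslashes literal, like the raw line read from the log file.
raw = '{ "message": "\u0043\u1ea3\u006d\u0020\u01a1\u006e\u0020\u0071\u0075\u00fd\u0020\u006b\u0068\u00e1\u0063\u0068" }'

# The JSON parser resolves each \uXXXX escape into the real character.
decoded = JSON.parse(raw)["message"]
puts decoded
```

So if the application can write its log lines as JSON, no ruby filter is needed at all.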
With a configuration like

    input { generator { count => 1 lines => [ '\u0043\u1ea3\u006d\u0020\u01a1\u006e\u0020\u0071\u0075\u00fd\u0020\u006b\u0068\u00e1\u0063\u0068' ] } }

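the problem is that the [message] field holds the literal characters backslash, u, 0, 0, 4, 3 and so on. Nothing decodes the escapes, and rubydebug prints each backslash escaped, so you see \\u0043.... That's not URI encoding, so URI.decode_www_form_component leaves it unchanged. Your standalone script only appeared to work because Ruby itself decodes \u escapes inside a double-quoted string literal, before the URI call ever runs.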
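A small sketch of the difference, runnable in plain Ruby. The variable names are made up for illustration: `parsed` corresponds to the string in the standalone test script, `literal` to what the event field actually holds.

```ruby
require 'uri'

parsed  = "\u0043\u1ea3\u006d"   # double quotes: Ruby resolves the escapes while parsing
literal = '\u0043\u1ea3\u006d'   # single quotes: backslash, "u" and four hex digits stay as-is

puts parsed.length    # 3 characters
puts literal.length   # 18 characters

# decode_www_form_component only handles %XX sequences and "+",
# so it returns both strings unchanged.
puts URI.decode_www_form_component(parsed) == parsed
puts URI.decode_www_form_component(literal) == literal
```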

What we can do is walk through the message field looking for \u followed by four hex digits, pack those four hex digits into two bytes, read the bytes back as a 16-bit integer in network byte order, and encode that code point as UTF-8 (I copied it from an SO answer)

    ruby {
        code => '
            event.set("someField", event.get("message").gsub(/\\u([\da-fA-F]{4})/) {|x| [$1].pack("H*").unpack("n*").pack("U*")})
        '
    }

results in

 "someField" => "Cảm ơn quý khách"

And yes, I could have overwritten [message] in the event.set call.
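To see what the pack/unpack chain does, here it is taken apart for a single escape. This is a sketch in plain Ruby; the variable names are only for illustration, and `hex` stands in for the capture group `$1`.

```ruby
hex = "1ea3"                      # what $1 holds for \u1ea3

bytes     = [hex].pack("H*")      # hex string -> two raw bytes, 0x1E 0xA3
codepoint = bytes.unpack("n*")    # 16-bit big-endian unsigned integer -> [7843]
char      = codepoint.pack("U*")  # code point -> UTF-8 encoded string

p bytes.bytes
p codepoint
puts char

# A shorter equivalent for a single code point
puts hex.hex.chr(Encoding::UTF_8)
```

Both forms work on one 16-bit value at a time, so they cover escapes that each stand for a complete character, as in this message.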

Note that since we are doing gsub on parts of the string that are \u plus four hex characters, other unencoded text before, after, or in between those parts is unaffected. So a message field containing

He responded "\u0043\u1ea3...

would result in

He responded "Cảm ơn...
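That can be checked outside Logstash with the same gsub. A sketch: `decode_unicode_escapes` is a hypothetical helper name and the sample string is made up, but the block body is the one from the filter.

```ruby
# Hypothetical helper wrapping the gsub used in the ruby filter.
def decode_unicode_escapes(text)
  text.gsub(/\\u([\da-fA-F]{4})/) { [$1].pack("H*").unpack("n*").pack("U*") }
end

# Single quotes keep the backslashes literal, as in the event field.
sample = 'He responded "\u0043\u1ea3m \u01a1n" (code 200)'
result = decode_unicode_escapes(sample)
puts result
```

Only the `\uXXXX` parts are replaced; the quotes, the plain letters and the trailing text pass through untouched.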

