Extract base64 encoded field from JSON message and write the decoded field to a file

Hi All,
I have the following Logstash pipeline configuration.

input {
    tcp {
        port => 5102
        codec => json
    }
}
filter {
    json {
        source => "message"
    }
}
output {
    stdout {
        codec => rubydebug
    }
}

The message that I receive over the TCP socket is shown below:

{
          "node" => "node0",
      "@version" => "1",
          "data" => "R25BZhcAAAAAAAAAAAAAAAACAAAAgAAAAAAAAAAAAAAHAAMAAgAAABAYAQABAAAAAAAAAAAAAABAAAAAABAAACApAAAAAAAAAAAAAAAAAAAAAQAAAEAAAJCYAAAAAAAAAAAAAAAAAAAAAQAAAEAAAJCYAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAEYRAABGEQABAAAAAAAAAAAAAAAAAAAAAAAAAAAQAAAAIAAAAEAAAACAAAAAAAEAAAEAAAABAAAAAQAAAIQjCzQmFy0X//////////8AgAAA/////wAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAEAAAAAAAAARREAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAHqWAAACEwEAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAABOYUl0EwAAAGx3bWxpYi8xX3N5c2RiX3NoYXJlZF9zYy9kYmcAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAEAAAAAAEAAAICkAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA/AAAAAAAAAP8PAAAAAwAAAAAAAAAAAACK/7yAiAAAAIQjCzQmFy0XAAAAAAAAAALEcQIAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA0P+8gIgAAADEcQIAcBEAABUFAQAFAAAAAAAAAAAAAAABAA4AAAAAALQRAAAAAAAAYwAAAAAAAAALAAAAAAAAAAsAAAAAAAAAAQAAAAAAAAAQEQAMJH8AAMIivYCIAAAAxXECAJIRAAALBQEABwAAAAAAAAAAAAAAoBMADCR/AAABAA4AAAAAAPImAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAEBEADCR/AAAVnL2AiAAAAMtxAgBwEQAACQUBAAQAAAAAAAAAAAAAAAEADwAAAAAAZQAAAAAAAAACAAAAAAAAAAAAAAAAAAAAwHt5JyR/AAABAAAAAAAAABARAAwkfwAADp+9gIgAAADMcQIAcBEAABUFAQAFAAAAAAAAAAAAAAABAA8AAAAAAPkQAAAAAAAAZgAAAAAAAAAUAAAAAAAAAAsAAAAAAAAAAAAAAAAAAAAQEQAMJH8AAKjcvYCIAAAAzXECAJERAAALBQEABwAAAAAAAAAAAAAAEGYADCR/AAABAA8AAAAAAE0HAAAAAAAACwAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAcBsAGCR/AABu+8mAiAAAAM5xAgBwEQAACQUBAAQAAAAAAAAAAAAAAAEADgAAAAAAYgAAAAAAAAACAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAABARAAwkfwAAZ/7JgIgAAADPcQIAcBEAABUFAQAFAAAAAAAAAAAAAAABAA4AAAAAALQRAAAAAAAAYwAAAAAAAAALAAAAAAAAAA4AAAA
AAAAAAAAAAAAAAAAQEQAMJH8AAJkvyoCIAAAA0HECAJIRAAAOBQEAAwAAAAAAAAAAAAAABAAAAAAAAAABAAcAAAAAADIYAAAAAAAACwAAAAAAAAAOAAAAAAAAAAEAAAAAAAAAEBEADCR/AACXM8qAiAAAANFxAgCSEQAAGgUBAAUAAAAAAAAAAAAAAAQAAAAAAAAATgAAAAAAAAAAAAAAAAAAACgAAAAAAAAAhBgAAAAAAAAAAAAAAAAAABARAAwkfwAAZkPKgIgAAADScQIAkhEAAAoFAQAGAAAAAAAAAAAAAACgEwAMJH8AAAEADgAAAAAA8yYAAAAAAAAAAAAAAAAAAMB7eSckfwAAAQAAAAAAAAAQEQAMJH8AACniy4CIAAAA03ECAHARAAAJBQEABAAAAAAAAAAAAAAAAQAOAAAAAABiAAAAAAAAAAIAAAAAAAAAAAAAAAAAAADAe3knJH8AAAEAAAAAAAAAEBEADCR/AACD5MuAiAAAANRxAgBwEQAAFQUBAAUAAAAAAAAAAAAAAAEADgAAAAAAtBEAAAAAAABjAAAAAAAAAAsAAAAAAAAACwAAAAAAAAABAAAAAAAAABARAAwkfwAARAXMgIgAAADVcQIAkhEAAAsFAQAHAAAAAAAAAAAAAACgEwAMJH8AAAEADgAAAAAA8yYAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAQEQAMJH8AAA1SzICIAAAA2nECAJIRAAA=",
        "buffer" => "test/dbg",
    "@timestamp" => 2022-12-02T21:38:10.842695522Z
}

The "data" field in the above JSON message is Base64 encoded. I want to decode it and store the result in a file whose path is built from the "node" and "buffer" field values. For the above example, the Base64-decoded data should be stored in a file named "node0/test/dbg".
Can someone tell me how to achieve this?

Thanks,
Arinjay

You could start with something like

    json { source => "message" remove_field => [ "message" ] }
    ruby {
        code => '
            b = event.get("buffer")
            n = event.get("node")
            if b and n
                event.set("#{n}/#{b}", Base64.decode64(event.get("data")))
            end
        '
    }

but that results in

"node0/test/dbg" => "GnAf\x17\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x02\x00\x00\x00\x80\x00\x00\x00\x00\x00\x00\x00\x00

plus several hundred more nulls and seemingly random bytes that I do not recognize in any encoding I use. There is a character encoding problem here, and it is not one I can help with.
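
For what it is worth, the decoding step can be checked outside Logstash. The sketch below is plain Ruby, not pipeline config, and it only uses the first 12 characters of the "data" value from the message above (the variable names are just for illustration). It shows that Base64.decode64 hands back a binary string, so one possible reading is that the payload is raw binary rather than text in some character encoding. That is an assumption; the thread does not say what the sender actually puts in "data".

```ruby
require "base64"

# First 12 characters of the "data" field from the message above.
sample = "R25BZhcAAAAA"

decoded = Base64.decode64(sample)

# decode64 returns a binary (ASCII-8BIT) string, not UTF-8 text.
puts decoded.encoding

# 12 Base64 characters decode to 9 bytes.
puts decoded.bytesize           # 9

# Hex dump of the bytes: "GnAf", then 0x17, then four NUL bytes.
puts decoded.unpack1("H*")      # 476e41661700000000
```

If the payload really is binary, it would have to be written to a file byte for byte rather than printed through rubydebug.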

Thanks @Badger for sharing the pipeline config. I tried it, but my Logstash Docker instance goes down with the error below:

java.lang.OutOfMemoryError: Java heap space
Dumping heap to java_pid1.hprof ...
Heap dump file created [1180165506 bytes in 4.223 secs]
[2022-12-05T21:49:09,929][ERROR][logstash.inputs.tcp      ][main][59bfea76ec4a31ddc4e6aa2f134b14255e35b46edd0486aae8fb9006d0a28073] xxx-xxxx/172.26.228.88:47073: closing due:
java.lang.OutOfMemoryError: Java heap space
[2022-12-05T21:49:08,526][FATAL][org.logstash.Logstash    ] uncaught error (in thread Ruby-0-Thread-1: /usr/share/logstash/logstash-core/lib/logstash/runner.rb:410)
java.lang.OutOfMemoryError: Java heap space

The latest pipeline config that I have is below:

input {
    tcp {
        port => 5102
        codec => json
    }
}
filter {
    json {
        source => "message"
        remove_field => [ "message" ]
    }
    ruby {
        code => '
            b = event.get("buffer")
            n = event.get("node")
            if b and n
                event.set("#{n}/#{b}", Base64.decode64(event.get("data")))
            end
        '
    }
}
output {
    stdout {
        codec => rubydebug
    }
}

Thanks,
Arinjay

See this thread for how to diagnose the memory leak.

I would guess the problem is the json codec on the input -- it is just reading more and more data and never finding the end of the first JSON object.
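
To illustrate that guess, here is a toy model in plain Ruby of a reader that frames a stream on newlines. It is not Logstash's actual code, and LineFramer is a made-up name; the assumption is only that the reader keeps buffering until it sees a delimiter. If the sender never sends one, nothing is ever released and the buffer just keeps growing.

```ruby
# Toy model of a newline-delimited reader; NOT Logstash's implementation.
class LineFramer
  attr_reader :buffer

  def initialize
    @buffer = +""   # unfrozen, mutable string
  end

  # Append a chunk and return every complete line that it finishes.
  def feed(chunk)
    @buffer << chunk
    lines = []
    while (idx = @buffer.index("\n"))
      lines << @buffer.slice!(0..idx).chomp
    end
    lines
  end
end

framer = LineFramer.new
chunk = '{"node":"node0","data":"AAAA"}'

# A sender that never emits "\n": no line is released, the buffer only grows.
released = 3.times.flat_map { framer.feed(chunk) }
puts released.length          # nothing released so far
puts framer.buffer.bytesize   # three chunks still buffered

# Once a delimiter arrives, everything buffered is released as one "line".
events = framer.feed("\n")
puts events.length
puts framer.buffer.bytesize
```

Note that the single released line here is three JSON objects glued together, so even when the delimiter finally arrives the sender's framing still matters.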

I found an issue on the sender side of the JSON data, and I no longer see the memory problem. However, with the above pipeline configuration, the file is still not created on the server where the Logstash pipeline runs. I tried the following change to the configuration, and it resulted in an error. Below is the complete configuration I tried:

input {
    tcp {
        port => 5102
        codec => json
    }
}
filter {
    json {
        source => "message"
        remove_field => [ "message" ]
    }
    ruby {
        code => '
            b = event.get("buffer")
            n = event.get("node")
            if b and n
                event.set("dec_data", Base64.decode64(event.get("data")))
                event.set("file_name", "#{n}/#{b}")
                **File.open(event.get("file_name"), "a") { |f| f.puts event.get("dec_data") }**
            end
        '
    }
}
output {
    stdout {
        codec => rubydebug
    }
}

Can you help me with the config to write the decoded data to a file?

Thanks,
Arinjay

Please ignore the "**" in the config. I tried to highlight that line in bold, but it looks like the editor inserted literal "**" characters instead.

Thanks,
Arinjay
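
The error text is not shown above, so the following is only a guess at what the File.open line needs. File.open does not create missing directories, so a relative path like "node0/test/dbg" fails unless "node0/test" already exists under the process's working directory. Also, puts appends a newline to the decoded bytes. A minimal sketch in plain Ruby, assuming a hypothetical helper named append_decoded and a base directory of your choosing (neither comes from Logstash):

```ruby
require "base64"
require "fileutils"
require "tmpdir"

# Hypothetical helper (not part of Logstash): decode Base64 text and append
# the raw bytes to <base_dir>/<node>/<buffer>.
def append_decoded(base_dir, node, buffer, b64_data)
  base = File.expand_path(base_dir)
  path = File.expand_path(File.join(node, buffer), base)

  # Reject field values such as "../.." that would leave base_dir.
  unless path.start_with?(base + File::SEPARATOR)
    raise ArgumentError, "#{path} is outside #{base}"
  end

  # File.open does not create missing directories, so create them first.
  FileUtils.mkdir_p(File.dirname(path))

  # "ab" appends in binary mode; write (unlike puts) adds no trailing newline.
  File.open(path, "ab") { |f| f.write(Base64.decode64(b64_data)) }
  path
end

# Demo in a throwaway directory, using the first 12 characters of "data".
Dir.mktmpdir do |dir|
  path = append_decoded(dir, "node0", "test/dbg", "R25BZhcAAAAA")
  puts path
  puts File.binread(path).bytesize
end
```

Inside the ruby filter, the same steps would take their inputs from event.get("node"), event.get("buffer") and event.get("data"), and the require lines could go in the filter's init option. The base directory has to be writable by the user that Logstash runs as, which in a Docker setup probably means a mounted volume.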
