Logstash json filter plugin values as string

Hi there,

Consider the following logstash json filter snippet:

    json {
        skip_on_invalid_json => true
        source => "message_body"
        target => "jsondoc"
    }

which results in the following Logstash event:

    {
      "@timestamp" => 2020-11-27T19:36:46.292Z,
        "jsondoc" => {
          "taskBegin" => true,
          "time" => "2020-09-22T02:39:33.255Z",
          "parent" => {
            "pid" => 8188,
            "taskid" => 34863,
            "name" => "PAS"
          },
          "v" => 0,
          "pid" => 23404,
          "taskid" => 157694,
          "gid" => "En5MFzJZ2JjgLl3HrIfNtA",
          "reqAccepted" => true,
          "req" => {
            "port" => 18681,
            "path" => "/v2/searchTasks/GK2kwu8gSE2fzI7dnUaICg/results?continueToken=&limit=100",
            "method" => "GET"
          },
          "level" => 30,
          "msg" => "",
          "name" => "LoadBalancer"
      }
    }

which contains values of different types, such as booleans, integers, and strings.

Given the highly dynamic nature of this particular field, I would like to ensure it's parsed by the json plugin such that all values are strings, even booleans and numeric values. This is to avoid exceptions upon outputting to elasticsearch. It's not feasible to map these fields explicitly either because, again, they change depending on the input log, which can serialise all sorts of JSON blobs.

In other words, how can I ensure all my values are strings upon processing them with the logstash json plugin?

Desired example output below:

    {
      "@timestamp" => 2020-11-27T19:36:46.292Z,
        "jsondoc" => {
          "taskBegin" => "true",
          "time" => "2020-09-22T02:39:33.255Z",
          "parent" => {
            "pid" => "8188",
            "taskid" => "34863",
            "name" => "PAS"
          },
          "v" => "0",
          "pid" => "23404",
          "taskid" => "157694",
          "gid" => "En5MFzJZ2JjgLl3HrIfNtA",
          "reqAccepted" => "true",
          "req" => {
            "port" => "18681",
            "path" => "/v2/searchTasks/GK2kwu8gSE2fzI7dnUaICg/results?continueToken=&limit=100",
            "method" => "GET"
          },
          "level" => "30",
          "msg" => "",
          "name" => "LoadBalancer"
      }
    }

Thanks in advance

I cannot help feeling there should be a way to do this in the mapping. You might want to ask in the elasticsearch forum if something like this would work.

    "dynamic_templates": [
        {
            "everythingstring": {
                "match" : "jsondoc*",
                "mapping" : { "type" : "text", "norms" : false }
            }
        }
    ]

But if you really want to do it in logstash, it would require a ruby filter:

    ruby {
        code => '
            def toString(object, name, event)
                #puts "toString called for #{name}"
                unless object.nil? # skip only nil, so that false is converted too
                    if object.kind_of?(Hash) and object != {}
                        object.each { |k, v| toString(v, "#{name}[#{k}]", event) }
                    elsif object.kind_of?(Array) and object != []
                        object.each_index { |i|
                            toString(object[i], "#{name}[#{i}]", event)
                        }
                    else
                        event.set(name, object.to_s)
                    end
                end
            end
            event.to_hash.each { |k, v|
                unless k == "@timestamp"
                    toString(v, "[#{k}]", event)
                end
            }
        '
    }
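
To make the recursion easier to follow, here is a minimal plain-Ruby sketch of the same traversal outside logstash. Instead of calling event.set, the hypothetical helper collect_leaf_refs (not part of logstash, made up for illustration) just records each field reference the filter would write to, together with the stringified value. It assumes the same [a][b] field-reference syntax used in the filter, and it skips only nil, so a false value is converted too.

```ruby
# Hypothetical helper for illustration only; not part of Logstash.
# Walks a nested structure and returns { field_reference => string_value }
# for every leaf, i.e. the references a ruby filter would pass to event.set.
def collect_leaf_refs(object, name, out = {})
  return out if object.nil? # nil leaves are skipped
  if object.is_a?(Hash) && !object.empty?
    object.each { |k, v| collect_leaf_refs(v, "#{name}[#{k}]", out) }
  elsif object.is_a?(Array) && !object.empty?
    object.each_with_index { |v, i| collect_leaf_refs(v, "#{name}[#{i}]", out) }
  else
    # Leaf value (or an empty hash/array): record it as a string
    out[name] = object.to_s
  end
  out
end

# Sample data loosely based on the event in the question
doc = {
  "taskBegin"      => true,
  "parent"         => { "pid" => 8188, "name" => "PAS" },
  "filteredValues" => ["encryptionkey", "encryptioniv"]
}

refs = collect_leaf_refs(doc, "[jsondoc]")
refs.each { |ref, value| puts "#{ref} => #{value.inspect}" }
```

Each key in the result, such as [jsondoc][parent][pid], is the kind of nested field reference that event.set accepts, which is why the filter can overwrite leaves one at a time without rebuilding the whole hash.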

Thanks Badger, great shout, I'll ask in the elasticsearch forum too. In the meantime I've attempted to implement your ruby snippet in logstash but no luck so far.

Here's my implementation, admittedly without full understanding of your code:

  ruby {
    code => '
      if event.get("jsondoc")
        event = event.get("jsondoc")
        def toString(object, name, event)
          if object
            if object.kind_of?(Hash) and object != {}
              object.each { |k, v| toString(v, "#{name}[#{k}]", event) }
            elsif object.kind_of?(Array) and object != []
              object.each_index { |i|
                toString(object[i], "#{name}[#{i}]", event)
              }
            else
              event.set(name, object.to_s)
            end
          end
        end
        puts event
        event.to_hash.each { |k, v|
          unless k == "@timestamp"
            toString(v, "[#{k}]", event)
          end
        }
      end
    '
  }

Which results in the following logstash ERROR:

[2020-11-28T17:48:17,899][ERROR][logstash.filters.ruby    ][main][fe7a1180cfb715d16df4a9e10dcbfe591680102c86d1e65363a76d0260bed3a1] Ruby exception occurred: undefined method `set' for #<Hash:0x44b7d19>

The output of puts event looks like this:

{"req"=>{"method"=>"GET", "url"=>"http://192.168.194.142:18682/v2/searchTasks/MMxSBiVd-2Pas7jf_0x8hw/results?continueToken=Gyqhj4yjuJJ0ye7IgonI4ZefjOejhrXEva29p4izRUw&limit=100"}, "time"=>"2020-09-22T00:00:34.990Z", "v"=>0, "reqBegin"=>true, "pid"=>23404, "gid"=>"hlH53HBOp5JbxYDAQQoHdA", "level"=>30, "filteredValues"=>["encryptionkey", "encryptioniv"], "msg"=>"", "taskid"=>134068, "name"=>"LoadBalancer",}

That's not going to work. event has to remain a LogStash::Event; otherwise, as you have found, you will not be able to call .set on it. If you only want to convert the values under [jsondoc] to strings, then go back to my original code but change

    unless k == "@timestamp"

to

    if k == "jsondoc"
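
Another way to avoid reassigning event is to convert a copy of the hash and write it back with a single event.set call. The following is a minimal sketch of that idea in plain Ruby; stringify_values is a hypothetical helper, and the commented lines at the end show how it might be wired into a ruby filter, assuming the field is named jsondoc as in this thread. That wiring is an untested assumption, not something confirmed here.

```ruby
# Hypothetical helper: returns a new structure with every leaf converted
# to a string. Hashes and arrays are rebuilt recursively; nil stays nil.
def stringify_values(object)
  case object
  when Hash  then object.each_with_object({}) { |(k, v), h| h[k] = stringify_values(v) }
  when Array then object.map { |v| stringify_values(v) }
  when nil   then nil
  else object.to_s
  end
end

# Sample data loosely based on the event in the question
doc = {
  "v"           => 0,
  "reqAccepted" => false,
  "req"         => { "port" => 18681, "method" => "GET" }
}

converted = stringify_values(doc)
puts converted.inspect

# Inside a ruby filter this might be used as follows (untested assumption):
#   doc = event.get("jsondoc")
#   event.set("jsondoc", stringify_values(doc)) if doc.is_a?(Hash)
```

Because the helper only ever sees a plain Hash, the event variable keeps pointing at the LogStash::Event and .set remains available.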

Thanks Badger, that was my misunderstanding. Seems to be working great with your original code:

{
       "jsondoc" => {
                 "hostname" => "AUSAZPRXBTRSZ01",
           "filteredValues" => [
            [0] "encryptionkey",
            [1] "encryptioniv"
        ],
                  "taskEnd" => "true",
                        "v" => "0",
                   "taskid" => "134019",
                  "resSent" => "true",
                     "time" => "2020-09-22T00:00:08.529Z",
                      "msg" => "",
                     "name" => "LoadBalancer",
        "httpStatusMessage" => "OK",
           "httpStatusCode" => "200",
                      "pid" => "23404",
                 "duration" => "10",
                      "gid" => "Sr0qDK6C/3Czfz5Y+gwOVg",
                    "level" => "30"
    },
    "@timestamp" => 2020-11-28T22:01:36.309Z,
          "host" => "56be47cb8cfe",
      "@version" => "1"
}

Many thanks for your help
