Can anyone help on nested json parsing with Logstash?

I am currently looking to parse some JSON records in Logstash and then push them to OpenSearch/Kibana for analysis. Specifically, I hope to pull the "rtt" metric and its associated "instance" value from each message body so I can report on latency. Being a complete newbie to JSON parsing and Logstash, however, I could do with some pointers from the experts.

Can anyone help me with how to build a JSON parser to pull the "rtt" metric object along with its associated dimensions, "instance" and "session"? Below is a sample JSON record that I am working with.

Any help/pointers/advice is greatly appreciated, as I'm a newbie running out of ideas.

[{"MetricName":"read_bytes_rate","Timestamp":"2021-10-25T14:06:23Z","Value":8.5159199999999999e-109,"Dimensions":[{"Name":"instance","Value":"i-123456"},{"Name":"session","Value":"1234234-1234-4ac2134c-12323-1234234"},{"Name":"connection","Value":"1"},{"Name":"channel","Value":"usb"}]},{"MetricName":"written_bytes","Timestamp":"2021-10-25T14:07:23Z","Value":56.0,"Dimensions":[{"Name":"instance","Value":"i-123456"},{"Name":"session","Value":"1234234-1234-4ac2134c-12323-1234234"},{"Name":"connection","Value":"1"},{"Name":"channel","Value":"usb"}]},{"MetricName":"connection_count","Timestamp":"2021-10-25T12:29:45Z","Value":1.0,"Dimensions":[{"Name":"instance","Value":"i-123456"},{"Name":"session","Value":"1234234-1234-4ac2134c-12323-1234234"}]},{"MetricName":"rtt","Timestamp":"2021-10-25T14:07:23Z","Unit":"Milliseconds","StatisticValues":{"SampleCount":23,"Sum":129.398,"Minimum":3.5150000000000001,"Maximum":16.617999999999999},"Dimensions":[{"Name":"instance","Value":"i-123456"},{"Name":"session","Value":"1234234-1234-4ac2134c-12323-1234234"},{"Name":"connection","Value":"1"},{"Name":"channel","Value":"main"}]},{"MetricName":"rtt_p50","Timestamp":"2021-10-25T14:07:23Z","Unit":"Milliseconds","Value":4.7679999999999998,"Dimensions":[{"Name":"instance","Value":"i-123456"},{"Name":"session","Value":"1234234-1234-4ac2134c-12323-1234234"},{"Name":"connection","Value":"1"},{"Name":"channel","Value":"main"}]},{"MetricName":"rtt_p90","Timestamp":"2021-10-25T14:07:23Z","Unit":"Milliseconds","Value":8.0126000000000008,"Dimensions":[{"Name":"instance","Value":"i-123456"},{"Name":"session","Value":"1234234-1234-4ac2134c-12323-1234234"},{"Name":"connection","Value":"1"},{"Name":"channel","Value":"main"}]},{"MetricName":"rtt_p99","Timestamp":"2021-10-25T14:07:23Z","Unit":"Milliseconds","Value":15.233100000000007,"Dimensions":[{"Name":"instance","Value":"i-123456"},{"Name":"session","Value":"1234234-1234-4ac2134c-12323-1234234"},{"Name":"connection","Value":"1"},{"Name":"channel","Value":"main"}]},{"MetricName":"session_count","Timestamp":"2021-10-25T12:29:44Z","Value":1.0,"Dimensions":[{"Name":"instance","Value":"i-123456"}]},{"MetricName":"written_bytes_rate","Timestamp":"2021-10-25T14:06:23Z","Value":8.5159199999999999e-109,"Dimensions":[{"Name":"instance","Value":"i-123456"},{"Name":"session","Value":"1234234-1234-4ac2134c-12323-1234234"},{"Name":"connection","Value":"1"},{"Name":"channel","Value":"PhotonMessageChannel"}]},{"MetricName":"written_bytes","Timestamp":"2021-10-25T14:07:23Z","Value":24.0,"Dimensions":[{"Name":"instance","Value":"i-123456"},{"Name":"session","Value":"1234234-1234-4ac2134c-12323-1234234"},{"Name":"connection","Value":"1"},{"Name":"channel","Value":"filestorage"}]},{"MetricName":"written_bytes","Timestamp":"2021-10-25T14:07:23Z","Value":1389840.0,"Dimensions":[{"Name":"instance","Value":"i-123456"},{"Name":"session","Value":"1234234-1234-4ac2134c-12323-1234234"},{"Name":"connection","Value":"1"},{"Name":"channel","Value":"input"}]},{"MetricName":"written_bytes","Timestamp":"2021-10-25T14:07:23Z","Value":660256.0,"Dimensions":[{"Name":"instance","Value":"i-123456"},{"Name":"session","Value":"1234234-1234-4ac2134c-12323-1234234"},{"Name":"connection","Value":"1"},{"Name":"channel","Value":"redirection"}]},{"MetricName":"written_bytes","Timestamp":"2021-10-25T14:07:23Z","Value":113194.0,"Dimensions":[{"Name":"instance","Value":"i-123456"},{"Name":"session","Value":"1234234-1234-4ac2134c-12323-1234234"},{"Name":"connection","Value":"1"},{"Name":"channel","Value":"clipboard"}]}]

You can parse the JSON using a json filter

json { source => "message" target => "someField" }

but I suspect that you will not like the resulting format, which is an array of objects like

    [ 9] {
        "MetricName" => "written_bytes",
             "Value" => 24.0,
        "Dimensions" => [
            [0] {
                "Value" => "i-123456",
                 "Name" => "instance"
            },
            [1] {
                "Value" => "1234234-1234-4ac2134c-12323-1234234",
                 "Name" => "session"
            },
            [2] {
                "Value" => "1",
                 "Name" => "connection"
            },
            [3] {
                "Value" => "filestorage",
                 "Name" => "channel"
            }
        ],
         "Timestamp" => "2021-10-25T14:07:23Z"
    },

although the rtt and session_count objects do not look like that. You could do something like

    json { source => "message" target => "[@metadata][data]" remove_field => [ "message" ] }
    ruby {
        code => '
            d = event.get("[@metadata][data]")
            if d.is_a? Array
                newD = []
                d.each { |x|
                    item = {}
                    item["Timestamp"] = x["Timestamp"]
                    item[x["MetricName"]] = x["Value"]

                    if x["StatisticValues"]
                        item["StatisticValues"] = x["StatisticValues"]
                    end

                    if x["Dimensions"]
                        x["Dimensions"].each { |y|
                            item[y["Name"]] = y["Value"]
                        }
                    end
                    newD << item
                }
                event.set("[@metadata][result]", newD)
            end
        '
    }
    split { field => "[@metadata][result]" }
    ruby { code => 'event.get("[@metadata][result]").each { |k, v| event.set(k,v) }' }

    date { match => [ "Timestamp", "ISO8601" ] }

which would result in events like

{
     "@timestamp" => 2021-10-25T14:07:23.000Z,
            "rtt" => nil,
        "session" => "1234234-1234-4ac2134c-12323-1234234",
        "channel" => "main",
       "instance" => "i-123456",
      "Timestamp" => "2021-10-25T14:07:23Z",
"StatisticValues" => {
        "Minimum" => 3.5150000000000001,
        "Maximum" => 16.617999999999999,
    "SampleCount" => 23,
            "Sum" => 129.398
},
     "connection" => "1",
}

{
      "@timestamp" => 2021-10-25T12:29:45.000Z,
"connection_count" => 1.0,
        "instance" => "i-123456",
       "Timestamp" => "2021-10-25T12:29:45Z",
         "session" => "1234234-1234-4ac2134c-12323-1234234"
}

{
   "@timestamp" => 2021-10-25T12:29:44.000Z,
"session_count" => 1.0,
     "instance" => "i-123456",
    "Timestamp" => "2021-10-25T12:29:44Z"
}

That creates 13 events for the 13 entries in the JSON array. It may be that you only want to retain some of them, or keep them all in the same event. In that case you would not create an array (so you would not need the split filter) and could just create a hash and add the entries you care about to it.
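For example, if you only cared about the rtt metric and wanted to keep it on the original event (so no split filter or second ruby filter), a rough sketch along the same lines might look like this; the "rtt_stats" target field is just an illustrative name:

    ruby {
        code => '
            d = event.get("[@metadata][data]")
            if d.is_a? Array
                d.each { |x|
                    # only copy the rtt entry onto the current event
                    next unless x["MetricName"] == "rtt"
                    event.set("rtt_stats", x["StatisticValues"]) if x["StatisticValues"]
                    if x["Dimensions"]
                        x["Dimensions"].each { |y| event.set(y["Name"], y["Value"]) }
                    end
                }
            end
        '
    }

That would leave a single event with the rtt statistics plus the instance, session, connection and channel fields.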

Hi Badger, thanks for all your efforts on this. For some reason, however, I am frustratingly not seeing the same results as you've outlined above. When I run using your filter I still see the same results, in that all data is contained within the message body, i.e. it has not been split out into the individual events that you have shown. Below is my configuration; is there something that I may have missed? (The parser does execute successfully, just not with the same results.)

input {
  file {
    path => "/tmp/parsetest/test2.json"
    start_position => "beginning"
    sincedb_path => "/dev/null"
  }
}

filter {
  json { source => "message" target => "[@metadata][data]" remove_field => [ "message" ] }
  ruby {
    code => '
      d = event.get("[@metadata][data]")
      if d.is_a? Array
        newD = []
        d.each { |x|
          item = {}
          item["Timestamp"] = x["Timestamp"]
          item[x["MetricName"]] = x["Value"]

          if x["StatisticValues"]
            item["StatisticValues"] = x["StatisticValues"]
          end

          if x["Dimensions"]
            x["Dimensions"].each { |y|
              item[y["Name"]] = y["Value"]
            }
          end
          newD << item
        }
        event.set("[@metadata][result]", newD)
      end
    '
  }
  split { field => "[@metadata][result]" }
  ruby { code => 'event.get("[@metadata][result]").each { |k, v| event.set(k,v) }' }
  date { match => [ "Timestamp", "ISO8601" ] }
}

output {
  stdout { codec => rubydebug }
}


The json filter would have deleted the [message] field if it successfully parsed the JSON, so if you still have a [message] field you likely also have a _jsonparsefailure tag. The logstash log should have an error message saying what the filter did not like.
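For example, you could route any events that failed to parse to a separate output so you can look at the raw [message] (a minimal sketch; the file path is just an example):

    output {
      if "_jsonparsefailure" in [tags] {
        file { path => "/tmp/parsetest/json_failures.log" }   # dump failed events here for inspection
      } else {
        stdout { codec => rubydebug }
      }
    }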

There are in fact no _jsonparsefailure events, but there are "_split_type_failure" and "_rubyexception" failures captured. The Ruby exception appears to be a result of an "undefined method 'each' for nil:NilClass", and the split failure seems to be a result of "Only String and Array types are splittable". I'm not sure what these errors represent.

[2021-10-26T22:01:40,272][WARN ][logstash.filters.split   ] Only String and Array types are splittable. field:[@metadata][result] is of type = NilClass
[2021-10-26T22:01:40,302][ERROR][logstash.filters.ruby    ] Ruby exception occurred: undefined method `each' for nil:NilClass {:class=>"NoMethodError", :backtrace=>["(ruby filter code):2:in `block in filter_method'", "/usr/share/logstash/vendor/bundle/jruby/2.5.0/gems/logstash-filter-ruby-3.1.7/lib/logstash/filters/ruby.rb:93:in `inline_script'", "/usr/share/logstash/vendor/bundle/jruby/2.5.0/gems/logstash-filter-ruby-3.1.7/lib/logstash/filters/ruby.rb:86:in `filter'", "/usr/share/logstash/logstash-core/lib/logstash/filters/base.rb:143:in `do_filter'", "/usr/share/logstash/logstash-core/lib/logstash/filters/base.rb:162:in `block in multi_filter'", "org/jruby/RubyArray.java:1792:in `each'", "/usr/share/logstash/logstash-core/lib/logstash/filters/base.rb:159:in `multi_filter'", "org/logstash/config/ir/compiler/AbstractFilterDelegatorExt.java:115:in `multi_filter'", "(eval):157:in `block in filter_func'", "/usr/share/logstash/logstash-core/lib/logstash/pipeline.rb:358:in `filter_batch'", "/usr/share/logstash/logstash-core/lib/logstash/pipeline.rb:337:in `worker_loop'", "/usr/share/logstash/logstash-core/lib/logstash/pipeline.rb:304:in `block in start_workers'"]}

The split filter logs an exception because [@metadata][result] is nil. The following ruby filter logs an exception for the same reason.

The code I provided is just an outline of an approach, not production-ready code. It would need a lot of error checking.
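For instance, the nil errors above could be avoided while you debug by only running the split and the second ruby filter when [@metadata][result] was actually set, something like this minimal sketch:

    if [@metadata][result] {
        split { field => "[@metadata][result]" }
        ruby {
            code => '
                r = event.get("[@metadata][result]")
                # only flatten the hash if the split actually produced one
                r.each { |k, v| event.set(k, v) } if r.is_a?(Hash)
            '
        }
        date { match => [ "Timestamp", "ISO8601" ] }
    }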

Can you show a complete example of an event from rubydebug?

output { stdout { codec => rubydebug } }

Hi Badger, thanks for coming back again. I have uploaded the stdout from a failed run of the config, along with the Logstash logs and the source JSON file used in the configuration. You can find them via this link. Hopefully this helps clarify where/why things are failing.

Apologies, incorrect URL. You can access the logs here:

https://pjfergus-public.s3.eu-west-1.amazonaws.com/RubyLogs.zip?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=AKIARVGBQYLJPNJ6MZHR%2F20211027%2Feu-west-1%2Fs3%2Faws4_request&X-Amz-Date=20211027T144210Z&X-Amz-Expires=604800&X-Amz-SignedHeaders=host&X-Amz-Signature=6990ed1b436dcc9b04458c04fc9f33ad4f9c7fa24699e749dc301096b54d7217

OK, you are getting a _jsonparsefailure. That is because you have a timestamp in front of your JSON.

Unexpected character ('-' (code 45)): Expected space separating root-level values
at [Source: (byte)"2021-10-25 14:12:23,806221 [{"MetricName":"read_bytes_rate","Timestamp":"2021-10-25T14:11:23Z",

The JSON parser consumes 2021 as a key value, then blows up on the following hyphen.

You could do something like

dissect { mapping => { "message" => "%{[@metadata][ts]} %{+[@metadata][ts]} %{[@metadata][json]}" } remove_field => [ "message" ] }
date { match => [ "[@metadata][ts]", "YYYY-MM-dd HH:mm:ss,SSSSSS" ] }
json { source => "[@metadata][json]" target => "[@metadata][data]" }

and then the ruby filter should work.

Thanks Badger for all your help. I actually seem to have had some success with the following...

filter {

    grok {
        pattern_definitions => {
          DIMENSION => "\{\"Name\"\:\"%{WORD}\"\,\"Value\"\:%{DATA}\}"
        }
        match => {
          "message" => "\{\"MetricName\"\:\"(?<MetricName>rtt)\"\,\"Timestamp\"\:\"%{TIMESTAMP_ISO8601:Timestamp}\"\,\"Unit\"\:\"%{WORD:Unit}\"\,\"StatisticValues\"\:\{\"SampleCount\"\:%{NUMBER:[StatisticValues][SampleCount]:int}\,\"Sum\"\:%{NUMBER:[StatisticValues][Sum]:float}\,\"Minimum\"\:%{NUMBER:[StatisticValues][Minimum]:float}\,\"Maximum\"\:%{NUMBER:[StatisticValues][Maximum]:float}\}\,\"Dimensions\"\:\[%{DIMENSION:Dimensions}\,%{DIMENSION:Dimensions}\,%{DIMENSION:Dimensions}\,%{DIMENSION:Dimensions}"
        }
        remove_field => ["message"]
    }
}

This does parse the records into clean objects. The only outstanding issue is that it fails to parse records that do not match the pattern definition, and those records are then simply passed through in their unparsed format rather than being dropped, so some unstructured records are still being output to Elasticsearch.

Just an update on the above: removal of the unmatched records was achieved using the drop filter, as shown here:

filter {

    grok {
        add_tag => [ "valid" ]
        pattern_definitions => {
          DIMENSION => "\{\"Name\"\:\"%{WORD}\"\,\"Value\"\:%{DATA}\}"
        }
        match => {
          "message" => "\{\"MetricName\"\:\"(?<MetricName>rtt)\"\,\"Timestamp\"\:\"%{TIMESTAMP_ISO8601:Timestamp}\"\,\"Unit\"\:\"%{WORD:Unit}\"\,\"StatisticValues\"\:\{\"SampleCount\"\:%{NUMBER:[StatisticValues][SampleCount]:int}\,\"Sum\"\:%{NUMBER:[StatisticValues][Sum]:float}\,\"Minimum\"\:%{NUMBER:[StatisticValues][Minimum]:float}\,\"Maximum\"\:%{NUMBER:[StatisticValues][Maximum]:float}\}\,\"Dimensions\"\:\[%{DIMENSION:Dimensions}\,%{DIMENSION:Dimensions}\,%{DIMENSION:Dimensions}\,%{DIMENSION:Dimensions}"
        }
        remove_field => ["message"]
    }

    if "valid" not in [tags] {
        drop {}
    }
}

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.