JSON array split not working, and how to convert a string to JSON?

Hi All,
I'm new to the latest version of Logstash. I use it to parse JSON (or JSON-like) data into a text file.

But a JSON array problem is confusing me.

I use Postman in the Chrome browser to post JSON data, in the form of an array, to a URL so that nginx saves the post data into its log file.

The posted JSON data is as follows:

[{"channel":"khtml","device":"wechat","duration":0,"main_param":"","page_code":"p_w_pay_list","previous_view":"","upload_time":153417688,"user_id":2011111,"uuid":"153466666661-3233333"},{"channel":"khtml","device":"wechat","duration":0,"main_param":"","page_code":"p_w_pay_list","previous_view":"","upload_time":153417688,"user_id":2022222,"uuid":"153466666661-3233333"}]

As you can see, I post a JSON array.

In the nginx log file, it is saved in the following form:

{"@timestamp":"2018-08-14T09:36:06+08:00","host":"10.30.20.124","clientip":"10.1.4.38","size":49,"responsetime":0.000,"upstreamtime":"0.000","upstreamhost":"127.0.0.1:80","http_host":"10.30.20.124","query_body":"[{\"channel\":\"kxhtml\",\"device\":\"wechat\",\"duration\":0,\"main_param\":\"\",\"page_code\":\"p_w_pay_list\",\"previous_view\":\"\",\"upload_time\":1534176001288,\"user_id\":2011111,\"uuid\":\"153466666661-3233333\"},{\"channel\":\"kxhtml\",\"device\":\"wechat\",\"duration\":0,\"main_param\":\"\",\"page_code\":\"p_w_pay_list\",\"previous_view\":\"\",\"upload_time\":1534176001288,\"user_id\":2022222,\"uuid\":\"153466666661-3233333\"}]","url":"/static/analysis/bf.gif","xff":"","referer":"","agent":"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.99 Safari/537.36","status":"200"}

Notice that a pair of double quotes surrounds the value of the "query_body" field, which is what I just posted. The quotes are generated by nginx.

Then I use a config to parse the log snippet. My Logstash config is below:

input {
    file {
        path => "/data/log/nginx/*/*/access_mix.*.log"
        start_position => "beginning"
        sincedb_path => "/dev/null" 
        codec => json
    }
}

filter {
    mutate {
        split => [ "upstreamtime", "," ]
    }
    mutate {
        convert => [ "upstreamtime", "float" ]
    }
    if [query_body] =~ /.+/ {
        #json_encode {      # this also cannot work
        #    source => "[query_body]"
        #}
        split {
            field => "query_body"
        }
    }
}

output {
    if [query_string] != "-" {
        stdout {
            codec => rubydebug
        }
    }
}

But the result is as follows:

{
             "url" => "/static/analysis/bf.gif",
        "clientip" => "10.1.4.38",
      "query_body" => "[{\"channel\":\"kxhtml\",\"device\":\"wechat\",\"duration\":0,\"main_param\":\"\",\"page_code\":\"p_w_pay_list\",\"previous_view\":\"\",\"upload_time\":1534176001288,\"user_id\":2011111,\"uuid\":\"153466666661-3233333\"},{\"channel\":\"kxhtml\",\"device\":\"wechat\",\"duration\":0,\"main_param\":\"\",\"page_code\":\"p_w_pay_list\",\"previous_view\":\"\",\"upload_time\":1534176001288,\"user_id\":2022222,\"uuid\":\"153466666661-3233333\"}]",
    "upstreamhost" => "127.0.0.1:80",
           "agent" => "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.99 Safari/537.36",
      "@timestamp" => 2018-08-14T01:36:06.000Z,
         "referer" => "",
       "http_host" => "10.30.20.124",
          "status" => "200",
            "host" => "10.30.20.124",
    "upstreamtime" => [
        [0] 0.0
    ],
            "path" => "/data/log/nginx/2018/201808/access_mix.20180813.log",
    "responsetime" => 0.0,
             "xff" => "",
            "size" => 49,
        "@version" => "1"
}

The result is not what I expected, which is the event separated into two JSON objects (or more), based on the number of JSON objects in the "query_body" field.
I think the reason is that the value of "query_body" is a string to the Logstash parser, not an array.

So I tried the following three methods separately:

  1. Replace the outermost pair of double quotes with nothing using gsub, but it does not work.
  2. Use json_encode to parse the string value of query_body into a JSON object.
  3. Delete the double quotes around the query_body value directly in the log file. Logstash then shows errors when it starts.

I tested the value of "query_body"; it is a valid JSON array.

So my question is: what should I do so that the split plugin works?

Thanks !

Like this...

    json { source => "message" remove_field => [ "message" ] } 
    json { source => "query_body" target => "body" remove_field => [ "query_body" ] } 
    split { field => "body" }
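For anyone wondering why this works: the split filter only produces one event per element when the field is an array. On a string field it splits on a terminator (a newline by default), so a one-line string stays a single event. The second json filter is what turns the string into an array. Here is a minimal Python sketch of the same three steps, not Logstash itself; the shortened log line and the helper name split_events are made up for illustration. Since the input above already uses codec => json, the first json filter probably has nothing left to decode.

```python
import json

# Shortened, hypothetical nginx log line: "query_body" holds a JSON array
# that has been serialized as a string, as in the log shown above.
log_line = json.dumps({
    "url": "/static/analysis/bf.gif",
    "query_body": json.dumps([
        {"page_code": "p_w_pay_list", "user_id": 2011111},
        {"page_code": "p_w_pay_list", "user_id": 2022222},
    ]),
})


def split_events(line):
    """Decode the line, decode query_body into body, emit one event per element."""
    # Roughly: json { source => "message" }
    event = json.loads(line)
    # Roughly: json { source => "query_body" target => "body" remove_field => [...] }
    body = json.loads(event.pop("query_body"))
    # Roughly: split { field => "body" } -> copy the event once per array element
    return [dict(event, body=item) for item in body]


events = split_events(log_line)
for e in events:
    print(e)
```

Each resulting event keeps the outer fields (url and so on) and carries one element of the array in body.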

Thanks, it works!
