Hi All,
I'm new to the latest version of Logstash. I use it to parse JSON-formatted (or JSON-like) data in a text file.
But there is a problem with a JSON array that confuses me.
I use Postman in the Chrome browser to post JSON data, which takes the form of an array, to a URL so that nginx saves the posted data into its log file.
The posted JSON data is as below:
[{"channel":"khtml","device":"wechat","duration":0,"main_param":"","page_code":"p_w_pay_list","previous_view":"","upload_time":153417688,"user_id":2011111,"uuid":"153466666661-3233333"},{"channel":"khtml","device":"wechat","duration":0,"main_param":"","page_code":"p_w_pay_list","previous_view":"","upload_time":153417688,"user_id":2022222,"uuid":"153466666661-3233333"}]
As you can see, I am posting a JSON array.
In the nginx log file, it is saved in the following form:
{"@timestamp":"2018-08-14T09:36:06+08:00","host":"10.30.20.124","clientip":"10.1.4.38","size":49,"responsetime":0.000,"upstreamtime":"0.000","upstreamhost":"127.0.0.1:80","http_host":"10.30.20.124","query_body":"[{\"channel\":\"kxhtml\",\"device\":\"wechat\",\"duration\":0,\"main_param\":\"\",\"page_code\":\"p_w_pay_list\",\"previous_view\":\"\",\"upload_time\":1534176001288,\"user_id\":2011111,\"uuid\":\"153466666661-3233333\"},{\"channel\":\"kxhtml\",\"device\":\"wechat\",\"duration\":0,\"main_param\":\"\",\"page_code\":\"p_w_pay_list\",\"previous_view\":\"\",\"upload_time\":1534176001288,\"user_id\":2022222,\"uuid\":\"153466666661-3233333\"}]","url":"/static/analysis/bf.gif","xff":"","referer":"","agent":"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.99 Safari/537.36","status":"200"}
Notice that a pair of double quotes surrounds the value of the "query_body" field, which is the data I just posted. These quotes are generated by nginx.
I then use the following Logstash config to parse the log snippet:
input {
  file {
    path => "/data/log/nginx/*/*/access_mix.*.log"
    start_position => "beginning"
    sincedb_path => "/dev/null"
    codec => json
  }
}
filter {
  mutate {
    split => [ "upstreamtime", "," ]
  }
  mutate {
    convert => [ "upstreamtime", "float" ]
  }
  if [query_body] =~ /.+/ {
    #json_encode { # this also cannot work
    #  source => "[query_body]"
    #}
    split {
      field => "query_body"
    }
  }
}
output {
  if [query_string] != "-" {
    stdout {
      codec => rubydebug
    }
  }
}
But the result is as below:
{
"url" => "/static/analysis/bf.gif",
"clientip" => "10.1.4.38",
"query_body" => "[{\"channel\":\"kxhtml\",\"device\":\"wechat\",\"duration\":0,\"main_param\":\"\",\"page_code\":\"p_w_pay_list\",\"previous_view\":\"\",\"upload_time\":1534176001288,\"user_id\":2011111,\"uuid\":\"153466666661-3233333\"},{\"channel\":\"kxhtml\",\"device\":\"wechat\",\"duration\":0,\"main_param\":\"\",\"page_code\":\"p_w_pay_list\",\"previous_view\":\"\",\"upload_time\":1534176001288,\"user_id\":2022222,\"uuid\":\"153466666661-3233333\"}]",
"upstreamhost" => "127.0.0.1:80",
"agent" => "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.99 Safari/537.36",
"@timestamp" => 2018-08-14T01:36:06.000Z,
"referer" => "",
"http_host" => "10.30.20.124",
"status" => "200",
"host" => "10.30.20.124",
"upstreamtime" => [
[0] 0.0
],
"path" => "/data/log/nginx/2018/201808/access_mix.20180813.log",
"responsetime" => 0.0,
"xff" => "",
"size" => 49,
"@version" => "1"
}
The result is not what I expected. I expected the event to be split into two (or more) events, one for each JSON object inside the 'query_body' field.
I think the reason is that, for the Logstash parser, the value of "query_body" is a string and not an array.
So I tried the following approaches separately:
- Replace the outermost pair of "" with an empty string using gsub, but it did not work.
- Use json_encode to parse the string value of query_body into a JSON object (this did not work either).
- Delete the "" around the query_body value directly in the log file. Logstash then shows error messages when it starts.
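To illustrate why I suspect that "query_body" reaches the split plugin as a string, here is a minimal Python sketch. It is not Logstash itself: the shortened log line and the variable names are made up for illustration, and it only assumes that the json codec decodes each log line once.

```python
import json

# Shortened, hypothetical stand-in for one nginx log line.
# As in the real log, query_body is a JSON array wrapped in a quoted string.
log_line = (
    '{"host":"10.30.20.124",'
    '"query_body":"[{\\"user_id\\":2011111},{\\"user_id\\":2022222}]",'
    '"status":"200"}'
)

# First decode: roughly what a single JSON decoding pass over the line yields.
event = json.loads(log_line)
print(type(event["query_body"]).__name__)  # str

# Second decode: only now does the string become a real array.
items = json.loads(event["query_body"])
print(type(items).__name__)  # list

# The result I am hoping for: one event per array element,
# with the other fields copied into each event.
split_events = [{**event, "query_body": item} for item in items]
for e in split_events:
    print(e)
```

If this picture is right, the string would need a second JSON decoding step before any splitting into separate events can happen.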
I checked the value of "query_body", and it is a valid JSON array.
So my question is: what should I do so that the split plugin works?
Thanks!