Problem replacing characters with a regexp in mutate/gsub

Hi guys,
I ran into a problem when using Logstash to parse data from nginx logs.
The log content looks like this:

{"reqtime":"06/Aug/2018:09:02:17 +0800", "reqbody":"------WebKitFormBoundaryM2dvVjK2xtgaOP6a\r\nContent-Disposition: form-data; name=\"aaa\"\r\n\r\n1111\r\n------WebKitFormBoundaryM2dvVjK2xtgaOP6a\r\nContent-Disposition: form-data; name=\"bbb\"\r\n\r\n五5六6七7八8\r\n------WebKitFormBoundaryM2dvVjK2xtgaOP6a\r\nContent-Disposition: form-data; name=\"ccc\"\r\n\r\n{'xxx':5555, 'yyyy':6666}\r\n------WebKitFormBoundaryM2dvVjK2xtgaOP6a--\r\n", "status":"200"}
{"reqtime":"06/Aug/2018:10:29:08 +0800", "reqbody":"{\"databody\":{\"certNo\":\"41011112340222456X\",\"customerId\":31795539663626902,\"dataFrom\":\"baiqishi\",\"eventType\":\"login\",\"mobile\":\"13681881234\",\"name\":\"李飞\"},\"channel\":\"bqssdk\",\"bussiness\":\"query\"}", "status":"200"}

As you can see, the reqbody value contains many backslashes, for example in the second line. Because of this, the string cannot be converted to a JSON object and processed directly.

I use the Logstash config below:

input {
    file {
        path => "/data/log/nginx/*/*/*.log"
        start_position => "beginning"
        sincedb_path => "/dev/null" 
    }
}
filter {
    mutate {
        gsub => [
            "reqbody", "\"{", "{",
            "reqbody", "}\"", "}"
        ]
        gsub => ["reqbody", "[\\\"]", "\""]
        gsub => ["reqbody", "[\\]", ""]
    } 
}
output {
    #stdout { codec => rubydebug } 
    file {
        path => "/data/log/logstash/%{+YYYYMM}/test.%{+YYYYMMdd}.txt"
    }
}

As you can see, I tried to convert the field with mutate/gsub in the filter section, but it does not seem to work, and the output contains even more backslashes:

{"host":"0.0.0.0","path":"/data/log/nginx/2018/201808/access.20180806.log","@version":"1","message":"{\"reqtime\":\"06/Aug/2018:10:29:08 +0800\", \"reqbody\":\"{\\\"databody\\\":{\\\"certNo\\\":\\\"41011112340222456X\\\",\\\"customerId\\\":31795539663626902,\\\"dataFrom\\\":\\\"baiqishi\\\",\\\"eventType\\\":\\\"login\\\",\\\"mobile\\\":\\\"13681881234\\\",\\\"name\\\":\\\"李飞\\\"},\\\"channel\\\":\\\"bqssdk\\\",\\\"bussiness\\\":\\\"query\\\"}\", \"status\":\"200\"}","@timestamp":"2018-08-06T23:08:13.742Z"}
{"host":"0.0.0.0","path":"/data/log/nginx/2018/201808/access.20180806.log","@version":"1","message":"{\"reqtime\":\"06/Aug/2018:09:02:17 +0800\", \"reqbody\":\"------WebKitFormBoundaryM2dvVjK2xtgaOP6a\\r\\nContent-Disposition: form-data; name=\\\"aaa\\\"\\r\\n\\r\\n1111\\r\\n------WebKitFormBoundaryM2dvVjK2xtgaOP6a\\r\\nContent-Disposition: form-data; name=\\\"bbb\\\"\\r\\n\\r\\n五5六6七7八8\\r\\n------WebKitFormBoundaryM2dvVjK2xtgaOP6a\\r\\nContent-Disposition: form-data; name=\\\"ccc\\\"\\r\\n\\r\\n{'xxx':5555, 'yyyy':6666}\\r\\n------WebKitFormBoundaryM2dvVjK2xtgaOP6a--\\r\\n\", \"status\":\"200\"}","@timestamp":"2018-08-06T23:08:13.713Z"}

I have tried several solutions but could not resolve it. Can you help me?
Thanks a lot!

If you want to parse the JSON payload in the reqbody field you should use a json filter.
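A minimal sketch of that suggestion, assuming the config from the question. The file input uses the plain codec by default, so each log line arrives as a single string in the message field, and reqbody does not exist as a field until message has been parsed. The target name reqbody_json is a hypothetical choice, not something from the original post.

```
filter {
    # Stage 1: parse the raw nginx line so that reqtime, reqbody and
    # status become top-level fields on the event.
    json {
        source => "message"
    }
    # Stage 2: parse the JSON payload carried inside reqbody.
    # "reqbody_json" is an illustrative field name (an assumption).
    json {
        source => "reqbody"
        target => "reqbody_json"
    }
}
```

With the default settings, events whose reqbody is not valid JSON (such as the multipart form body in the first sample line) are tagged _jsonparsefailure rather than dropped. The extra backslashes in the output file come from the file output serializing the whole event, including the still-unparsed message string, as JSON.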

Yes, but the problem is that the JSON is malformed, so I can't parse it in the first place.

Sorry, I don't understand. The second example you gave seems to contain a valid JSON string in the reqbody field. From what I can tell you don't need any mutate filters.
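Since only some reqbody values are JSON (the first sample line carries a multipart form body), one possible approach is to guard the json filter with a conditional. This is a sketch under assumptions: it expects reqbody to already exist as a field (for example after the message string has been parsed), and reqbody_json is a hypothetical target name.

```
filter {
    # Only attempt JSON parsing when reqbody looks like a JSON object.
    # Assumes reqbody is already a field on the event.
    if [reqbody] =~ /^\{/ {
        json {
            source => "reqbody"
            # Illustrative field name (an assumption).
            target => "reqbody_json"
        }
    }
}
```

Bodies that do not start with `{` pass through unchanged, so multipart payloads are not tagged as parse failures.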

Thanks, I have fixed it. I found that the second JSON field is not a valid JSON string either, so I used a regexp to replace some characters.
