Mutate a specific JSON field but not another

Hello, I have data that looks like this.

{
    "remote_addr": "127.0.0.1",
    "time_local": "26/Jan/2023:17:07:18 -0800",
    "request": "POST /abcd HTTP/1.1",
    "request_method": "POST",
    "status": "200",
    "user_agent": "curl/7.84.0",
    "headers": {\x22Host\x22:\x22localhost\x22,\x22User-Agent\x22:\x22curl/7.84.0\x22,\x22Accept\x22:\x22*/*\x22,\x22Content-Length\x22:\x2221\x22,\x22Content-Type\x22:\x22application/x-www-form-urlencoded\x22},
    "request_body": "{\x22test_json\x22: \x22test\x22}"
}

I am trying to replace the \x22 in just the headers field so that Logstash parses it correctly, but not in request_body at the end. I want request_body to remain escaped in case I get JSON data in a POST request; a simple gsub over the whole message would cause parsing errors whenever request_body contains JSON data.

If I do something like

filter {
   mutate {
       split => {"message" => "," }
       gsub => ["%{[message][6]}", "\\x22", '"']
   }
   json { source => "message" }
}

I get {:exception=>"Invalid FieldReference: `%{[message][6]}`"}

What would be the correct way to do this? Thanks in advance!

The mutate filter does things in a fixed order, regardless of the order of the options in the configuration file. gsub is done before split, so when the gsub executes the [message] field is not yet an array, and "%{[message][6]}" is not a valid field reference.

Not my preference, but I did try

    mutate { split => {"message" => "," } }
    mutate { gsub => ["%{[message][6]}", "\\x22", '"'] }

but that gets the same error. Looking at the code I don't quite understand what code path it follows, so I cannot explain why it does not do the sprintf.

Personally I would write a ruby filter to do the string manipulation, but it can be done just using other logstash plugins. It took me a couple of hours, and I learned something about the difference between \z and \Z in ruby regexps, but if you use

    mutate { gsub => [ "[message]", "\n", "", "[message]", "\s+", " " ] }
    mutate {
        add_field => {
            "[@metadata][part1]" => "%{message}"
            "[@metadata][part2]" => "%{message}"
            "[@metadata][part3]" => "%{message}"
        }
    }
    mutate { gsub => [ "[@metadata][part1]", ',\s"headers".*\z', "" ] }
    mutate { gsub => [ "[@metadata][part2]", '\A.*("headers":\s*{[^}]*}).*\z', "\1" ] }
    mutate { gsub => [ "[@metadata][part2]", "\\x22", '"' ] }
    mutate { gsub => [ "[@metadata][part3]", '\A.*"headers": {[^}]*}(.*)\z', "\1" ] }
    mutate { gsub => [ "[@metadata][part3]", "\\x22", '\\"' ] }

    mutate { replace => { "message" => "%{[@metadata][part1]}, %{[@metadata][part2]} %{[@metadata][part3]}" } }
    json { source => "message" }

then you can end up with

"request_method" => "POST",
  "request_body" => "{\"test_json\": \"test\"}",
       "headers" => {
    "Content-Length" => "21",
      "Content-Type" => "application/x-www-form-urlencoded",
              "Host" => "localhost",
            "Accept" => "*/*",
        "User-Agent" => "curl/7.84.0"
},
        "status" => "200",
    "time_local" => "26/Jan/2023:17:07:18 -0800",
   "remote_addr" => "127.0.0.1",
       "request" => "POST /abcd HTTP/1.1",
    "user_agent" => "curl/7.84.0"

It is ugly, and as posted it is ridiculously fragile. Every step needs to be tested to check that its pattern matched.

This is not meant to be a solution; it is a hint in the direction of a solution.
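
For comparison, the ruby-filter approach mentioned above could look roughly like the sketch below. It is plain Ruby so that it can be tested outside Logstash. The helper name fix_x22 is made up for illustration, and it assumes the headers object is flat (no nested braces), as in the sample data.

```ruby
require "json"

# Hypothetical helper, not from the thread: turn \x22 into a real quote inside
# the headers object and into an escaped quote (\") everywhere else.
# Assumption: the headers object is flat, so it ends at the first "}".
def fix_x22(message)
  m = message.match(/\A(.*?"headers":\s*)(\{[^}]*\})(.*)\z/m)
  # No headers object found: treat every \x22 as an escaped quote.
  return message.gsub('\x22') { '\"' } if m.nil?

  before, headers, after = m.captures
  # The block form of gsub inserts the replacement text literally.
  before.gsub('\x22') { '\"' } +
    headers.gsub('\x22') { '"' } +
    after.gsub('\x22') { '\"' }
end

raw = '{"status": "200","headers": {\x22Host\x22:\x22localhost\x22},' \
      '"request_body": "{\x22test_json\x22: \x22test\x22}"}'
event_fields = JSON.parse(fix_x22(raw))
puts event_fields.inspect
```

Inside a ruby filter the same logic would presumably be wrapped as event.set("message", fix_x22(event.get("message"))) ahead of the json filter. That wiring is untested here.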

Thank you for the work you put into this. It's a step in the right direction, but I was still getting errors with that config. I suspect it's because my raw data looks like this:

{"remote_addr": "127.0.0.1","time_local": "27/Jan/2023:11:08:37 -0800","request": "POST /abcd HTTP/1.1", "request_method": "POST","status": "200","user_agent": "curl/7.84.0","headers": {\x22Host\x22:\x22localhost\x22,\x22User-Agent\x22:\x22curl/7.84.0\x22,\x22Accept\x22:\x22*/*\x22,\x22Content-Length\x22:\x2221\x22,\x22Content-Type\x22:\x22application/x-www-form-urlencoded\x22},"http_ssl_ja3": "771,4866-4867-4865-49196-49200-159-52393-52392-52394-49195-49199-158-49188-49192-107-49187-49191-103-49162-49172-57-49161-49171-51-157-156-61-60-53-47-255,0-11-10-13172-16-22-23-49-13-43-45-51,29-23-30-25-24,0-1-2", "http_ssl_ja3_hash": "ba730f97dcd1122e74e65411e68f1b40","request_body": "{\x22test_json\x22: \x22test\x22}"}

I had cleaned the original post up to make it more readable. I don't use newlines in the nginx log format. It also doesn't seem to be applying the mutate. I'm still seeing \x22 in the trace, so I'm not sure why it wouldn't be applying.

[WARN ] 2023-01-27 11:08:40.720 [[main]>worker16] json - Error parsing json {:source=>"message", :raw=>"{\"remote_addr\": \"127.0.0.1\",\"time_local\": \"27/Jan/2023:11:08:37 -0800\",\"request\": \"POST /abcd HTTP/1.1\", \"request_method\": \"POST\",\"status\": \"200\",\"user_agent\": \"curl/7.84.0\",\"headers\": {\\x22Host\\x22:\\x22localhost\\x22,\\x22User-Agent\\x22:\\x22curl/7.84.0\\x22,\\x22Accept\\x22:\\x22*/*\\x22,\\x22Content-Length\\x22:\\x2221\\x22,\\x22Content-Type\\x22:\\x22application/x-www-form-urlencoded\\x22},\"http_ssl_ja3\": \"771,4866-4867-4865-49196-49200-159-52393-52392-52394-49195-49199-158-49188-49192-107-49187-49191-103-49162-49172-57-49161-49171-51-157-156-61-60-53-47-255,0-11-10-13172-16-22-23-49-13-43-45-51,29-23-30-25-24,0-1-2\", \"http_ssl_ja3_hash\": \"ba730f97dcd1122e74e65411e68f1b40\",\"request_body\": \"{\\x22test_json\\x22: \\x22test\\x22}\"}, \"headers\": {\"Host\":\"localhost\",\"User-Agent\":\"curl/7.84.0\",\"Accept\":\"*/*\",\"Content-Length\":\"21\",\"Content-Type\":\"application/x-www-form-urlencoded\"} ,\"http_ssl_ja3\": \"771,4866-4867-4865-49196-49200-159-52393-52392-52394-49195-49199-158-49188-49192-107-49187-49191-103-49162-49172-57-49161-49171-51-157-156-61-60-53-47-255,0-11-10-13172-16-22-23-49-13-43-45-51,29-23-30-25-24,0-1-2\", \"http_ssl_ja3_hash\": \"ba730f97dcd1122e74e65411e68f1b40\",\"request_body\": \"{\\\"test_json\\\": \\\"test\\\"}\"}", :exception=>#<LogStash::Json::ParserError: Unexpected character ('\' (code 92)): was expecting double-quote to start field name
 at [Source: (byte[])"{"remote_addr": "127.0.0.1","time_local": "27/Jan/2023:11:08:37 -0800","request": "POST /abcd HTTP/1.1", "request_method": "POST","status": "200","user_agent": "curl/7.84.0","headers": {\x22Host\x22:\x22localhost\x22,\x22User-Agent\x22:\x22curl/7.84.0\x22,\x22Accept\x22:\x22*/*\x22,\x22Content-Length\x22:\x2221\x22,\x22Content-Type\x22:\x22application/x-www-form-urlencoded\x22},"http_ssl_ja3": "771,4866-4867-4865-49196-49200-159-52393-52392-52394-49195-49199-158-49188-49192-107-49187-49191-103-4"[truncated 705 bytes]; line: 1, column: 188]>}

So if I understand this correctly,

    mutate { gsub => [ "[@metadata][part2]", '\A.*("headers":\s*{[^}]*}).*\z', "\1" ] }
    mutate { gsub => [ "[@metadata][part2]", "\\x22", '"' ] }

is basically selecting the regex match at group index 1 (\1) and applying the \x22 => " replacement to that specifically?

Then this line merges all the regex groups back together?

    mutate { replace => { "message" => "%{[@metadata][part1]}, %{[@metadata][part2]} %{[@metadata][part3]}" } }
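
One way to check that reading is to replay the same steps in plain Ruby. This is only a sketch: it assumes mutate's gsub behaves like Ruby's String#gsub with a regular expression, it uses a shortened sample message, and the braces are escaped in the regex literals, which should be equivalent to the patterns above.

```ruby
require "json"

message = '{"status": "200","headers": {\x22Host\x22:\x22localhost\x22},' \
          '"request_body": "{\x22a\x22: \x22b\x22}"}'

# Each pattern is anchored with \A and \z, so it matches the whole string;
# replacing that match with \1 keeps only what capture group 1 matched.
part1 = message.gsub(/\A(.*)"headers".*\z/, '\1')                # text before "headers"
part2 = message.gsub(/\A.*("headers":\s*\{[^}]*\}).*\z/, '\1')   # the headers object
part3 = message.gsub(/\A.*"headers": \{[^}]*\}(.*)\z/, '\1')     # text after it

# Only part2 gets real quotes; part3 gets escaped quotes so that the
# request_body value stays a single JSON string.
part2 = part2.gsub('\x22') { '"' }
part3 = part3.gsub('\x22') { '\"' }

# Stitch the three pieces back together, as the replace option does.
rebuilt = "#{part1} #{part2} #{part3}"
puts JSON.parse(rebuilt).inspect
```

So each part starts as a full copy of the message, a gsub trims it down to one capture group, and the replace concatenates the three trimmed copies.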

Thank you so much for pointing me in the right direction. I figured out a solution:

filter {
    mutate {
        add_field => {
            "[@metadata][part1]" => "%{message}"
            "[@metadata][part2]" => "%{message}"
            "[@metadata][part3]" => "%{message}"
        }
    }
    mutate { gsub => [ "[@metadata][part1]", '\A.*({"remote_addr":.*"headers").*\z', "\1" ] }
    mutate { gsub => [ "[@metadata][part1]", '"headers"', "" ] }
    mutate { gsub => [ "[@metadata][part2]", '\A.*("headers":\s*{[^}]*}).*\z', "\1" ] }
    mutate { gsub => [ "[@metadata][part2]", "\\x22", '"' ] }
    mutate { gsub => [ "[@metadata][part3]", '\A.*"headers": {[^}]*}(.*)\z', "\1" ] }
    mutate { gsub => [ "[@metadata][part3]", "\\x22", '\\"' ] }

    mutate { replace => { "message" => "%{[@metadata][part1]} %{[@metadata][part2]} %{[@metadata][part3]}" } }
    json { source => "message" }
}
