How to remove backslashes from weirdly formatted JSON

Hello :raised_hand_with_fingers_splayed:,

I have the following (valid) JSON, whose formatting I can't change:

[{
    "id": "1",
    "sn": "00:1E:C0:8D:9A:CD",
    "log": "{\"date\":\"16\/04\/2021\",\"time\":\"09h24\",\"t_1\":\"20.7\",\"t_2\":\"20.0\",\"t_3\":\"12.6\",\"t_4\":\"19.1\",\"p_1\":\"115\",\"w_1\":\"0.52\",\"w_2\":\"3.72\",\"w_4\":\"1.64\"}",
    "date": "2021-04-16 09:24:50",
    "udate": "1618557890",
    "source": "192.168.10.230"
}, {
    "id": "2",
    "sn": "00:1E:C0:8D:9A:CD",
    "log": "{\"date\":\"16\/04\/2021\",\"time\":\"09h34\",\"t_1\":\"20.8\",\"t_2\":\"20.2\",\"t_3\":\"12.9\",\"t_4\":\"19.2\",\"p_1\":\"233\",\"w_1\":\"1.06\",\"w_2\":\"3.69\",\"w_4\":\"1.61\"}",
    "date": "2021-04-16 09:34:50",
    "udate": "1618558490",
    "source": "192.168.10.230"
}, {

And I wish to parse the "log" entry. However, as you can see, the double quotes `"` are escaped with a backslash `\`.

I would like to remove the backslashes so I can process the data properly, but I haven't managed it even after hours of searching this particular topic :frowning:

I tried this mutate filter:

    mutate {
        gsub => ["message","[\\]",""]
    }

And some other things... Since I'm a novice with the Elastic Stack, it's difficult: there are a lot of concepts to take in and I'm not exactly sure what I'm doing.

My input looks like this:

input {
    file {
        type => "json"
        path => "/home/maxence/dev/IPMS/kibana/tuto/sample/es-log.json"
        start_position => beginning
    }
}

My output gets split up like this:

{
          "type" => "json",
          "host" => "pop-os",
      "@version" => "1",
       "message" => "    \"log\": \"{\"date\":\"17/04/2021\",\"time\":\"03h54\",\"t_1\":\"20.4\",\"t_2\":\"20.0\",\"t_3\":\"9.6\",\"t_4\":\"17.1\",\"p_1\":\"157\",\"w_1\":\"0.71\",\"w_2\":\"3.30\",\"w_4\":\"1.38\"}\",",
    "@timestamp" => 2021-07-19T15:31:05.119Z,
          "path" => "/home/maxence/dev/IPMS/kibana/tuto/sample/es-log.json",
          "tags" => [
        [0] "_jsonparsefailure"
    ]
}
{
          "type" => "json",
          "host" => "pop-os",
      "@version" => "1",
       "message" => "    \"date\": \"2021-04-17 04:04:47\",",
    "@timestamp" => 2021-07-19T15:31:05.120Z,
          "path" => "/home/maxence/dev/IPMS/kibana/tuto/sample/es-log.json",
          "tags" => [
        [0] "_jsonparsefailure"
    ]
}
{
          "type" => "json",
          "host" => "pop-os",
      "@version" => "1",
       "message" => "    \"udate\": \"1618625687\",",
    "@timestamp" => 2021-07-19T15:31:05.121Z,
          "path" => "/home/maxence/dev/IPMS/kibana/tuto/sample/es-log.json",
          "tags" => [
        [0] "_jsonparsefailure"
    ]
}
{
          "type" => "json",
          "host" => "pop-os",
      "@version" => "1",
       "message" => "    \"source\": \"192.168.10.230\"",
    "@timestamp" => 2021-07-19T15:31:05.121Z,
          "path" => "/home/maxence/dev/IPMS/kibana/tuto/sample/es-log.json",
          "tags" => [
        [0] "_jsonparsefailure"
    ]
}

It looks like your JSON is pretty printed, in which case you need a multiline codec to put the parts of the object back together. There is an example of consuming a file as a single event here.
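A minimal sketch of such a codec (the pattern here is a placeholder that is assumed never to match a line; with negate => true every line is then appended to the previous event, and auto_flush_interval emits the buffered file as a single event):

```
input {
    file {
        path => "/path/to/es-log.json"
        start_position => beginning
        codec => multiline {
            # Placeholder pattern that is assumed never to match,
            # so the whole file folds into one event.
            pattern => "^ThisPatternNeverMatches"
            negate => true
            what => previous
            auto_flush_interval => 1
        }
    }
}
```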

Once you do that, you have an event containing JSON that itself contains nested JSON. You can use a json filter to do the parsing.

    json { source => "message" remove_field => [ "message" ] target => "[someField]" }
    json { source => "[someField][0][log]" target => "[someField][0][stuff]" }

That will result in

 "someField" => [
    [0] {
          "date" => "2021-04-16 09:24:50",
           "log" => "{\"date\":\"16/04/2021\",\"time\":\"09h24\",\"t_1\":\"20.7\",\"t_2\":\"20.0\",\"t_3\":\"12.6\",\"t_4\":\"19.1\",\"p_1\":\"115\",\"w_1\":\"0.52\",\"w_2\":\"3.72\",\"w_4\":\"1.64\"}",
        "source" => "192.168.10.230",
            "sn" => "00:1E:C0:8D:9A:CD",
            "id" => "1",
         "udate" => "1618557890",
         "stuff" => {
            "date" => "16/04/2021",
             "p_1" => "115",
             "t_2" => "20.0",
             "t_1" => "20.7",
             "w_1" => "0.52",
             "t_4" => "19.1",
             "t_3" => "12.6",
             "w_2" => "3.72",
            "time" => "09h24",
             "w_4" => "1.64"
        }
    },

You might, or might not, want to use a split filter to divide the [someField] array into multiple events.
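If you do want one event per array entry, a sketch (assuming the [someField] target used above):

```
filter {
    # Turn the [someField] array into one event per element.
    split { field => "someField" }
    # After the split each event holds a single object, so the nested
    # log is now at [someField][log] rather than [someField][0][log].
    json { source => "[someField][log]" target => "[someField][stuff]" }
}
```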

If the array is variable length and you need to iterate over it, parsing each entry, then you would need a ruby filter.
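A hypothetical sketch of such a ruby filter (the field names and the _lognotparsed tag are my assumptions, not anything standard):

```
filter {
    ruby {
        code => '
            require "json"
            entries = event.get("someField")
            if entries.is_a?(Array)
                entries.each_index do |i|
                    begin
                        raw = event.get("[someField][#{i}][log]")
                        event.set("[someField][#{i}][stuff]", JSON.parse(raw)) if raw
                    rescue JSON::ParserError
                        event.tag("_lognotparsed")
                    end
                end
            end
        '
    }
}
```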

Hi Badger,
First of all, thanks a lot for your time! It's so cool that you answered within 5 minutes, after I had spent a lot of time reading your posts to try and troubleshoot my problem.

Your solution works very well, thank you.

However, there is an unexpected catch: I would like to add a field containing coordinates (lat & long).

Before I tried that, everything was going smoothly. Here's my pipeline:

input {
    file {
        type => "json"
        path => "/home/maxence/dev/IPMS/kibana/tuto/sample/paris.json"
        start_position => beginning
        codec => multiline {
            pattern => "^SpalanzaniWillNeverbeFoundButItsNegatedSoItsOk"
            negate => true
            what => previous
            auto_flush_interval => 1
            multiline_tag => ""
            max_lines => 1000
        }
    }
}

filter {

    json { source => "message" remove_field => [ "message" ] target => "[someField]" }
    split { field => "someField" }
    json { source => "[someField][log]" target => "[someField][stuff]" }


}

output {
    stdout { codec => rubydebug}
}

And when I feed it this kind of json:

[{
    "id": "1",
    "sn": "00:1E:C0:8D:9A:CD",
    "log": "{\"date\":\"16\/04\/2021\",\"time\":\"09h24\",\"t_1\":\"20.7\",\"t_2\":\"20.0\",\"t_3\":\"12.6\",\"t_4\":\"19.1\",\"p_1\":\"115\",\"w_1\":\"0.52\",\"w_2\":\"3.72\",\"w_4\":\"1.64\"}",
    "date": "2021-04-16 09:24:50",
    "udate": "1618557890",
    "source": "192.168.10.230"
}, {
    "id": "2",
    "sn": "00:1E:C0:8D:9A:CD",
    "log": "{\"date\":\"16\/04\/2021\",\"time\":\"09h34\",\"t_1\":\"20.8\",\"t_2\":\"20.2\",\"t_3\":\"12.9\",\"t_4\":\"19.2\",\"p_1\":\"233\",\"w_1\":\"1.06\",\"w_2\":\"3.69\",\"w_4\":\"1.61\"}",
    "date": "2021-04-16 09:34:50",
    "udate": "1618558490",
    "source": "192.168.10.230"
},
.....
...goes on.... 
.....
]

It works perfectly, and logstash answers with these:

{
    "@timestamp" => 2021-07-20T11:20:09.536Z,
          "path" => "/home/maxence/dev/IPMS/kibana/tuto/sample/es-log.json",
          "type" => "json",
     "someField" => {
          "date" => "2021-04-17 04:14:47",
         "udate" => "1618625687",
         "stuff" => {
            "date" => "17/04/2021",
             "p_1" => "115",
             "w_4" => "1.54",
             "t_3" => "9.4",
             "w_2" => "6.95",
             "t_2" => "19.9",
             "t_4" => "17.1",
             "w_1" => "0.52",
            "time" => "04h14",
             "t_1" => "20.4"
        },
        "source" => "192.168.10.230",
            "id" => "125",
           "log" => "{\"date\":\"17/04/2021\",\"time\":\"04h14\",\"t_1\":\"20.4\",\"t_2\":\"19.9\",\"t_3\":\"9.4\",\"t_4\":\"17.1\",\"p_1\":\"115\",\"w_1\":\"0.52\",\"w_2\":\"6.95\",\"w_4\":\"1.54\"}",
            "sn" => "00:1E:C0:8D:9A:CD"
    },
      "@version" => "1",
          "host" => "pop-os"
}
{
    "@timestamp" => 2021-07-20T11:20:09.536Z,
          "path" => "/home/maxence/dev/IPMS/kibana/tuto/sample/es-log.json",
          "type" => "json",
     "someField" => {
          "date" => "2021-04-17 04:24:47",
         "udate" => "1618626287",
         "stuff" => {
            "date" => "17/04/2021",
             "p_1" => "115",
             "w_4" => "1.57",
             "t_3" => "9.4",
             "w_2" => "3.27",
             "t_2" => "19.9",
             "t_4" => "17.1",
             "w_1" => "0.52",
            "time" => "04h24",
             "t_1" => "20.4"
        },
        "source" => "192.168.10.230",
            "id" => "126",
           "log" => "{\"date\":\"17/04/2021\",\"time\":\"04h24\",\"t_1\":\"20.4\",\"t_2\":\"19.9\",\"t_3\":\"9.4\",\"t_4\":\"17.1\",\"p_1\":\"115\",\"w_1\":\"0.52\",\"w_2\":\"3.27\",\"w_4\":\"1.57\"}",
            "sn" => "00:1E:C0:8D:9A:CD"
    },
      "@version" => "1",
          "host" => "pop-os"
}

HOWEVER, when I add a field to my source JSON like so (the end goal is to pass coordinates to Kibana, so I started by adding nested JSON, but that didn't work, so I first tried adding a simple field "key": "valueX"):

[{
    "id": "1",
    "sn": "00:1E:C0:8D:9A:CD",
    "log": "{\"date\":\"16/04/2021\",\"time\":\"09h24\",\"t_1\":\"20.7\",\"t_2\":\"20.0\",\"t_3\":\"12.6\",\"t_4\":\"19.1\",\"p_1\":\"115\",\"w_1\":\"0.52\",\"w_2\":\"3.72\",\"w_4\":\"1.64\"}",
    "date": "2021-04-16 09:24:50",
    "udate": "1618557890",
    "source": "192.168.10.230",
    "key": "value0"  // <<<<<< HERE I ADDED A FIELD
}, {
    "id": "2",
    "sn": "00:1E:C0:8D:9A:CD",
    "log": "{\"date\":\"16/04/2021\",\"time\":\"09h34\",\"t_1\":\"20.8\",\"t_2\":\"20.2\",\"t_3\":\"12.9\",\"t_4\":\"19.2\",\"p_1\":\"233\",\"w_1\":\"1.06\",\"w_2\":\"3.69\",\"w_4\":\"1.61\"}",
    "date": "2021-04-16 09:34:50",
    "udate": "1618558490",
    "source": "192.168.10.230",
    "key": "value1" // <<<<<< AND HERE... For every entry of my json really
},
...
... goes on....
....
]

Well, Logstash is not happy at all with this change, and I find myself scratching my head even harder than yesterday, because why would adding a field to JSON break something???
This is what Logstash screams at me when I do so:

Error parsing json {:source=>"message", :raw=>"[{\n    \"id\": \"1\",\n    \"sn\": \"00:1E:C0:8D:9A:CD\",\n    \"log\": \"{\\\"date\\\":\\\"16/04/2021\\\",\\\"time\\\":\\\"09h24\\\",\\\"t_1\\\":\\\"20.7\\\",\\\"t_2\\\":\\\"20.0\\\",\\\"t_3\\\":\\\"12.6\\\",\\\"t_4\\\":\\\"19.1\\\",\\\"p_1\\\":\\\"115\\\",\\\"w_1\\\":\\\"0.52\\\",\\\"w_2\\\":\\\"3.72\\\",\\\"w_4\\\":\\\"1.64\\\"}\",\n    \"date\": \"2021-04-16 09:24:50\",\n    \"udate\": \"1618557890\",\n    \"source\": \"192.168.10.230\",\n    \"key\": \"value0\"\n}, {\n    \"id\": \"2\",\n    \"sn\": \"00:1E:C0:8D:9A:CD\",\n    \"log\": \"{\\\"date\\\":\\\"16/04/2021\\\",\\\"time\\\":\\\"09h34\\\",\\\"t_1\\\":\\\"20.8\\\",\\\"t_2\\\":\\\"20.2\\\",\\\"t_3\\\":\\\"12.9\\\",\\\"t_4\\\":\\\"19.2\\\",\\\"p_1\\\":\\\"233\\\",\\\"w_1\\\":\\\"1.06\\\",\\\"w_2\\\":\\\"3.69\\\",\\\"w_4\\\":\\\"1.61\\\"}\",\n    \"date\": \"2021-04-16 09:34:50\",\n    \"udate\": \"1618558490\",\n    \"source\": \"192.168.10.230\",\n    \"key\": \"value1\"\n},

// [REDACTED LONG STRING OF MY WHOLE JSON]


exception=>#<LogStash::Json::ParserError: Unexpected end-of-input: expected close marker for Object (start marker at [Source: (byte[])"[{
    "id": "1",
    "sn": "00:1E:C0:8D:9A:CD",
    "log": "{\"date\":\"16/04/2021\",\"time\":\"09h24\",\"t_1\":\"20.7\",\"t_2\":\"20.0\",\"t_3\":\"12.6\",\"t_4\":\"19.1\",\"p_1\":\"115\",\"w_1\":\"0.52\",\"w_2\":\"3.72\",\"w_4\":\"1.64\"}",
    "date": "2021-04-16 09:24:50",
    "udate": "1618557890",
    "source": "192.168.10.230",
    "key": "value0"
}, {
    "id": "2",
    "sn": "00:1E:C0:8D:9A:CD",
    "log": "{\"date\":\"16/04/2021\",\"time\":\"09h34\",\"t_1\":\"20.8\",\"t_2\":\"20.2\""[truncated 41247 bytes]; line: 913, column: 4])
 at [Source: (byte[])"[{
    "id": "1",
    "sn": "00:1E:C0:8D:9A:CD",
    "log": "{\"date\":\"16/04/2021\",\"time\":\"09h24\",\"t_1\":\"20.7\",\"t_2\":\"20.0\",\"t_3\":\"12.6\",\"t_4\":\"19.1\",\"p_1\":\"115\",\"w_1\":\"0.52\",\"w_2\":\"3.72\",\"w_4\":\"1.64\"}",
    "date": "2021-04-16 09:24:50",
    "udate": "1618557890",
    "source": "192.168.10.230",
    "key": "value0"
}, {
    "id": "2",
    "sn": "00:1E:C0:8D:9A:CD",
    "log": "{\"date\":\"16/04/2021\",\"time\":\"09h34\",\"t_1\":\"20.8\",\"t_2\":\"20.2\""[truncated 41247 bytes]; line: 920, column: 41769]>}
[2021-07-20T13:26:34,592][WARN ][logstash.filters.split   ][main][50f6c9a973183eae23bbf05db2e85e19524c28d3e8a934f312a4a1e2580f699b] Only String and Array types are splittable. field:someField is of type = NilClass

But I really don't get why :thinking:

Thank you for taking the time to read me.

No way to say what the JSON parser objected to without seeing the unredacted message that you passed to the json filter.

My boss changed the format of the JSON. 'Twas hard. So now I'm off to new problems...
Anyway, thanks a lot for helping me out!!!