Parse multi-line JSON

Hi,

When I try to parse multi-line JSON as below, I am seeing the tags multiline, _jsonparsefailure, and _grokparsefailure in ELK.

From the JSON file content below, I want to remove KEY4 and host and send the rest of the fields to Elasticsearch.

    [
      {
        "KEY1": "ABC",
        "KEY2": "ABC",
        "KEY3": "ABC",
        "KEY4": "{"region":"11","UserSessionId":"222","UserId":"gllexie"}",
        "host": "ABC",
        "timestamp": 1595411041516,
      },
      {
        "KEY1": "ABC2",
        "KEY2": "ABC2",
        "KEY3": "ABC2",
        "KEY4": "{"region":"22","UserSessionId":"No%20CPM%20Profile","UserId":"gllexie"}",
        "host": "ABC2",
        "timestamp": 1595411041516,
      }
    ]

Not sure what your current input/filter looks like, but it would be something like this.

    filter {
      mutate {
        remove_field => [ "host", "KEY4" ]
      }
    }

This filter worked, but I see the JSON elements are inside the message field, whereas I wanted them as separate fields in Kibana. Can you please help me resolve this?

Try this. If it doesn't work, can you post your full config?

    filter {
      json {
        source => "message"
        remove_field => [ "host", "KEY4" ]
      }
    }

    input {
      file {
        codec => multiline {
          pattern => "{*}"
          negate => true
          what => "previous"
        }
        path => ["C:/samplefile.json"]
        start_position => "beginning"
        sincedb_path => "C:/logs/dev"
      }
    }

    filter {
      mutate {
        remove_field => ["host", "KEY4"]
      }
    }

    output {
      elasticsearch {
        hosts => "localhost:9200"
        index => "sample-ingest"
      }
      stdout {
        codec => rubydebug
      }
    }

When I use the above config, all the JSON is sent to Kibana inside message, whereas I want each field in the JSON as a separate property in Kibana.

If you do the below, does it work? Is there a reason you are using multiline?

    input {
      file {
        path => ["C:/samplefile.json"]
        start_position => "beginning"
        sincedb_path => "C:/logs/dev"
      }
    }

Hi, I am using multiline because my JSON object is spread across multiple lines.

Hi there,

first of all, please make use of the code formatter tool when pasting non-plain text (such as pipeline conf file lines), because otherwise it'll be more difficult to read and go through.

Now, to my understanding you'd like to process that array of JSON objects of yours from a file and send each JSON object as a separate document, each one with its fields extracted.

Now, the first thing I suggest you do is edit your source file (if possible) to have the whole JSON on a single line, and be careful to leave an empty line at the end of the file.

Also, please note that the JSON you posted is not valid. In fact, pasting it into any JSON validator will highlight the spurious trailing commas after the "timestamp" values and the unescaped quotes around the nested KEY4 value. To be valid, your JSON should look something like this:

    [
      {
        "KEY1": "ABC",
        "KEY2": "ABC",
        "KEY3": "ABC",
        "KEY4": {
          "region": "11",
          "UserSessionId": "222",
          "UserId": "gllexie"
        },
        "host": "ABC",
        "timestamp": 1595411041516
      },
      {
        "KEY1": "ABC2",
        "KEY2": "ABC2",
        "KEY3": "ABC2",
        "KEY4": {
          "region": "22",
          "UserSessionId": "No%20CPM%20Profile",
          "UserId": "gllexie"
        },
        "host": "ABC2",
        "timestamp": 1595411041516
      }
    ]
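
And here is what that same content looks like shrunk onto a single line (remember the empty line at the end of the file):

    [{"KEY1":"ABC","KEY2":"ABC","KEY3":"ABC","KEY4":{"region":"11","UserSessionId":"222","UserId":"gllexie"},"host":"ABC","timestamp":1595411041516},{"KEY1":"ABC2","KEY2":"ABC2","KEY3":"ABC2","KEY4":{"region":"22","UserSessionId":"No%20CPM%20Profile","UserId":"gllexie"},"host":"ABC2","timestamp":1595411041516}]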

Now, having said that, what you could do (after you've managed to shrink your JSON onto one line, with a trailing empty line in the file) is a pipeline like the following:

    input {
      file {
        path => "path/to/json/file"
        start_position => "beginning"
        sincedb_path => "NUL"  # on Windows; use "/dev/null" on Linux
        codec => "json"
      }
    }

    filter {
      mutate {
        remove_field => ["host", "KEY4"]
      }
    }

    output {
      stdout {}
    }
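
With the file on one line as above, the json codec should parse the top-level array and emit one event per element, so stdout (which defaults to the rubydebug codec) would print something along these lines (a sketch; metadata fields such as @version, path and the ingestion @timestamp are omitted here):

    {
             "KEY1" => "ABC",
             "KEY2" => "ABC",
             "KEY3" => "ABC",
        "timestamp" => 1595411041516
    }
    {
             "KEY1" => "ABC2",
             "KEY2" => "ABC2",
             "KEY3" => "ABC2",
        "timestamp" => 1595411041516
    }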

Having your source JSON file on one line will let you avoid the multiline codec and all the hassles that come with it.

Obviously you can replace the standard output with your ES instance.
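
For instance, reusing the host and index from your earlier config:

    output {
      elasticsearch {
        hosts => "localhost:9200"
        index => "sample-ingest"
      }
    }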

Hi Fabio,

Thank you for your response. I need to process KEY4 as a string because we have a limit on the length of KEY4's value, so in some objects KEY4 is not valid JSON (it is trimmed once it reaches the length limit).

So, what I am trying to do is remove KEY4 or convert it to a string.

OK, but you need to pass it as valid JSON. That means if you want to treat it as a string, you have to escape the quotes inside the KEY4 value, like so:

    {
      "KEY1": "ABC",
      "KEY2": "ABC",
      "KEY3": "ABC",
      "KEY4": "{ \"region\":\"11\", \"UserSessionId\":\"222\", \"UserId\":\"gllexie\" }",
      "host": "ABC",
      "timestamp": 1595411041516
    }

Now this is valid JSON and you can use that same approach.
Obviously you need to be able either to format the JSON file properly when writing it (escaping the quotes in the KEY4 value and putting the JSON on a single line) or to edit it before parsing it with Logstash.
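
For example, written on a single line in the source file (again with a trailing empty line), that record would read:

    {"KEY1":"ABC","KEY2":"ABC","KEY3":"ABC","KEY4":"{ \"region\":\"11\", \"UserSessionId\":\"222\", \"UserId\":\"gllexie\" }","host":"ABC","timestamp":1595411041516}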

Can you do that? Otherwise I have to provide you with a slightly less intuitive solution.

Hi Fabio,

Yes, I can format the KEY4 value as { \"region\":\"11\", \"UserSessionId\":\"222\", \"UserId\":\"gllexie\" }, but for some elements it gets trimmed and can end up as { \"region\":\"11\", \"UserSessionId\":\"222\", \"UserId\":\"gllexieaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa\",

OK, so you're telling me some elements might have an excessively long value for that field, and the string might not be closed properly with a trailing quote?

If that's the case, I have a feeling you need to first grok out the useless KEY4 part and then proceed with the JSON evaluation, otherwise the JSON won't be valid.
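
Just to sketch that idea (here with mutate's gsub rather than grok, and assuming each object sits on a single line and the "host" key always follows KEY4), you could cut the broken KEY4 entry out of the raw text before the json filter, keeping a plain (default) codec on the file input so message holds the raw line:

    filter {
      # drop everything from "KEY4" up to the following "host" key before parsing
      mutate {
        gsub => [ "message", '"KEY4":.*?"host"', '"host"' ]
      }
      json {
        source => "message"
        remove_field => [ "host" ]
      }
    }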

Also, in the case of an excessively long KEY4 value, will you be missing the remaining part of the JSON, too (like host and timestamp)?

Can you please post here an example of such a case?

Thank you
