Problems with parsing JSON file with \n \t and \s


#1

Hi everyone!
I have the following file I need to process in logstash:

{
"DataChangeInfo" : "Archive Log Set archiveLogSet.25933761.25933688 Info:
Thread# Sequence# FirstScn LastScn",
"documentsList" : [
{
  "commandScn": "25933758",
  "commandCommitScn": "0",
  "commandSequence": "3",
  "commandType": "INSERT",
  "commandTimestamp": "2017-12-07 05:09:54+03:000",
  "objectDBName": "DB4",
  "objectSchemaName": "CFTNAXDEV",
  "objectId": "NEWJOURNAL",
  "changedFieldsList": [
    {
      "fieldId": "PK_NEWJOURNAL",
      "fieldType": "NUMBER",
      "fieldValue": "NULL",
      "fieldChanged": "Y"
    },
    {
      "fieldId": "OFFICE",
      "fieldType": "CHAR",
      "fieldValue": "NULL",
      "fieldChanged": "Y"
    },
    {
      "fieldId": "USERNAME",
      "fieldType": "VARCHAR2",
      "fieldValue": "NULL",
      "fieldChanged": "Y"
    },
    {
      "fieldId": "TERMINALID",
      "fieldType": "VARCHAR2",
      "fieldValue": "dajdajljdaljda,",
      "fieldChanged": "Y"
    },
    {
      "fieldId": "MODULEID",
      "fieldType": "NUMBER",
      "fieldValue": "NULL",
      "fieldChanged": "Y"
    },
    {
      "fieldId": "VERSION",
      "fieldType": "VARCHAR2",
      "fieldValue": "NULL",
      "fieldChanged": "Y"
    },
    {
      "fieldId": "DESCRIPTION",
      "fieldType": "VARCHAR2",
      "fieldValue": "NULL",
      "fieldChanged": "Y"
    },
    {
      "fieldId": "MID",
      "fieldType": "CHAR",
      "fieldValue": "CITIUS33XXX     ",
      "fieldChanged": "Y"
    },
    {
      "fieldId": "STATUS",
      "fieldType": "VARCHAR2",
      "fieldValue": "NULL",
      "fieldChanged": "Y"
    },
    {
      "fieldId": "REFERENCE",
      "fieldType": "VARCHAR2",
      "fieldValue": "NULL",
      "fieldChanged": "Y"
    },
    {
      "fieldId": "ACTIONID1",
      "fieldType": "NUMBER",
      "fieldValue": "NULL",
      "fieldChanged": "Y"
    },
    {
      "fieldId": "ACTIONID2",
      "fieldType": "NUMBER",
      "fieldValue": "NULL",
      "fieldChanged": "Y"
    },
    {
      "fieldId": "TIME_STAMP",
      "fieldType": "CHAR",
      "fieldValue": "07-12-2017 23:11:48    ",
      "fieldChanged": "Y"
    },
    {
      "fieldId": "CLASS",
      "fieldType": "CHAR",
      "fieldValue": "NULL",
      "fieldChanged": "Y"
    },
    {
      "fieldId": "PRICING_WEIGHT",
      "fieldType": "NUMBER",
      "fieldValue": "NULL",
      "fieldChanged": "Y"
    },
    {
      "fieldId": "FIELD_ID",
      "fieldType": "NUMBER",
      "fieldValue": "4094399774",
      "fieldChanged": "Y"
    },
    {
      "fieldId": "FAULT",
      "fieldType": "CHAR",
      "fieldValue": "NULL",
      "fieldChanged": "Y"
    },
    {
      "fieldId": "AUDIT_SUBTYPE",
      "fieldType": "VARCHAR2",
      "fieldValue": "NULL",
      "fieldChanged": "Y"
    },
    {
      "fieldId": "ERROR_SEVERITY",
      "fieldType": "NUMBER",
      "fieldValue": "NULL",
      "fieldChanged": "Y"
    },
    {
      "fieldId": "HIT_INFO",
      "fieldType": "VARCHAR2",
      "fieldValue": "NULL",
      "fieldChanged": "Y"
    },
    {
      "fieldId": "ZONE_CODE",
      "fieldType": "VARCHAR2",
      "fieldValue": "NULL",
      "fieldChanged": "Y"
    },
    {
      "fieldId": "IP_ADDRESS",
      "fieldType": "VARCHAR2",
      "fieldValue": "NULL",
      "fieldChanged": "Y"
    },
    {
      "fieldId": "ERROR_CODE",
      "fieldType": "NUMBER",
      "fieldValue": "NULL",
      "fieldChanged": "Y"
    },
    {
      "fieldId": "LOG_DATE",
      "fieldType": "TIMESTAMP(6)",
      "fieldValue": "2017-12-07 05:09:54.941+03:000",
      "fieldChanged": "Y"
    },
    {
      "fieldId": "LOG_DATE2",
      "fieldType": "TIMESTAMP(6) WITH TIME ZONE",
      "fieldValue": "2017-12-06 22:09:54.941-05:00",
      "fieldChanged": "Y"
    }
  ],
  "conditionFieldsList": []
}
]
}

I tried a few configurations; let's take this one as an example:

input {
  file {
    path => "/path/to/file"
    start_position => "beginning"
  }
}
filter {
  json {
    source => "message"
    target => "json_msg"
  }
}
output {
  stdout { codec => rubydebug }
}

The first issue is that Logstash does not recognize the third line of the data (starting with "Thread..") as part of the line above, even though it is part of the same string value in the JSON. Instead, it treats it as a separate line and the JSON parser reports a parse failure. How can I fix this?
I tried using multiline with the pattern ^Thread, and it didn't work.

I worked around that by manually joining the two lines.

When I run Logstash, I see that it processes the data one line at a time instead of as a single JSON document.

I tried using json_lines as a codec in the input, to no avail.

What is the problem? How can I fix it?
I tried using multiline with the patterns \s, \s+, \t, \t+, [\s\t] and so on; nothing helped.

I tried minifying the JSON online with a JSON minifier site and feeding it to Logstash through stdin (instead of the file input with a path), and everything works fine.

With that, my question is: if the issues above can't be solved, is there a way to "minify" the JSON that Logstash reads through the file input plugin?
I need to mention that the logstash logs are fine, no errors are shown.

To sum up my questions:

  1. How can I make Logstash understand that line 2 and line 3 (the one that starts with the word "Thread") belong together, as they do in the JSON structure (there is no "," between them)?
  2. How can I make Logstash parse all the lines together rather than one at a time, so that it treats the file as a single JSON object? Neither the json filter, nor the json codec in the input, nor the json_lines codec helped.
  3. If 2 can't be solved easily, how can I minify the JSON data the way the JSON minifier site does? That seems to be the only way to make it work.

Any help would be appreciated!


(Guy Boertje) #2

This is a very difficult problem to solve because the file input is line-oriented. I don't think the multiline codec will help in this case unless you can see a pattern.

You should consider using Python or similar to load the file as JSON and then dump it to another file, or send it to a tcp input.
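
A minimal sketch of that pre-processing step, in case it helps. It assumes each input file holds exactly one JSON document; the function names `minify_json` and `convert_file` are made up for illustration. Note that a raw line break inside a string value (as in "DataChangeInfo" above) is rejected by Python's `json` module in its default strict mode, so the sketch passes `strict=False` to accept it. On output, `json.dumps` escapes the line break as `\n`, so the whole document ends up on one physical line that a line-oriented input can read.

```python
import json
from pathlib import Path


def minify_json(text: str) -> str:
    """Parse one JSON document and return it serialized on a single line."""
    # strict=False allows raw control characters (such as a literal line
    # break) inside string values; the default strict mode rejects them.
    document = json.loads(text, strict=False)
    # Compact separators drop the optional whitespace between tokens.
    return json.dumps(document, separators=(",", ":"))


def convert_file(src: str, dst: str) -> None:
    """Write a one-line copy of the JSON in src to dst, ending in a newline."""
    text = Path(src).read_text(encoding="utf-8")
    Path(dst).write_text(minify_json(text) + "\n", encoding="utf-8")
```

Calling something like `convert_file("/path/to/file", "/path/to/file.min.json")` (both paths are placeholders) should produce a file that the existing file input and json filter can consume one line at a time.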

