Nested JSON parse failure in Logstash?

Abdul_Gaffar_Shaikh · August 13, 2019, 2:52pm

Hi guys,

I am trying to parse my JSON in Logstash and then send it to Elasticsearch, but everything I have tried fails. Could you please help?

input {
  file {
    path => "${JSON_HOME}/*.json"
    #path => "C:/Users/ashaik13/Dev/software/json/*.json"
    #codec => "json"
    start_position => "beginning"
    sincedb_path => "NUL"
  }
}

filter {
  json {
    source => "message"
  }
}

output {
  stdout {
    codec => rubydebug
  }
}

I have tried the json codec and the multiline codec with a pattern on the input. I also added a mutate filter and tried nested field access, but that doesn't work either.

I tried this as well:

filter {
  ...
  json {
     ...
  }
  mutate {
    add_field => {
      "firstname" => "%{[parsedJson][firstname]}"
      "lastname" => "%{[parsedJson][lastname]}"
    }
  }
}

My JSON is:

{
  "Metadata": {
    "-xmlns:xsi": "http://www.w3.org/2001/XMLSchema-instance",
    "-xsi:noNamespaceSchemaLocation": "XML.xsd",
    "Td": "165",
    "BILL-d": {
      "BILL": [
        {
          "abc": "191560004687",
          "def": "08012018",
          "ghi": "906202553",
          "jkl": "191560004687.08012018.PROD.pdf"
        },
        {
          "abc": "191560002287",
          "def": "06012019",
          "ghi": "909168192",
          "jkl": "191560002287.06012019.PROD.pdf"
        }
     ]
    }
  }
}

Badger · August 13, 2019, 3:12pm

If your JSON really is spread across 23 lines of a file, you need to use a multiline codec on the file input to combine it into a single event. If you want to ingest the entire file as a single event, then I would use a regexp that never matches and a timeout to flush the event.

codec => multiline { pattern => "^Spalanzani" negate => true what => previous auto_flush_interval => 1 }

Then a json filter to parse it

filter { json { source => "message" remove_field => [ "message" ] } }

This results in

  "Metadata" => {
                                "Td" => "165",
                        "-xmlns:xsi" => "http://www.w3.org/2001/XMLSchema-instance",
                            "BILL-d" => {
        "BILL" => [
            [0] {
                "def" => "08012018",
                "abc" => "191560004687",
                "jkl" => "191560004687.08012018.PROD.pdf",
                "ghi" => "906202553"
            },
            [1] {
                "def" => "06012019",
                "abc" => "191560002287",
                "jkl" => "191560002287.06012019.PROD.pdf",
                "ghi" => "909168192"
            }
        ]
    },
    "-xsi:noNamespaceSchemaLocation" => "XML.xsd"
},
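
As a rough illustration of what that codec setting does, here is a small Python sketch (not Logstash code; the function name group_multiline and the shortened sample lines are made up for the example). With negate => true and what => previous, every line that does not match the pattern is appended to the previous line, so a pattern that never matches folds the whole file into one event. The sketch ignores codec limits such as max_lines and max_bytes.

```python
import json
import re


def group_multiline(lines, pattern, negate=True):
    """Rough model of `what => previous`: decide per line whether it joins the previous one."""
    events, current = [], []
    for line in lines:
        matched = re.search(pattern, line) is not None
        # negate => true flips the test: non-matching lines join the previous line.
        joins_previous = matched != negate
        if joins_previous and current:
            current.append(line)
        else:
            if current:
                events.append("\n".join(current))
            current = [line]
    if current:
        # Stands in for auto_flush_interval flushing the last pending event.
        events.append("\n".join(current))
    return events


# Hypothetical, shortened version of the file from the question.
lines = [
    '{',
    '  "Metadata": {',
    '    "Td": "165",',
    '    "BILL-d": { "BILL": [ { "abc": "191560004687" } ] }',
    '  }',
    '}',
]

events = group_multiline(lines, r"^Spalanzani")
parsed = json.loads(events[0])
print(len(events), parsed["Metadata"]["Td"])
```

Only when all lines land in the same event does the text handed to the json filter form one complete object.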

Abdul_Gaffar_Shaikh · August 14, 2019, 10:07am

Hey @Badger,

Thanks, but that is not the output I get. What I want is the actual records inside the BILL array, plus other elements such as Td.

My output is:

{
      "@version" => "1",
          "tags" => [
        [0] "multiline",
        [1] "_jsonparsefailure"
    ],
       "message" => "        },\r\n        {\r\n          \"abc\": \"191560001434\",\r\n          \"def\": \"06012019\",\r\n          \"ghi\": \"909166998\",\r\n          \"jkl\": \"191560001434.06012019.PROD.pdf\"\r\n        },\r\n        {\r\n          \"abc\": \"191560003299\",\r\n          \"def\": \"06012019\",\r\n          \"ghi\": \"909168280\",\r\n          \"jkl\": \"191560003299.06012019.PROD.pdf\"\r\n        },\r\n        {\r\n          \"abc\": \"191560004542\",\r\n          \"def\": \"04012019\",\r\n          \"ghi\": \"908991005\",\r\n          \"jkl\": \"191560004542.04012019.PROD.pdf\"\r\n        },\r\n        {\r\n          \"abc\": \"191560002447\",\r\n          \"def\": \"04012019\",\r\n          \"ghi\": \"905655861\",\r\n          \"jkl\": \"191560002447.04012019.PROD.pdf\"\r\n        },\r\n        {\r\n          \"abc\": \"191560000755\",\r\n          \"def\": \"06012019\",\r\n          \"ghi\": \"909157434\",\r\n          \"jkl\": \"191560000755.06012019.PROD.pdf\"\r\n        },\r\n        {\r\n          \"abc\": \"191560003727\",\r\n          \"def\": \"05012019\",\r\n          \"ghi\": \"906974776\",\r\n          \"jkl\": \"191560003727.05012019.PROD.pdf\"\r\n        },\r\n        {\r\n          \"abc\": \"191560003754\",\r\n          \"def\": \"05012019\",\r\n          \"ghi\": \"906975285\",\r\n          \"jkl\": \"191560003754.05012019.PROD.pdf\"\r\n        },\r\n        {\r\n          \"abc\": \"191560003845\",\r\n          \"def\": \"06012019\",\r\n          \"ghi\": \"905670004\",\r\n          \"jkl\": \"191560003845.06012019.PROD.pdf\"\r\n        },\r\n        {\r\n          \"abc\": \"191560003109\",\r\n          \"def\": \"06012019\",\r\n          \"ghi\": \"904872374\",\r\n          \"jkl\": \"191560003109.06012019.PROD.pdf\"\r\n        },\r\n        {\r\n          \"abc\": \"191560002808\",\r\n          \"def\": \"06012018\",\r\n          \"ghi\": \"901078528\",\r\n          \"jkl\": \"191560002808.06012018.PROD.pdf\"\r\n        
},\r\n        {\r\n          \"abc\": \"191560001413\",\r\n          \"def\": \"06012019\",\r\n          \"ghi\": \"909166990\",\r\n          \"jkl\": \"191560001413.06012019.PROD.pdf\"\r\n        },\r\n        {\r\n          \"abc\": \"191560001331\",\r\n          \"def\": \"06012019\",\r\n          \"ghi\": \"900961729\",\r\n          \"jkl\": \"191560001331.06012019.PROD.pdf\"\r\n        },\r\n        {\r\n          \"abc\": \"191560005028\",\r\n          \"def\": \"03012019\",\r\n          \"ghi\": \"904700106\",\r\n          \"jkl\": \"191560005028.03012019.PROD.pdf\"\r\n        },\r\n        {\r\n          \"abc\": \"191560000446\",\r\n          \"def\": \"06012019\",\r\n          \"ghi\": \"902663966\",\r\n          \"jkl\": \"191560000446.06012019.PROD.pdf\"\r\n        },\r\n        {\r\n          \"abc\": \"191560001135\",\r\n          \"def\": \"06012019\",\r\n          \"ghi\": \"909166862\",\r\n          \"jkl\": \"191560001135.06012019.PROD.pdf\"\r\n        },\r\n        {\r\n          \"abc\": \"191560003127\",\r\n          \"def\": \"05012019\",\r\n          \"ghi\": \"909034293\",\r\n          \"jkl\": \"191560003127.05012019.PROD.pdf\"\r\n        },\r\n        {\r\n          \"abc\": \"191560002831\",\r\n          \"def\": \"06012019\",\r\n          \"ghi\": \"905001650\",\r\n          \"jkl\": \"191560002831.06012019.PROD.pdf\"\r\n        },\r\n        {\r\n          \"abc\": \"191560003780\",\r\n          \"def\": \"04012019\",\r\n          \"ghi\": \"905650427\",\r\n          \"jkl\": \"191560003780.04012019.PROD.pdf\"\r\n        },\r\n        {\r\n          \"abc\": \"191560002622\",\r\n          \"def\": \"03012019\",\r\n          \"ghi\": \"909169346\",\r\n          \"jkl\": \"191560002622.03012019.PROD.pdf\"\r\n        }\r\n      ]\r\n    }\r\n  }\r",
          "host" => "SACWDD0563",
    "@timestamp" => 2019-08-14T10:33:35.476Z,
          "path" => "C:/Users/ashaik13/Dev/software/json/bill4.json"
}

Badger · August 14, 2019, 12:43pm

Your JSON starts with }, so it is not valid JSON. You may be able to fix that using mutate.

Abdul_Gaffar_Shaikh · August 14, 2019, 1:38pm

Hi @Badger,

I have tried this, but it still does not work:

mutate {
  gsub => ["message","\n",""]
  gsub => ["message","\r",""]
  gsub => ["message","},\r\n",""]
}

My current config:

input {
  file {
    codec => multiline {
      pattern => '^\{'
      #pattern => '^Spalanzani'
      negate => true
      what => previous
      max_lines => 1073741824
      max_bytes => "3 GB"
      auto_flush_interval => 1
    }
    path => "${JSON_HOME}/*.json"
    start_position => "beginning"
    sincedb_path => "NUL"
    exclude => "*.gz"
  }
}

filter {
  mutate {
    gsub => ["message","\n",""]
    gsub => ["message","\r",""]
    gsub => ["message","},\r\n",""]
  }

  json {
    source => "[message]"
    remove_field => [ "message" ]
  }
}

output {
  stdout {
    codec => rubydebug
  }
}

Badger · August 14, 2019, 1:54pm

You do not need to worry about \r and \n. I believe the json filter will work around them. However, the leading } has to be removed.
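
To illustrate the difference, here is a minimal Python sketch. It uses Python's json module as a stand-in for the parser behind the json filter, which is an assumption, but the JSON grammar is the same: carriage returns and newlines between tokens are plain whitespace, while a text that begins with } is not a JSON value. The helper name parses and both sample strings are made up for the example.

```python
import json


def parses(text):
    """Return True if text is one complete JSON value."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


# Complete object with Windows line endings, as read from a file on Windows.
complete = '{\r\n  "BILL": [\r\n    {\r\n      "abc": "191560004687"\r\n    }\r\n  ]\r\n}\r'

# Event that begins in the middle of the array, like the message in the output above.
fragment = '    },\r\n    {\r\n      "abc": "191560002622"\r\n    }\r\n  ]\r\n}\r'

print(parses(complete))
print(parses(fragment))
```

The first string parses despite the \r\n sequences; the second does not, because its first token is a closing brace.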

Abdul_Gaffar_Shaikh · August 14, 2019, 2:25pm

Hi @Badger,

Note: I have multiple JSON files that all begin with the same pattern.
I added this, but the output remains the same:

mutate {
		gsub => ['message','}','']
	}

Badger · August 14, 2019, 3:38pm

What you have is effectively

    },
    {
      \"abc\": \"191560003780\",
      \"def\": \"04012019\",
      \"ghi\": \"905650427\",
      \"jkl\": \"191560003780.04012019.PROD.pdf\"
    },
    {
      \"abc\": \"191560002622\",
      \"def\": \"03012019\",
      \"ghi\": \"909169346\",
      \"jkl\": \"191560002622.03012019.PROD.pdf\"
    }
  ]
}
}

Which is not valid JSON. Your multiline codec is not capturing a complete JSON object.
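
A short Python sketch of why string cleanup does not rescue a fragment like this one (Python's json module stands in for the json filter's parser here, which is an assumption; the helper parses and the shortened fragment are made up for the example). Removing every } also removes the braces that close the records, and removing only the leading }, still leaves closing brackets at the end whose openers are in the part of the file that never reached the event.

```python
import json
import re


def parses(text):
    """Return True if text is one complete JSON value."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


# Shortened stand-in for the event: the opening of the document is missing.
fragment = '},\n{"abc": "191560003780"},\n{"abc": "191560002622"}\n]\n}\n}'

# Roughly what gsub => ['message','}',''] does: every closing brace disappears.
no_braces = fragment.replace("}", "")

# Dropping only the leading "}," leaves the unmatched "] } }" at the end.
no_leading = re.sub(r"^\s*\},", "", fragment)

for candidate in (fragment, no_braces, no_leading):
    print(parses(candidate))
```

None of the three variants parses, so the fix has to happen where the lines are grouped into an event, not in a later filter.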

Abdul_Gaffar_Shaikh · August 15, 2019, 7:11am

Hey @Badger,

Note: I have multiple files, and the JSON snippet shared above is common to all of them.

I have tried all of these patterns:

pattern => '^\{'
or
pattern => '^Spalanzani'
or
codec => json
or
codec => json_lines

system · September 12, 2019, 7:11am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
