Nested JSON parse failure in Logstash?

Hi guys,

I am trying to parse my JSON with Logstash and then send it to ES, but everything I have tried so far has failed. Could you please help?

input {
  file {
    path => "${JSON_HOME}/*.json"
    #path => "C:/Users/ashaik13/Dev/software/json/*.json"
    #codec => "json"
    start_position => "beginning"
    sincedb_path => "NUL"
  }
}

filter {
  json {
    source => "message"
  }
}

output {
  stdout {
    codec => rubydebug
  }
}

I have tried using a json codec and a multiline codec with a pattern on the input. I also added a mutate filter and tried nested field access, but it doesn't work.

I tried this as well:

filter {
  ...
  json {
     ...
  }
  mutate {
    add_field => {
      "firstname" => "%{[parsedJson][firstname]}"
      "lastname" => "%{[parsedJson][lastname]}"
    }
  }
}

My JSON is:

{
  "Metadata": {
    "-xmlns:xsi": "http://www.w3.org/2001/XMLSchema-instance",
    "-xsi:noNamespaceSchemaLocation": "XML.xsd",
    "Td": "165",
    "BILL-d": {
      "BILL": [
        {
          "abc": "191560004687",
          "def": "08012018",
          "ghi": "906202553",
          "jkl": "191560004687.08012018.PROD.pdf"
        },
        {
          "abc": "191560002287",
          "def": "06012019",
          "ghi": "909168192",
          "jkl": "191560002287.06012019.PROD.pdf"
        }
     ]
    }
  }
}

If your JSON really is spread across 23 lines of a file, you need to use a multiline codec on the file input to combine the lines into a single event. If you want to ingest the entire file as a single event, I would use a regexp that never matches, together with a timeout to flush the event.

codec => multiline { pattern => "^Spalanzani" negate => true what => previous auto_flush_interval => 1 }

Then use a json filter to parse it:

filter { json { source => "message" remove_field => [ "message" ] } }
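
Put together, the two pieces above might look like the following pipeline. This is only a minimal sketch: the path and sincedb_path values are copied from the question (NUL is the Windows null device), and "^Spalanzani" is simply assumed to be a string that never appears at the start of a line in the data.

```
input {
  file {
    path => "${JSON_HOME}/*.json"
    start_position => "beginning"
    sincedb_path => "NUL"
    codec => multiline {
      # The pattern is assumed never to match, so with negate => true
      # every line is appended to the previous one.
      pattern => "^Spalanzani"
      negate => true
      what => "previous"
      # Emit the accumulated lines once no new line has arrived for 1 second.
      auto_flush_interval => 1
    }
  }
}

filter {
  json {
    source => "message"
    # The raw text is removed only when parsing succeeds.
    remove_field => [ "message" ]
  }
}

output {
  stdout { codec => rubydebug }
}
```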

This results in

  "Metadata" => {
                                "Td" => "165",
                        "-xmlns:xsi" => "http://www.w3.org/2001/XMLSchema-instance",
                            "BILL-d" => {
        "BILL" => [
            [0] {
                "def" => "08012018",
                "abc" => "191560004687",
                "jkl" => "191560004687.08012018.PROD.pdf",
                "ghi" => "906202553"
            },
            [1] {
                "def" => "06012019",
                "abc" => "191560002287",
                "jkl" => "191560002287.06012019.PROD.pdf",
                "ghi" => "909168192"
            }
        ]
    },
    "-xsi:noNamespaceSchemaLocation" => "XML.xsd"
},
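
If the goal is one event per entry of the BILL array rather than one event for the whole document, a split filter after the json filter is one option. The following is a sketch that assumes the field names from the sample JSON; the top-level field name "bill" is made up for illustration.

```
filter {
  json {
    source => "message"
    remove_field => [ "message" ]
  }
  # Emit one event per element of the BILL array. Each new event keeps
  # the other fields (such as [Metadata][Td]) and carries a single
  # record in place of the array.
  split {
    field => "[Metadata][BILL-d][BILL]"
  }
  # Optional: move the record to a top-level field for easier access,
  # for example [bill][abc].
  mutate {
    rename => { "[Metadata][BILL-d][BILL]" => "bill" }
  }
}
```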

Hey @Badger,

Thanks, but that is not the output I get. What I want is the actual records inside the BILL array, plus the other elements such as Td.

My output is:

{
      "@version" => "1",
          "tags" => [
        [0] "multiline",
        [1] "_jsonparsefailure"
    ],
       "message" => "        },\r\n        {\r\n          \"abc\": \"191560001434\",\r\n          \"def\": \"06012019\",\r\n          \"ghi\": \"909166998\",\r\n          \"jkl\": \"191560001434.06012019.PROD.pdf\"\r\n        },\r\n        {\r\n          \"abc\": \"191560003299\",\r\n          \"def\": \"06012019\",\r\n          \"ghi\": \"909168280\",\r\n          \"jkl\": \"191560003299.06012019.PROD.pdf\"\r\n        },\r\n        {\r\n          \"abc\": \"191560004542\",\r\n          \"def\": \"04012019\",\r\n          \"ghi\": \"908991005\",\r\n          \"jkl\": \"191560004542.04012019.PROD.pdf\"\r\n        },\r\n        {\r\n          \"abc\": \"191560002447\",\r\n          \"def\": \"04012019\",\r\n          \"ghi\": \"905655861\",\r\n          \"jkl\": \"191560002447.04012019.PROD.pdf\"\r\n        },\r\n        {\r\n          \"abc\": \"191560000755\",\r\n          \"def\": \"06012019\",\r\n          \"ghi\": \"909157434\",\r\n          \"jkl\": \"191560000755.06012019.PROD.pdf\"\r\n        },\r\n        {\r\n          \"abc\": \"191560003727\",\r\n          \"def\": \"05012019\",\r\n          \"ghi\": \"906974776\",\r\n          \"jkl\": \"191560003727.05012019.PROD.pdf\"\r\n        },\r\n        {\r\n          \"abc\": \"191560003754\",\r\n          \"def\": \"05012019\",\r\n          \"ghi\": \"906975285\",\r\n          \"jkl\": \"191560003754.05012019.PROD.pdf\"\r\n        },\r\n        {\r\n          \"abc\": \"191560003845\",\r\n          \"def\": \"06012019\",\r\n          \"ghi\": \"905670004\",\r\n          \"jkl\": \"191560003845.06012019.PROD.pdf\"\r\n        },\r\n        {\r\n          \"abc\": \"191560003109\",\r\n          \"def\": \"06012019\",\r\n          \"ghi\": \"904872374\",\r\n          \"jkl\": \"191560003109.06012019.PROD.pdf\"\r\n        },\r\n        {\r\n          \"abc\": \"191560002808\",\r\n          \"def\": \"06012018\",\r\n          \"ghi\": \"901078528\",\r\n          \"jkl\": \"191560002808.06012018.PROD.pdf\"\r\n        
},\r\n        {\r\n          \"abc\": \"191560001413\",\r\n          \"def\": \"06012019\",\r\n          \"ghi\": \"909166990\",\r\n          \"jkl\": \"191560001413.06012019.PROD.pdf\"\r\n        },\r\n        {\r\n          \"abc\": \"191560001331\",\r\n          \"def\": \"06012019\",\r\n          \"ghi\": \"900961729\",\r\n          \"jkl\": \"191560001331.06012019.PROD.pdf\"\r\n        },\r\n        {\r\n          \"abc\": \"191560005028\",\r\n          \"def\": \"03012019\",\r\n          \"ghi\": \"904700106\",\r\n          \"jkl\": \"191560005028.03012019.PROD.pdf\"\r\n        },\r\n        {\r\n          \"abc\": \"191560000446\",\r\n          \"def\": \"06012019\",\r\n          \"ghi\": \"902663966\",\r\n          \"jkl\": \"191560000446.06012019.PROD.pdf\"\r\n        },\r\n        {\r\n          \"abc\": \"191560001135\",\r\n          \"def\": \"06012019\",\r\n          \"ghi\": \"909166862\",\r\n          \"jkl\": \"191560001135.06012019.PROD.pdf\"\r\n        },\r\n        {\r\n          \"abc\": \"191560003127\",\r\n          \"def\": \"05012019\",\r\n          \"ghi\": \"909034293\",\r\n          \"jkl\": \"191560003127.05012019.PROD.pdf\"\r\n        },\r\n        {\r\n          \"abc\": \"191560002831\",\r\n          \"def\": \"06012019\",\r\n          \"ghi\": \"905001650\",\r\n          \"jkl\": \"191560002831.06012019.PROD.pdf\"\r\n        },\r\n        {\r\n          \"abc\": \"191560003780\",\r\n          \"def\": \"04012019\",\r\n          \"ghi\": \"905650427\",\r\n          \"jkl\": \"191560003780.04012019.PROD.pdf\"\r\n        },\r\n        {\r\n          \"abc\": \"191560002622\",\r\n          \"def\": \"03012019\",\r\n          \"ghi\": \"909169346\",\r\n          \"jkl\": \"191560002622.03012019.PROD.pdf\"\r\n        }\r\n      ]\r\n    }\r\n  }\r",
          "host" => "SACWDD0563",
    "@timestamp" => 2019-08-14T10:33:35.476Z,
          "path" => "C:/Users/ashaik13/Dev/software/json/bill4.json"
}

Your JSON starts with }, so it is not valid JSON. You may be able to fix that using mutate.

Hi @Badger,

I have tried this, but it still does not work:

mutate {
  gsub => ["message","\n",""]
  gsub => ["message","\r",""]
  gsub => ["message","},\r\n",""]
}

My current config:

input {
  file {
    codec => multiline {
      pattern => '^\{'
      #pattern => '^Spalanzani'
      negate => true
      what => previous
      max_lines => 1073741824
      max_bytes => "3 GB"
      auto_flush_interval => 1
    }
    path => "${JSON_HOME}/*.json"
    start_position => "beginning"
    sincedb_path => "NUL"
    exclude => "*.gz"
  }
}

filter {
  mutate {
    gsub => ["message","\n",""]
    gsub => ["message","\r",""]
    gsub => ["message","},\r\n",""]
  }

  json {
    source => "[message]"
    remove_field => [ "message" ]
  }
}

output {
  stdout {
    codec => rubydebug
  }
}

You do not need to worry about \r and \n. I believe the json filter will work around them. However, the leading } has to be removed.

Hi @Badger,

Note: I have multiple JSON files that all begin with the same pattern.
I added this, but the output remains the same:

mutate {
  gsub => ['message','}','']
}

What you have is effectively

    },
    {
      \"abc\": \"191560003780\",
      \"def\": \"04012019\",
      \"ghi\": \"905650427\",
      \"jkl\": \"191560003780.04012019.PROD.pdf\"
    },
    {
      \"abc\": \"191560002622\",
      \"def\": \"03012019\",
      \"ghi\": \"909169346\",
      \"jkl\": \"191560002622.03012019.PROD.pdf\"
    }
  ]
}
}

Which is not valid JSON. Your multiline codec is not capturing a complete JSON object.
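
To see where each event actually begins and ends, it can help to look only at the edges of the failed message rather than the whole text. A minimal sketch using a ruby filter; the field names message_head and message_tail are made up for illustration.

```
filter {
  json {
    source => "message"
  }
  if "_jsonparsefailure" in [tags] {
    ruby {
      # Copy the first and last 200 characters of the unparsed text into
      # separate fields so the event boundaries are easy to read.
      code => '
        msg = event.get("message").to_s
        event.set("message_head", msg[0, 200])
        event.set("message_tail", msg[-200, 200] || msg)
      '
    }
  }
}
```

A complete document should begin with { in message_head and end with } in message_tail. An event that begins in the middle of the array, as above, suggests the codec emitted an event before the whole file had been read. If so, a longer auto_flush_interval may be worth trying, since the codec flushes whatever it has accumulated when no new line arrives within that interval.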

Hey @Badger,

Note: I have multiple files, and the JSON snippet shared above is common to all of them.

I have tried all of these:

pattern => '^\{'
or
pattern => '^Spalanzani'
or
codec => json
or
codec => json_lines