Logstash: how to parse and split a nested JSON file

Hi!
I have a nested JSON file containing a list that ends up as a single event in Elasticsearch, like this:

[{"header": {"id": "idvalue", 
			 "datestamp": "YYYY-MM-DD"
			 }, 
   "metas": {"dc:title": "text...", 
			 "dc:id": "idvalue",
			 "dc:subject": [{"id": "idvalue", "title": "text"}, 
							{"id": "idvalue", "title": "text"}, 
							{"id4": "idvalue", "title": "text..."}, 
							{"id": "idvalue", "title": "text"}
							], 
			 "dc:description": "long text.\r\n\r\n\r\n\r\n\r\n\r\n", 
			 "dc:pub": [{"id": "idvalue", "title": "text..."}
						], 
			 "dc:creation": "YYY-MM-DD:HH:mm:ss", 
			 "dc:modif": "YYY-MM-DD:HH:mm:ss", 
			 "dc:ava": "YYY-MM-DD:HH:mm:ss", 
			 "dc:typ": [{"id": "value", "title": "texte"}
						], 
			 "dc:ext": "HH:MM:SS", 
			 "dc:loc": [{"lat": numvalue, "lng": numvalue}
						], 
			"dc:lic": "text", 
			"dc:rH": [{"id": "value", "title": "text"}
						], 
			"dc:aud": "text"
			}
	}, 
	{"header": {"id": "idvalue", 
			 "datestamp": "YYYY-MM-DD"
			 }, 
   "metas": {"dc:title": "text...", 
			 "dc:id": "idvalue",
			 "dc:subject": [{"id": "idvalue", "title": "text"}, 
							{"id": "idvalue", "title": "text"}, 
							{"id4": "idvalue", "title": "text..."}, 
							{"id": "idvalue", "title": "text"}
							], 
			 "dc:description": "long text.\r\n\r\n\r\n\r\n\r\n\r\n", 
			 "dc:pub": [{"id": "idvalue", "title": "text..."}
						], 
			 "dc:creation": "YYY-MM-DD:HH:mm:ss", 
			 "dc:modif": "YYY-MM-DD:HH:mm:ss", 
			 "dc:ava": "YYY-MM-DD:HH:mm:ss", 
			 "dc:typ": [{"id": "value", "title": "texte"}
						], 
			 "dc:ext": "HH:MM:SS", 
			 "dc:loc": [{"lat": numvalue, "lng": numvalue}
						], 
			"dc:lic": "text", 
			"dc:rH": [{"id": "value", "title": "text"}
						], 
			"dc:aud": "text"
			}
	}, 
	...
]

What should the filter section of my Logstash configuration file look like to split this into one event per array element, with output like this:

   "metas": {"title": "text...", 
			 "id": "idvalue",
			 "subject": {{"id": "idvalue", "title": "text"}, 
							{"id": "idvalue", "title": "text"}, 
							{"id4": "idvalue", "title": "text..."}, 
							{"id": "idvalue", "title": "text"}
							}, 
			 "description": "long text.\r\n\r\n\r\n\r\n\r\n\r\n", 
			 "pub": {{"id": "idvalue", "title": "text..."}
						}, 
			 "creation": "YYY-MM-DD:HH:mm:ss", 
			 "modif": "YYY-MM-DD:HH:mm:ss", 
			 "ava": "YYY-MM-DD:HH:mm:ss", 
			 "typ": {{"id": "value", "title": "texte"}
						}, 
			 "ext": "HH:MM:SS", 
			 "loc": {{"lat": numvalue, "lng": numvalue}
						}, 
			"lic": "text", 
			"rH": {{"id": "value", "title": "text"}
						}, 
			"aud": "text"
			},
	   "metas": {"title": "text...", 
			 "id": "idvalue",
			 "subject": {{"id": "idvalue", "title": "text"}, 
							{"id": "idvalue", "title": "text"}, 
							{"id4": "idvalue", "title": "text..."}, 
							{"id": "idvalue", "title": "text"}
							}, 
			 "description": "long text.\r\n\r\n\r\n\r\n\r\n\r\n", 
			 "pub": {{"id": "idvalue", "title": "text..."}
						}, 
			 "creation": "YYY-MM-DD:HH:mm:ss", 
			 "modif": "YYY-MM-DD:HH:mm:ss", 
			 "ava": "YYY-MM-DD:HH:mm:ss", 
			 "typ": {{"id": "value", "title": "texte"}
						}, 
			 "ext": "HH:MM:SS", 
			 "loc": {{"lat": numvalue, "lng": numvalue}
						}, 
			"lic": "text", 
			"rH": {{"id": "value", "title": "text"}
						}, 
			"aud": "text"
			},
			....

Thanks!

It sounds like you want a split filter to split the array into multiple events, plus a mutate+rename to move [metas] to the top level, and a mutate+remove_field to get rid of the [header] field.

Yes, that is what I want, a split filter as you say.
Can you help with an example?
Thank you!

Is the JSON serialized in the [message] field, or has it already been parsed? If the latter, what is the name of the field that contains the array?

It is in the [message] field.

OK, so if your message field looks like

   "message" => "[{\"header\": {\"id\": \"idvalue\", \n\t\t\t \"datestamp\": \"YYYY-MM-DD\"\n\t\t\t }, \n   \"metas\": {\"dc:title\": \"text...\", ... ]"

then you can use

    json   { source => "message" target => "json" remove_field => [ "message" ] }
    split  { field => "json" }                             # one event per array element
    mutate { rename => { "[json][metas]" => "metas" } }    # promote metas to the top level
    mutate { remove_field => [ "json" ] }                  # drop what is left, including header
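
One thing to watch: your desired output also drops the dc: prefixes (title instead of dc:title), which the rename above does not do. A ruby filter placed after the rename can strip them; this is a sketch that assumes all the prefixed keys live directly under [metas]:

    ruby {
      code => '
        metas = event.get("metas")
        if metas.is_a?(Hash)
          # Strip the leading "dc:" from every key under [metas].
          event.set("metas", metas.map { |k, v| [k.sub(/\Adc:/, ""), v] }.to_h)
        end
      '
    }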

Sorry for the delay; I ran into a technical complication just after my last post.
I tried your code, but it doesn't seem to work with my file.

I can see that the "\n\t\t\t" / "\n" sequences are missing before "metas" in my case. I don't have the following:

"message" => "[{\"header\": {\"id\": \"idvalue\",  \"datestamp\": \"YYYY-MM-DD\"\n\t\t\t }, \n   \"metas\": {\"dc:title\": \"text...\", ... ]"

but this one:

[{\"header\": {\"id\": \"idvalue\",  \"datestamp\": \"YYYY-MM-DD" },  \"metas\": {\"dc:title\": \"text...\", ... ]"

Is that the problem?

Thanks

No, whitespace between JSON tokens does not matter.

I still haven't succeeded.
After trying your code, I can see the following with debug and trace logging enabled:

[WARN ] 2022-05-23 11:20:54.407 [[main]>worker6] json - Error parsing json {:source=>"message", :raw=>"[{\"header\":...", :exception=>#<LogStash::Json::ParserError: Illegal unquoted character ((CTRL-CHAR, code 10)): has to be escaped using backslash to be included in string value
 at [Source: (byte[])"[{"header": ..."[truncated 138 bytes]; line: 1, column: 625]>}

[WARN ] 2022-05-23 11:20:54.414 [[main]>worker6] split - Only String and Array types are splittable. field:json is of type = NilClass
{
          "tags" => [
        [0] "_jsonparsefailure",
        [1] "_split_type_failure"
    ],
"event" => {
"original" => "[{\"header\": {\"id\": \"idvalue\", \n\t\t\t \"datestamp\": \"YYYY-MM-DD\"\n\t\t\t }, \n   \"metas\": {\"dc:title\": \"text...\", ... 
},
,
          "host" => {
        "name" => "D...."
    },
    "@timestamp" => 2022-...,
           "log" => {
        "file" => {
            "path" => "/simple.txt"
        }
    },

"message" => "[{\"header\": {\"id\": \"idvalue\", \n\t\t\t \"datestamp\": \"YYYY-MM-DD\"\n\t\t\t }, \n   \"metas\": {\"dc:title\": \"text...\", ... ,
      "@version" => "1",
          "type" => "json"
}

Where is the problem?
Thanks
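
The two warnings say what happened: the json filter hit a literal newline (CTRL-CHAR, code 10) inside a string value, so parsing failed with _jsonparsefailure, the [json] field was never created, and split then had nothing to work on (NilClass). One workaround is to replace literal control characters with spaces before parsing. A sketch (note this also flattens the real line breaks inside dc:description into spaces):

    # Literal CR/LF/TAB inside a JSON string value are illegal unless escaped,
    # so replace them before the json filter runs.
    mutate { gsub => [ "message", "[\r\n\t]", " " ] }
    json   { source => "message" target => "json" remove_field => [ "message" ] }
    split  { field => "json" }
    mutate { rename => { "[json][metas]" => "metas" } }
    mutate { remove_field => [ "json" ] }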

If you have any questions about my original data, please see the link Reproduction - Pastebin.com for a reproduction.
Thanks !

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.