Remove data from a JSON array in Logstash

I have a data source that sends data in JSON format. Some of this data is held in an array, which makes it difficult to visualize. However, the array only ever holds one set of data.

For example, the "docs" array below:

"field_1":"",
"field_2":"",
"docs":[{
"alliance_data_srstrust":[""],
"alliance_link_srstrust":"https://",
"alliance_score_srstrust":-100,
"alliance_updated_srstrust":"2014-10-07T00:29:07Z",
"childproc_count":1,
"cmdline":"C:\WINDOWS\splwow64.exe 8192",
"comms_ip":"",
"computer_name":"",
"crossproc_count":2,
"filemod_count":2,
"group":"********",
"host_type":"workstation",
"hostname":"",
"id":"0000106f-0000-1aec-01d1-d161694406e2",
"interface_ip":"",
"last_update":"2016-06-28T17:20:53.101Z",
"modload_count":84,
"netconn_count":0,
"os_type":"windows",
"parent_guid":"0000106f-0000-0e88-01d1-d16166a9fb12",
"parent_md5":"000000000000000000000000000000",
"parent_name":"acrord32.exe",
"parent_pid":3720,
"parent_unique_id":"0000106f-0000-0e88-01d1-d16166a9fb12-00000001",
"path":"c:\windows\splwow64.exe",
"process_guid":"0000106f-0000-1aec-01d1-d161694406e2",
"process_md5":"127AA81343A7C6F665C22CB1293B0A90",
"process_name":"splwow64.exe",
"process_pid":6892,
"regmod_count":9,
"segment_id":1,
"sensor_id":4207,
"start":"2016-06-28T17:20:47.855Z",
"unique_id":"",
"username":"**"
}],

I want to remove the "docs" key and the brackets around its array, so that I am left with a flat list of fields. I tried to do this with the filter below, but it generates lots of number_format_exception errors in the Logstash logs.

The data still seems to load into Elasticsearch, but I don't know whether any of it is missing.

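		# Only rewrite events whose raw JSON text contains a "docs" array.
		if [message] =~ /"docs":\[\{/ {
			mutate {
				# Strip the ',"docs":[{' opener and the '}],' closer so the
				# nested fields end up next to the other top-level fields.
				# Note: the second pattern matches every '}],' in the message.
				gsub => [
				"message", ',"docs":\[\{', ',',
				"message", '\}\],', ','
				]
			}
		}
		json {
			source => "message"
		}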

Is there a better way to extract the data from a single value json array?

And what does number_format_exception mean when parsing data in Logstash?

Thank you.

I'm not sure about the array, but if you want to index something that sits in nested JSON, you can just apply the json filter twice. It has worked for me, at least.

	json {
		source => "message"
	}
	json {
		source => "docs"
	}

After this you can remove the "docs" field with a simple mutate filter using remove_field.

Thanks for the reply. At first I thought it had worked, as I was seeing data come in with no errors, but it didn't. I still get this in the Logstash log:

{:timestamp=>"2016-06-30T10:21:21.398000-0700", :message=>"Error parsing json", :source=>"docs", :raw=>[{"......"}], :exception=>java.lang.ClassCastException, :level=>:warn}

The event is then tagged with _jsonparsefailure and I don't see the fields from within the array.
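
A likely reason for the ClassCastException (judging by the :raw value in the log, which is already an array): after the first json filter, "docs" is no longer a JSON string but a parsed array of objects, and the json filter expects its source field to hold a string. A minimal sketch of one way to unwrap a single-element array instead is shown below. It assumes the ruby filter plugin is available and uses the event.get / event.set API from Logstash 5 and later; on older releases the equivalent would be event["docs"]. This is an illustration, not something tested against this data source.

```
filter {
	json {
		source => "message"
	}
	ruby {
		# If "docs" is an array with exactly one element, replace the
		# array with that element so "docs" becomes a plain object.
		code => '
			docs = event.get("docs")
			if docs.is_a?(Array) && docs.length == 1
				event.set("docs", docs.first)
			end
		'
	}
}
```

Events where "docs" is missing or holds more than one element pass through unchanged.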

I think I have it now. If the array exists, I split it. I also need to convert one of the fields to a string as we have some wonky data in some events (name instead of number) but this works:

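	if "**********" in [tags] {
		# Parse the raw JSON text into event fields.
		json {
			source => "message"
		}
		if [docs] {
			# Emit one event per element of the "docs" array.
			split {
				field => "[docs]"
				add_tag => ["splitted_docs"]
			}
			# Some events carry a name instead of a number here.
			mutate {
				convert => { "watchlist_id" => "string"}
			}
		}
		# Use the UNIX epoch value in "timestamp" as the event time.
		date {
			match => [ "timestamp","UNIX" ]
			target => "@timestamp"
		}
	}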

Cheers
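
For anyone who also wants the "docs" name gone, as asked in the original question: after a split, the fields still sit under "docs" (for example [docs][process_name]). A rough sketch of promoting them to the top level with a ruby filter follows. It assumes the ruby filter plugin and the event.get / event.set / event.remove API from Logstash 5 and later, and it assumes that overwriting a top-level field of the same name (such as "id" or "path") is acceptable. Treat it as a starting point rather than a verified configuration.

```
filter {
	ruby {
		# Runs after the split filter, when "docs" is a single object.
		code => '
			docs = event.get("docs")
			if docs.is_a?(Hash)
				# Copy each nested field to the top level of the event.
				docs.each { |key, value| event.set(key, value) }
				# Drop the now-redundant "docs" wrapper.
				event.remove("docs")
			end
		'
	}
}
```

If name clashes are a concern, one option is to prefix the keys inside the loop, for example event.set("doc_" + key, value), where "doc_" is just an illustrative prefix.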