We have a heavily nested JSON document containing server metrics. The document contains > 1000 fields, some of which are completely irrelevant to us for analytics purposes, so I would like to remove them before indexing the document in Elasticsearch.
However, I am unable to find the correct filter to use, as the fields I want to remove have common names across multiple different objects within the document.
The source document looks like this (reduced in size):
[
  {
    "server": {
      "is_master": true,
      "name": "MYServer",
      "id": 2111
    },
    "metrics": {
      "Server": {
        "time": {
          "boundary": {},
          "type": "TEXT",
          "display_name": "Time",
          "value": "2018-11-01 14:57:52"
        }
      },
      "Mem_OldGen": {
        "used": {
          "boundary": {},
          "display_name": "Used(mb)",
          "value": 687
        },
        "committed": {
          "boundary": {},
          "display_name": "Committed(mb)",
          "value": 7116
        },
        "cpu_count": {
          "boundary": {},
          "display_name": "Cores",
          "value": 4
        }
      }
    }
  }
]
I wish to remove the "display_name", "boundary" and possibly "type" fields from all of the objects.
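For example, after cleanup the "used" object above should end up looking like this:

"used": {
  "value": 687
}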
I have started with a simple enough ruby filter to loop over the JSON:
filter {
  split {
    field => "message"
  }
  ruby {
    code => '
      used = event.get("[metrics][Mem_OldGen][used]")
      if used.is_a?(Hash)
        used.keys.each do |k|
          logger.info("field is: #{k}")
          # event.remove needs the full field reference, not just the key name
          if k == "display_name" || k == "boundary"
            event.remove("[metrics][Mem_OldGen][used][#{k}]")
          end
        end
      end
    '
  }
}
It first splits the input into one event per server, and it should then loop over the keys and remove the fields. However, I cannot see how to make this sufficiently generic so that it covers every nested object rather than one hard-coded path, or how to get it to reliably remove the fields.
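What I am imagining is something along these lines: a recursive walk over the metrics hash that deletes the unwanted keys at every level, then writes the result back onto the event. This is only an untested sketch; the field list is my assumption, and I write the pruned hash back with event.set since I am not sure that mutating the value returned by event.get is reflected in the event on 5.x:

filter {
  split {
    field => "message"
  }
  ruby {
    code => '
      # Keys to drop at every level; this list is my assumption, adjust as needed.
      unwanted = ["display_name", "boundary", "type"]

      # Recursively delete the unwanted keys from a hash and all nested hashes.
      prune = lambda do |hash|
        unwanted.each { |k| hash.delete(k) }
        hash.each_value { |v| prune.call(v) if v.is_a?(Hash) }
      end

      metrics = event.get("metrics")
      if metrics.is_a?(Hash)
        prune.call(metrics)
        # Write back explicitly in case event.get returned a copy
        event.set("metrics", metrics)
      end
    '
  }
}

Is a recursive approach like this the right way to do it, or is there a filter better suited to removing same-named fields at arbitrary depths?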
EDIT: ELK version 5.6.8