Remove fields from nested json

Pete_Barnes · November 8, 2018, 1:01pm

We have a heavily nested JSON document containing server metrics. The document contains more than 1000 fields, some of which are completely irrelevant to us for analytic purposes, so I would like to remove them before indexing the document in Elastic.
However, I am unable to find the correct filter to use, as the fields I want to remove have common names in multiple different objects within the document.

The source document looks like this (reduced in size):

[
    {
        "server": {
            "is_master": true,
            "name": "MYServer",
            "id": 2111
        },
        "metrics": {
            "Server": {
                "time": {
                    "boundary": {},
                    "type": "TEXT",
                    "display_name": "Time",
                    "value": "2018-11-01 14:57:52"
                }
            },
            "Mem_OldGen": {
                "used": {
                    "boundary": {},
                    "display_name": "Used(mb)",
                    "value": 687
                },
                "committed": {
                    "boundary": {},
                    "display_name": "Committed(mb)",
                    "value": 7116
                },
                "cpu_count": {
                    "boundary": {},
                    "display_name": "Cores",
                    "value": 4
                }
            }
        }
    }
]

I wish to remove the "display_name", "boundary" and possibly "type" fields from all of the objects.
I have started with a simple enough ruby filter to loop over the JSON:

filter {
  split {
    field => "message"
  }
  ruby {
    code => '
      event.get("[metrics][Mem_OldGen][used]").to_hash.keys.each { |k|
        logger.info("field is:", k)

        if k.include?("display_name")
          event.remove(k)
        end
        if k.include?("boundary")
          event.remove(k)
        end
      }
    '
  }
}

It first splits the input into one event per server, then it should loop and remove the fields. However, I cannot see how to make this sufficiently generic, or how to get it to actually remove the field.

EDIT: ELK version 5.6.8

guyboertje · November 16, 2018, 4:50pm

FWIW the ruby filter has an init setting which allows you to do some initialisation.
Define two Ruby constant arrays there: one with the paths to the objects, the other with the fields to remove.

This may work (untested):

  ruby {
    init => '
      # avoid clashing with other constants
      BARNES_PATHS = ["[metrics][Mem_OldGen][used]", "[metrics][Mem_OldGen][committed]"]
      BARNES_FIELDS = ["display_name", "boundary"]
    '
    code => '
      BARNES_PATHS.each do |path|
        hash = event.get(path)
        hash.keys.each do |field|
          logger.info("field is:", field)
          if BARNES_FIELDS.include?(field)
            # must build full path to field 
            event.remove(path + "[" + field + "]")
          end
        end
      end
    '
  }
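The "must build full path" comment is the key point: the keys returned by the hash are names relative to that object, while event.remove expects a field reference from the top of the event. A minimal sketch of that difference, runnable outside Logstash, is below. It uses a plain Hash and a hypothetical stand-in called remove_by_reference (not part of the Logstash API) that only mimics the "[a][b][c]" reference syntax.

```ruby
# Hypothetical stand-in for event.remove, operating on a plain Hash.
# It resolves a "[a][b][c]" style reference, or a bare top-level name.
def remove_by_reference(doc, reference)
  keys = if reference.start_with?("[")
           reference.scan(/\[([^\]]+)\]/).flatten # "[a][b]" -> ["a", "b"]
         else
           [reference]                            # bare name = top-level field
         end
  *parents, leaf = keys
  # Walk down to the object that holds the leaf; give up if the path breaks.
  parent = parents.reduce(doc) { |node, key| node.is_a?(Hash) ? node[key] : nil }
  parent.is_a?(Hash) ? parent.delete(leaf) : nil
end

doc = {
  "metrics" => {
    "Mem_OldGen" => {
      "used" => { "boundary" => {}, "display_name" => "Used(mb)", "value" => 687 }
    }
  }
}

path = "[metrics][Mem_OldGen][used]"

# Bare key: looks for a top-level "display_name", finds nothing, changes nothing.
remove_by_reference(doc, "display_name")

# Full reference built from path + key: removes the nested field.
remove_by_reference(doc, path + "[display_name]")
```

This is only an illustration of why the bare key in the original filter had no effect; inside the filter the real call remains event.remove with the concatenated path.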

Pete_Barnes · November 19, 2018, 12:47pm

This worked correctly and removed the fields from the hardcoded paths. Based on this, I am trying to make it more generic to avoid having to list all of the metric paths:

ruby {
  init => '
    BARNES_FIELDS = ["display_name", "boundary"]
  '
  code => '
    METRIC_PATHS = event.get("[metrics]").keys
    METRIC_PATHS.each do |path|
      logger.info("Processing path:", path)

      event.get("[metrics][" + path + "]").keys.each do |metric|
        logger.info("Processing metric", metric)
        field_hash = event.get("[metrics][" + metric + "]")
        field_hash.keys.each do |field|
          logger.info("Processing field:", field)
          if BARNES_FIELDS.include?(field)
            field_path = "[metrics][" + metric + "][" + field + "]"
            event.remove(field_path)
          end
        end
      end
    end
  '
}

The above is what I came up with; it should generate the paths based on the keys present in event.get("[metrics]").
However, it is giving an error:

Ruby exception occurred: undefined method `keys' for nil:NilClass

Given there are no log messages present, I can only assume this is happening on the call to create METRIC_PATHS, but I do not see why.
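One plausible cause, assuming [metrics] really is present on the event: the inner lookup builds "[metrics][" + metric + "]" and so drops the path segment (for example it asks for [metrics][used] instead of [metrics][Mem_OldGen][used]). That reference does not exist, event.get returns nil, and calling keys on nil raises exactly this error. The same omission is in field_path.

A way to sidestep path building altogether is to clean the whole nested structure recursively and write it back. The sketch below is plain Ruby with a hypothetical helper named strip_keys; it is untested inside Logstash 5.6.8 and assumes event.get("[metrics]") returns an ordinary Ruby Hash.

```ruby
# Hypothetical helper: returns a copy of a nested structure with the
# unwanted keys dropped at every depth. The input is not modified.
def strip_keys(node, unwanted)
  case node
  when Hash
    node.each_with_object({}) do |(key, value), kept|
      # Skip unwanted keys entirely; recurse into everything else.
      kept[key] = strip_keys(value, unwanted) unless unwanted.include?(key)
    end
  when Array
    node.map { |item| strip_keys(item, unwanted) }
  else
    node # scalars (strings, numbers, booleans, nil) pass through unchanged
  end
end

# Sample data shaped like the reduced document in the first post.
metrics = {
  "Server" => {
    "time" => {
      "boundary" => {}, "type" => "TEXT",
      "display_name" => "Time", "value" => "2018-11-01 14:57:52"
    }
  },
  "Mem_OldGen" => {
    "used" => { "boundary" => {}, "display_name" => "Used(mb)", "value" => 687 }
  }
}

cleaned = strip_keys(metrics, ["display_name", "boundary", "type"])
```

In the filter this might be used as event.set("[metrics]", strip_keys(event.get("[metrics]"), BARNES_FIELDS)), with a nil check on the event.get result first. Whether a helper method can be defined in the init setting on this version is an assumption that would need checking; inlining the logic in code is the safer fallback.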

system · December 17, 2018, 12:47pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
