Remove fields from nested json


(Pete Barnes) #1

We have a heavily nested JSON document containing server metrics. The document contains more than 1000 fields, some of which are completely irrelevant to us for analytic purposes, so I would like to remove them before indexing the document in Elastic.
However, I am unable to find the correct filter to use, as the fields I want to remove have common names that appear in multiple different objects within the document.

The source document looks like this (reduced in size):

[
    {
        "server": {
            "is_master": true,
            "name": "MYServer",
            "id": 2111
        },
        "metrics": {
            "Server": {
                "time": {
                    "boundary": {},
                    "type": "TEXT",
                    "display_name": "Time",
                    "value": "2018-11-01 14:57:52"
                }
            },
            "Mem_OldGen": {
                "used": {
                    "boundary": {},
                    "display_name": "Used(mb)",
                    "value": 687
                },
                "committed": {
                    "boundary": {},
                    "display_name": "Committed(mb)",
                    "value": 7116
                },
                "cpu_count": {
                    "boundary": {},
                    "display_name": "Cores",
                    "value": 4
                }
            }
        }
    }
]

I wish to remove the "display_name", "boundary" and possibly "type" fields from all of the objects.
I have started with a simple enough ruby filter to loop over the JSON:

filter {
  split {
    field => "message"
  }
  ruby {
    code => '
      event.get("[metrics][Mem_OldGen][used]").to_hash.keys.each { |k|
        logger.info("field is:", k)

        if k.include?("display_name")
          event.remove(k)
        end
        if k.include?("boundary")
          event.remove(k)
        end
      }
    '
  }
}

It first splits the input into one event per server, then it should loop and remove the fields. However, I cannot see how to make this sufficiently generic, or how to get it to actually remove the field.

EDIT: ELK version 5.6.8


(Guy Boertje) #2

FWIW the ruby filter has an init setting which allows you to do some initialisation.
Set two Ruby constant arrays: one holding the paths to the objects, the other holding the fields to remove.

This may work (untested):

  ruby {
    init => '
      # avoid clashing with other constants
      BARNES_PATHS = ["[metrics][Mem_OldGen][used]", "[metrics][Mem_OldGen][committed]"]
      BARNES_FIELDS = ["display_name", "boundary"]
    '
    code => '
      BARNES_PATHS.each do |path|
        hash = event.get(path)
        hash.keys.each do |field|
          logger.info("field is:", field)
          if BARNES_FIELDS.include?(field)
            # must build full path to field 
            event.remove(path + "[" + field + "]")
          end
        end
      end
    '
  }
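
For anyone who wants to check the path building outside Logstash, here is a minimal plain-Ruby sketch of the same idea. It only computes the full field references that would be handed to event.remove and skips paths that are missing from the document. The names reference_keys, removal_targets, DOC, PATHS, FIELDS and TARGETS are hypothetical, and the regex is a simplification of how Logstash parses field references.

```ruby
# Split a Logstash-style reference such as "[a][b]" into ["a", "b"].
# Simplified: assumes every segment is wrapped in square brackets.
def reference_keys(path)
  path.scan(/\[([^\]]+)\]/).flatten
end

# Build the full reference of every unwanted field found under each path.
def removal_targets(doc, paths, fields)
  paths.flat_map do |path|
    hash = doc.dig(*reference_keys(path))
    next [] unless hash.is_a?(Hash) # skip paths missing from this document
    hash.keys.select { |key| fields.include?(key) }
        .map { |key| "#{path}[#{key}]" }
  end
end

# Stand-in for one event, cut down from the sample in the first post.
DOC = {
  "metrics" => {
    "Mem_OldGen" => {
      "used"      => { "boundary" => {}, "display_name" => "Used(mb)", "value" => 687 },
      "committed" => { "boundary" => {}, "display_name" => "Committed(mb)", "value" => 7116 }
    }
  }
}

PATHS = [
  "[metrics][Mem_OldGen][used]",
  "[metrics][Mem_OldGen][committed]",
  "[metrics][Mem_OldGen][missing]" # not in DOC, so it is skipped
]
FIELDS = ["display_name", "boundary"]

TARGETS = removal_targets(DOC, PATHS, FIELDS)
puts TARGETS
```

The point of the sketch is that a bare key such as "boundary" is not enough on its own; each removal needs the path of the enclosing object in front of it.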

(Pete Barnes) #3

This worked correctly and removed the fields from the hardcoded paths. Based on this, I am trying to make it more generic, to avoid having to list all of the metric paths:

ruby {
  init => '    
    BARNES_FIELDS = ["display_name", "boundary"]
  '
  code => '
      METRIC_PATHS = event.get("[metrics]").keys
        METRIC_PATHS.each do |path|
            logger.info("Processing path:", path)
        
            event.get("[metrics][" + path + "]").keys.each do |metric|
                logger.info("Processing metric", metric)
                field_hash = event.get("[metrics][" + metric + "]")
                field_hash.keys.each do |field|
                    logger.info("Processing field:", field)
                    if BARNES_FIELDS.include?(field)
                        field_path = "[metrics][" + metric + "][" + field + "]"
                        event.remove(field_path)
                    end
                end
            end
        end
    '
}

The above is what I came up with; it should generate the paths based on the keys present in event.get("[metrics]").
However, it is giving an error:

Ruby exception occurred: undefined method `keys' for nil:NilClass

Given there are no log messages present, I can only assume this is happening on the call that creates METRIC_PATHS, but I do not see why.
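
One possible explanation, judging only from the sample document: the inner lookup builds "[metrics][" + metric + "]" without the path segment, so for the metric "time" it asks for [metrics][time], which does not exist and comes back as nil. An event that has no metrics field at all would fail in the same way on the first line. A way to avoid building paths entirely is to walk the hash recursively. The following is a minimal plain-Ruby sketch of that idea, not a tested Logstash configuration; strip_keys, UNWANTED_KEYS, SAMPLE and CLEANED are hypothetical names.

```ruby
# Hypothetical list of keys to drop at every nesting level.
UNWANTED_KEYS = ["display_name", "boundary"].freeze

# Return a copy of `node` with every unwanted key removed, at any depth.
def strip_keys(node, unwanted = UNWANTED_KEYS)
  case node
  when Hash
    node.each_with_object({}) do |(key, value), kept|
      kept[key] = strip_keys(value, unwanted) unless unwanted.include?(key)
    end
  when Array
    node.map { |item| strip_keys(item, unwanted) }
  else
    node # scalars are returned unchanged
  end
end

# Stand-in for the "metrics" part of one event, cut down from the sample.
SAMPLE = {
  "Server" => {
    "time" => { "boundary" => {}, "type" => "TEXT",
                "display_name" => "Time", "value" => "2018-11-01 14:57:52" }
  },
  "Mem_OldGen" => {
    "used"      => { "boundary" => {}, "display_name" => "Used(mb)", "value" => 687 },
    "committed" => { "boundary" => {}, "display_name" => "Committed(mb)", "value" => 7116 }
  }
}

CLEANED = strip_keys(SAMPLE)
puts CLEANED.inspect

# Inside a ruby filter this might be wired up roughly as follows (untested,
# and assuming strip_keys has been made available to the filter code):
#   metrics = event.get("[metrics]")
#   event.set("[metrics]", strip_keys(metrics)) if metrics.is_a?(Hash)
```

Because the helper returns a new hash instead of removing fields one by one, the whole metrics object is written back with a single event.set, and the nil check covers events that carry no metrics at all. Adding "type" to the list would drop that field as well.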