Logstash xml filter ruby cleanup code help

I'm using the Logstash xml filter to parse XML values into JSON, but I'm having trouble indexing them into Elasticsearch because an empty (null) string gets parsed into an empty object.

Source String Snip:

  <Actions Context="Author">
    <Exec>
      <Command>%localappdata%\Microsoft\OneDrive\OneDriveStandaloneUpdater.exe</Command>
      <Arguments />
    </Exec>
  </Actions>

Arguments here is a string when populated, but the xml filter plugin treats the empty element as an empty object. When sending this data to Elasticsearch, the ignore_malformed index mapping setting does not ignore objects when the expected type is a string / keyword.

My current workaround is to strip nested empty objects out of my parsed XML with some Ruby code. I can't figure out how to turn this into a function that can be called to loop through all the key/value pairs in the hash, so if anyone has advice on how to write this code better, it would be greatly appreciated.

Ruby Code:

          xml = event.get("[winlog][event_data][TaskContentXml]")
          key_path = "[winlog][event_data][TaskContentXml]"
          
          xml.each do |key,value|
            key_path_1 = key_path + "[" + key + "]"
            ## loop 1
            if !value.nil? && value.is_a?(Hash)
              if value.empty?
                logger.warn("TaskContentXml1: Empty Hash value at key: " + key_path_1)
                event.remove(key_path_1)
              else
                ## loop 2
                value.each do |key,value|
                  key_path_2 = key_path_1 + "[" + key + "]"
                  if !value.nil? && value.is_a?(Hash)
                    if value.empty?
                      logger.warn("TaskContentXml2: Empty Hash value at key: " + key_path_2)
                      event.remove(key_path_2)
                    else
                      ## loop 3
                      value.each do |key,value|
                        key_path_3 = key_path_2 + "[" + key + "]"
                        if !value.nil? && value.is_a?(Hash)
                          if value.empty?
                            logger.warn("TaskContentXml3: Empty Hash value at key: " + key_path_3)
                            event.remove(key_path_3)
                          else
                            ## loop 4
                            value.each do |key,value|
                              key_path_4 = key_path_3 + "[" + key + "]"
                              if !value.nil? && value.is_a?(Hash)
                                if value.empty?
                                  logger.warn("TaskContentXml4: Empty Hash value at key: " + key_path_4)
                                  event.remove(key_path_4)
                                else
                                  ## loop again
                                end
                              end
                            end
                          end
                        end
                      end
                    end
                  end
                end
              end
            end
          end

Related issue: Invalid mapping case not handled by index.mapping.ignore_malformed · Issue #12366 · elastic/elasticsearch · GitHub

Take a look at this, which recursively descends into hashes and arrays in the parsed XML and deletes items that match a condition (in that case based on the length of the element name, but you can change it to be based on the value).
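For the empty-object case in the question, the same recursive idea could look roughly like the sketch below. This is not the linked code: the function name remove_empty_hashes is made up, and FakeEvent is a hypothetical stand-in that only exists so the example runs outside Logstash. Inside a ruby filter you would pass the real event, and I'm assuming event.remove accepts array-index references such as "[a][0]" the same way it accepts nested field names.

```ruby
# Hypothetical stand-in for the Logstash event, only so this sketch runs
# outside Logstash. It understands just the "[a][b]" field-reference form.
class FakeEvent
  def initialize(data)
    @data = data
  end

  def to_hash
    @data
  end

  # Remove the field addressed by a reference such as "[a][b][c]".
  def remove(ref)
    keys = ref.scan(/\[([^\]]+)\]/).flatten
    last = keys.pop
    parent = keys.reduce(@data) { |node, k| node.is_a?(Array) ? node[k.to_i] : node[k] }
    parent.is_a?(Array) ? parent.delete_at(last.to_i) : parent.delete(last)
  end
end

# Recursively walk hashes and arrays, removing every empty hash found.
# name is the field reference of value, e.g. "[winlog][event_data]".
def remove_empty_hashes(value, name, event)
  case value
  when Hash
    if value.empty?
      event.remove(name)
    else
      # Iterate over a copy of the keys, because event.remove may mutate the hash.
      value.keys.each { |k| remove_empty_hashes(value[k], "#{name}[#{k}]", event) }
    end
  when Array
    # Walk backwards so deleting an element does not shift unvisited indexes.
    (value.length - 1).downto(0) { |i| remove_empty_hashes(value[i], "#{name}[#{i}]", event) }
  end
end

# Usage with data shaped like the parsed XML from the question.
sample = FakeEvent.new(
  "Actions" => { "Exec" => { "Command" => "OneDriveStandaloneUpdater.exe", "Arguments" => {} } }
)
sample.to_hash.keys.each { |k| remove_empty_hashes(sample.to_hash[k], "[#{k}]", sample) }
puts sample.to_hash.inspect
```

Because the function calls itself for every nested hash, there is no need for the "loop 1 ... loop 4" nesting; it goes as deep as the data does. Note that it only removes hashes that were empty to begin with, so a parent that becomes empty after its children are removed is left in place.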

Thanks for the reply.
This looks really close to what I need, but I'm not too familiar with Ruby, so when I see this code block:

  event.to_hash.each { |k, v|
    removeBigThings(v, "[#{k}]", event)
  }

I don't really know what this does. I know k, v are just the key/value pairs on the object being iterated over, but why is event in there, and why are you passing [#{k}] into the function rather than just k?

event is there because inside the function I call event.remove. If it were not passed to the function then event would not be in scope.

I use "[#{k}]" so that a field called "foo" results in the value "[foo]". Inside the function I use "#{name}[#{k}]". If [foo] is a hash with a field called "bar", this will result in the recursive call being passed "[foo][bar]".
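To make the path building concrete, here is a small standalone sketch that only collects the references instead of removing anything. The helper name field_refs and the sample hash are made up for illustration; the string interpolation is the same as described above.

```ruby
# Collect the field reference of every leaf value. The top level wraps the
# key as "[k]"; each nested level appends another "[k]" to the name so far.
def field_refs(value, name, out = [])
  if value.is_a?(Hash) && !value.empty?
    value.each { |k, v| field_refs(v, "#{name}[#{k}]", out) }
  else
    out << name
  end
  out
end

parsed = { "foo" => { "bar" => "baz" }, "qux" => 1 }
refs = parsed.flat_map { |k, v| field_refs(v, "[#{k}]") }
puts refs.inspect
# prints ["[foo][bar]", "[qux]"]
```

Each string is a complete Logstash field reference, which is why the brackets are added at every level rather than passing the bare key.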
