Best way to remove CDATA markers in XML


#1

Hello,

I have an XML document with several CDATA sections in my values, and I don't want them there. What is the best way to remove them? I tried using filter { mutate { gsub }}, but is there an easier solution?

Here is an example of my fields:

CVE_ID: <![CDATA[CVE-2018-13217]]>, <![CDATA[CVE-2018-12498]]>, <![CDATA[CVE-2018-184362]]>
Threat: <![CDATA[Firefox is a free open-source web browser.]]>

What I want:

CVE_ID: CVE-2018-13217, CVE-2018-12498, CVE-2018-184362
Threat: Firefox is a free open-source web browser.

My filter for this:

filter{
  ....
    mutate { 
        gsub => [
            "CVE_ID", "<!\[CDATA", "",
            "Threat", "<!\[CDATA", ""
        ]
    }
}

I am pretty sure there is a better solution for my problem. Can someone help me?


#2

Well, you could do it using gsub, which would be like

mutate {
    gsub => [
        "CVE_ID", "<!\[CDATA\[", "",
        "Threat", "<!\[CDATA\[", "",
        "CVE_ID", "]]>", "",
        "Threat", "]]>", ""
    ]
}

but you would have to add entries for every field you want to modify. What we need is a way to apply these gsubs to a list of fields. You can do that with a ruby filter that runs a script file. Error handling is left as an exercise for the reader. In your Logstash configuration, add

ruby {
    path => "/home/user/stripCDATA.rb"
    script_params => { "fields" => [ "CVE_ID", "Threat" ] }
}

And save this in the .rb file:

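def register(params)
  # Field names passed in through script_params in the Logstash configuration
  @fields = params['fields']
end

def filter(event)
  @fields.each { |x|
    a = event.get(x)
    # Skip fields that are missing or do not hold a string
    if a.is_a?(String)
      # String patterns are matched literally, so the brackets need no escaping
      a = a.gsub("<![CDATA[", "")
      a = a.gsub("]]>", "")
      event.set(x, a)
    end
  }
  # A script-file filter must return an array of events
  [event]
end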
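
If you want to check the stripping logic outside Logstash first, here is a minimal plain-Ruby sketch. The helper name strip_cdata and the constant CDATA_SECTION are made up for this illustration and are not part of any Logstash API. It uses a single regex with a capture group instead of two gsub calls, and it assumes a field may hold either a string or an array of strings.

```ruby
# Matches one CDATA section and captures its content.
# Non-greedy so that several sections on one line are handled separately;
# the /m flag lets the content span multiple lines.
CDATA_SECTION = /<!\[CDATA\[(.*?)\]\]>/m

# Hypothetical helper: unwraps every CDATA section found in a value.
# Arrays are processed element by element; other types are returned unchanged.
def strip_cdata(value)
  case value
  when String then value.gsub(CDATA_SECTION, '\1')
  when Array  then value.map { |v| strip_cdata(v) }
  else value
  end
end

puts strip_cdata("<![CDATA[CVE-2018-13217]]>, <![CDATA[CVE-2018-12498]]>")
# prints: CVE-2018-13217, CVE-2018-12498
```

Inside the script file, the body of the each block could then call something like event.set(x, strip_cdata(event.get(x))), provided the helper is defined in the same file.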
