XML plugin getting text with special characters

(Nuno Ferreira) #1

Hi,

I'm getting XML values from a queue, and when an element contains the '&' character, it is transformed into the string &amp;.

Here are the before and after:

gz$LDQ@vnbkj7DNUVwwjrnGa0&|2,a)

gz$LDQ@vnbkj7DNUVwwjrnGa0&|2,a)

Here is the filter code; the field in question is service_key:

xml {
  source => "message"
  store_xml => false
  remove_namespaces => true
  force_array => false
  xpath => [
    "/LogMessage/TransactionID/text()", "transaction_id",
    "/LogMessage/ServiceKey/text()", "service_key"
  ]
}

Any ideas why this happens and how I can resolve it?

Cheers,

(Nuno Ferreira) #2

Strange: after the browser renders it, the field looks correct.
The second value has '&amp;' instead of '&'.
This field is used later in a jdbc_streaming plugin, where it has the wrong value.

#3

I do not see any difference between the before and after. You may need to use appropriate markdown to quote your examples.

(Nuno Ferreira) #4

This is what I get from the XML:

            "service_key" => "gz$LDQ@vnbkj7DNUVwwjrnGa0&amp;|2,a)",

Instead of &amp; it should be '&'.

(Walker) #5

You could always set up a mutate pass on the field and use gsub to replace &amp; with &:

mutate {
  gsub => [
    "service_key", "&amp;", "&"
  ]
}

(Nuno Ferreira) #6

Hi,

The problem with that approach is that I can't be sure '&' is the only character that needs to be handled.
Is there no way of doing this in the XML filter plugin itself?

(Walker) #7

The source of the problem is the parsing engine the XML plugin uses and its mistreatment of a special character. You can safely assume that alphanumeric characters won't be affected and, based on your current results, that $ and @ are safe as well. With all that in mind, unless we get some input from the plugin developer (open a ticket on GitHub?), you could run a test ingest of each special character and then add those that are improperly parsed to the mutate rule.
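
As a starting point for that rule: XML itself defines only five predefined named entities (&amp;, &lt;, &gt;, &quot; and &apos;), so a gsub covering those may already be enough. The following is a minimal sketch, not a tested configuration. It assumes the field is service_key and that only these named entities show up; numeric references such as &#38; are not handled. The &amp; replacement is listed last so that a value like &amp;lt; is not unescaped twice.

```
mutate {
  # Replacements are applied in the order listed; "&amp;" goes last on purpose
  gsub => [
    "service_key", "&lt;",   "<",
    "service_key", "&gt;",   ">",
    "service_key", "&quot;", '"',
    "service_key", "&apos;", "'",
    "service_key", "&amp;",  "&"
  ]
}
```

Anything that a test ingest shows is still escaped after this could then be added as a further pair ahead of the "&amp;" line.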

#8

If you have HTML entities in a string then there is a third-party filter that can remove them.
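
If installing an extra plugin is not an option, the stock ruby filter might do a similar job. This is only a sketch under a few assumptions: the field is service_key, the Logstash version provides the event.get/event.set API, and Ruby's standard-library CGI.unescapeHTML is acceptable, which decodes the basic named entities (&amp;, &lt;, &gt;, &quot;, &apos;) and numeric character references. It is not the third-party filter mentioned above.

```
ruby {
  # Load Ruby's standard CGI library once when the pipeline starts
  init => "require 'cgi'"
  # Decode entities in service_key, but only if the field holds a string
  code => "
    value = event.get('service_key')
    event.set('service_key', CGI.unescapeHTML(value)) if value.is_a?(String)
  "
}
```

Placed after the xml filter and before jdbc_streaming, this would hand the decoded value to the lookup.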

(Nuno Ferreira) #9

The solution wasn't pretty, but it solves the problem.

(Walker) #10

What was your solution, a mutate filter?

(Nuno Ferreira) #11

The solution you provided in the first place: a mutate filter with a lot of characters to be gsubbed.

cheers,