I need an equivalent of JavaScript's unescape in Logstash. I know %uXXXX is not standard, but it is still quite common: a lot of our users' browsers send URLs in this format. I am keeping the ruby filter as a last resort for decoding this type of URL; before that, I am looking for a lighter-weight solution. Thanks for any suggestions.
I suppose you've concluded that the urldecode filter doesn't cut it?
No, it doesn't. The urldecode filter cannot decode %uXXXX-style encoded URLs. With a minimal config like this:
input {
  stdin {
  }
}
filter {
  urldecode {
    field => "message"
  }
}
output {
  stdout {
    codec => rubydebug
  }
}
Test it with two lines of URL-encoded text which, if correctly decoded, represent the same sequence of Chinese characters:
I have already worked out the ruby filter shown below to work around the limitation:
ruby {
  code => "
    # urldecode non-standard %uXXXX type of string
    ['cs_uri_query', 'cs_cookie', 'cs_referer'].each { |field|
      if event[field] and event[field].include? '%u'
        event[field] = event[field].gsub(/%u([0-9A-F]{4})/i){$1.hex.chr(Encoding::UTF_8)}.strip
      end
    }
  "
}
But I am still looking for easier ways.
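For anyone who wants to try the decoding outside Logstash, here is a minimal plain-Ruby sketch of the same idea. The helper name decode_percent_u is hypothetical and is not part of Logstash or any plugin. It assumes each %uXXXX is a UTF-16 code unit, as JavaScript's escape produces, so it decodes a whole run of them at once; this lets surrogate pairs (characters outside the Basic Multilingual Plane) combine into one character, which the per-match chr approach above does not handle.

```ruby
# Hypothetical helper (not a Logstash API): decode non-standard %uXXXX
# sequences, treating each XXXX as one UTF-16 code unit.
def decode_percent_u(str)
  # Match a whole run of consecutive %uXXXX so surrogate pairs stay together.
  str.gsub(/(?:%u[0-9A-Fa-f]{4})+/) do |run|
    # Collect the 16-bit code units in this run.
    units = run.scan(/%u([0-9A-Fa-f]{4})/).map { |m| m[0].hex }
    # Pack as big-endian 16-bit values, label the bytes as UTF-16BE,
    # then transcode to UTF-8. Unpaired surrogates are replaced
    # instead of raising an exception.
    units.pack('n*')
         .force_encoding(Encoding::UTF_16BE)
         .encode(Encoding::UTF_8, invalid: :replace, undef: :replace)
  end
end

# Sample input chosen for illustration only (U+4E2D U+6587).
puts decode_percent_u('q=%u4E2D%u6587&page=1')
```

Text that contains no %u sequences is returned unchanged, so the body of the helper could be dropped into the gsub call of the ruby filter above without the include? check, assuming the fields hold UTF-8 strings.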