How to urldecode %uXXXX type of strings?


(Foresightyj) #1

I need an equivalent of JavaScript's unescape in Logstash. I know %uXXXX is non-standard, but it is still quite common: a lot of our users' browsers send URLs in this format. I am keeping the ruby filter as a last resort for decoding this type of URL, and I am looking for a lighter-weight solution first. Thanks for any suggestions.


(Magnus Bäck) #2

I suppose you've concluded that the urldecode filter doesn't cut it?


(Foresightyj) #3

Right, it doesn't. The urldecode filter cannot decode %uXXXX-encoded URLs. I tested with a minimal config like this:

input {
    stdin {
    }
}

filter {
    urldecode {
        field => "message"
    }
}

output {
    stdout {
        codec => rubydebug
    }
}

Test it with two lines of URL-encoded text which, if correctly decoded, represent the same sequence of Chinese characters:

I have already worked around the limitation with the ruby filter shown below:

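ruby {
    code => "
        # urldecode non-standard %uXXXX type of string
        # (each %uXXXX is converted on its own, so surrogate pairs are not combined)
        ['cs_uri_query', 'cs_cookie', 'cs_referer'].each { |field|
            # only touch fields that exist and contain at least one %u escape
            if event[field] and event[field].include? '%u'
                event[field] = event[field].gsub(/%u([0-9A-F]{4})/i) { $1.hex.chr(Encoding::UTF_8) }.strip
            end
        }
    "
}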

But I am still looking for easier ways.
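
For anyone who wants to try the decoding logic outside Logstash, here is a minimal plain-Ruby sketch of the same idea. The helper name decode_percent_u is made up for illustration and is not part of Logstash or any plugin. Unlike the filter above, it treats each %uXXXX as a UTF-16 code unit and decodes consecutive escapes together, so characters outside the BMP (sent as two escapes) should come out as one character; lone surrogates are replaced rather than raising.

```ruby
# Hypothetical helper (not a Logstash API): decode non-standard %uXXXX
# escapes, as produced by JavaScript's escape(), into UTF-8 text.
def decode_percent_u(str)
  # Match runs of consecutive %uXXXX escapes so that a surrogate pair
  # such as %uD83D%uDE00 is decoded as a single character.
  str.gsub(/(?:%u[0-9A-Fa-f]{4})+/) do |run|
    # Turn the run into an array of 16-bit code units.
    units = run.scan(/%u([0-9A-Fa-f]{4})/).flatten.map(&:hex)
    # Pack as big-endian UTF-16 bytes, then transcode to UTF-8.
    # Invalid sequences (e.g. a lone surrogate) become U+FFFD.
    units.pack('n*')
         .force_encoding(Encoding::UTF_16BE)
         .encode(Encoding::UTF_8, invalid: :replace, undef: :replace)
  end
end

puts decode_percent_u('q=%u4E2D%u6587&page=1')
```

Only the %uXXXX escapes are touched; ordinary %XX escapes are left for the urldecode filter. Inside a ruby filter the same gsub block could replace the one above. Note that Logstash 5.0 and later use event.get(field) and event.set(field, value) instead of event[field].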

