How to urldecode %uXXXX type of strings?

foresightyj · August 20, 2015, 1:35am

I need a Logstash equivalent of JavaScript's unescape. I know %uXXXX is not standard, but it is still quite common: a lot of our users' browsers send URLs in this format. I would rather keep the ruby filter as a last resort for decoding this type of URL, so I am looking for a lighter-weight solution first. Thanks for any suggestions.

magnusbaeck · August 20, 2015, 3:49am

I suppose you've concluded that the urldecode filter doesn't cut it?

foresightyj · August 20, 2015, 5:26am

Right, it doesn't. The urldecode filter cannot decode %uXXXX-encoded URLs. With a minimal config like this:

input {
    stdin {
    }
}

filter{
    urldecode {
        field => "message"
    }
}

output {
    stdout {
        codec => rubydebug
    }
}

Test it with two lines of URL-encoded text which, if correctly decoded, represent the same sequence of Chinese characters:

I have already worked out the ruby filter shown below to work around the limitation:

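ruby {
    code => "
        # Decode non-standard %uXXXX escapes in the listed fields.
        ['cs_uri_query', 'cs_cookie', 'cs_referer'].each { |field|
            # Skip missing fields and fields that contain no %u escape.
            if event[field] and event[field] =~ /%u/i
                # Replace each %uXXXX with the UTF-8 character for that code point.
                event[field] = event[field].gsub(/%u([0-9A-F]{4})/i) { $1.hex.chr(Encoding::UTF_8) }.strip
            end
        }
    "
}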

But I am still looking for easier ways.
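
For anyone who wants to try the decoding logic outside Logstash first, here is a minimal standalone sketch in plain Ruby. The helper name decode_percent_u is made up for illustration and is not part of Logstash or any plugin. It differs from the filter above in one respect: it decodes each run of %uXXXX escapes as UTF-16 code units, on the assumption that characters outside the BMP arrive as two escapes forming a surrogate pair, a case where Integer#chr would raise a RangeError. Standard %XX escapes are left untouched, so the urldecode filter (or similar) would still be needed for those.

```ruby
# Hypothetical helper (not a Logstash API): decodes only %uXXXX escapes.
def decode_percent_u(str)
  # Match each run of consecutive %uXXXX escapes so that surrogate
  # pairs stay together when they are converted.
  str.gsub(/(?:%u[0-9a-f]{4})+/i) do |run|
    # Collect the 16-bit code units of this run as integers.
    units = run.scan(/%u([0-9a-f]{4})/i).flatten.map(&:hex)
    # Pack them as big-endian UTF-16 and transcode to UTF-8.
    # Lone surrogates are replaced instead of raising an error.
    units.pack('n*')
         .force_encoding(Encoding::UTF_16BE)
         .encode(Encoding::UTF_8, invalid: :replace)
  end
end

puts decode_percent_u('q=%u4E2D%u6587')
```

The same block body could probably be dropped into the gsub call of the ruby filter, but I have only reasoned about it as plain Ruby, not tested it inside Logstash.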
