User-agent filter does not handle "+" as space in ua-strings from IIS

Hey all, I'm using Logstash 2.3 to pull IIS logs into our little ES cluster running on site. Things are going well (albeit slowly), and now that I can start to trawl through some of our data, I'm seeing a LOT of results from the user-agent filter come back as either Other or Generic Smartphone (iPhone and iPad are in the lead).

Looking deeper, the first issue was that the regexes.yaml file supplied with Logstash is WAAAAY out of date. I replaced that and have (along with others) requested pulls on the GitHub page: https://github.com/logstash-plugins/logstash-filter-useragent/pull/15

Second issue is that a lot of the log entries are like this:
2015-12-30 00:00:04 10.131.23.197 POST /handheld/resource/jobsheet/index.cfm - 443 - 1.129.96.46 Mozilla/5.0+(iPhone;+CPU+iPhone+OS+8_1_2+like+Mac+OS+X)+AppleWebKit/600.1.4+(KHTML,+like+Gecko)+Version/8.0+Mobile/12B440+Safari/600.1.4 200 0 0 36419 237

In the user-agent string the standard spaces are (for whatever reason) replaced with "+", and I believe that's what is causing the inaccurate matches.

How can we get logstash-filter-useragent updated for both: the latest regexes file AND support for "+" as a space?

We had a similar issue. We ended up just manually modifying the regexes.yaml file.

So
- regex: '(Windows NT 6.1)'
  os_replacement: 'Windows 7'

became:
- regex: '(Windows\+NT\+6.1)'
  os_replacement: 'Windows 7'

It is recommended that you put these into a new file rather than edit the default file, which might get overwritten when upgrading plugins. We also kept the original entry, as the user agent from Apache uses a space.

If someone has a better solution I would love to hear it. As it stands, we can't automatically upgrade to a newer version of the file when new OSes come out.
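One way to avoid keeping two entries per pattern is a character class that accepts either separator. The snippet below is only a sketch in Python to illustrate the idea; the pattern and the function name are made up for this example, and the plugin applies its patterns with Ruby's regex engine, so treat it as an illustration rather than a drop-in entry. It also shows why a bare "+" in a pattern does not match a literal plus sign: it is read as a "one or more" quantifier.

```python
import re

# Hypothetical pattern for illustration: "[ +]" matches a space or a literal
# plus sign, so one entry covers both Apache-style and IIS-style strings.
WINDOWS_7 = re.compile(r"Windows[ +]NT[ +]6\.1")


def is_windows_7(user_agent):
    """Return True if the string carries the Windows NT 6.1 token."""
    return WINDOWS_7.search(user_agent) is not None


# An unescaped "+" is a quantifier, so this pattern does not match the
# literal text "Windows+NT+6.1".
UNESCAPED = re.compile(r"Windows+NT+6.1")

if __name__ == "__main__":
    print(is_windows_7("Mozilla/5.0 (Windows NT 6.1; WOW64)"))
    print(is_windows_7("Mozilla/5.0+(Windows+NT+6.1;+WOW64)"))
    print(UNESCAPED.search("Mozilla/5.0+(Windows+NT+6.1;+WOW64)"))
```

The trade-off is the same as with any hand-edited copy of the file: every pattern that contains a space would need the same treatment, so the upgrade problem does not go away.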

I came across this issue and used the following in my filter to replace each "+" with a space. If you improve on this or have other suggestions for IIS logs, please share.

mutate {
  gsub => [
    # replace + with a space " "
    "useragent", "[+]", " "
  ]
}
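To see what that substitution does to the sample line from the first post, here is a rough Python equivalent. It is a sketch, not what Logstash runs internally: the field position (index 9) is an assumption taken from that one sample line rather than from a real #Fields header, and the function names are invented for the example.

```python
# Sample line copied from the first post. The user agent sitting at index 9
# is an assumption based on this single line.
SAMPLE = (
    "2015-12-30 00:00:04 10.131.23.197 POST "
    "/handheld/resource/jobsheet/index.cfm - 443 - 1.129.96.46 "
    "Mozilla/5.0+(iPhone;+CPU+iPhone+OS+8_1_2+like+Mac+OS+X)+"
    "AppleWebKit/600.1.4+(KHTML,+like+Gecko)+Version/8.0+"
    "Mobile/12B440+Safari/600.1.4 200 0 0 36419 237"
)


def normalize_user_agent(raw):
    """Replace every '+' with a space, like the gsub above."""
    return raw.replace("+", " ")


def extract_user_agent(line, index=9):
    """Split the space-delimited log line and normalize the chosen field."""
    return normalize_user_agent(line.split(" ")[index])


if __name__ == "__main__":
    print(extract_user_agent(SAMPLE))
```

One caveat with this approach: a "+" that was really part of the original user agent (some crawlers include one before a URL) cannot be told apart from an encoded space, so it gets turned into a space as well.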


Yeah, that's what I ended up doing. I'm still getting a lot of unknowns, so I'm guessing there are more issues.

I'm looking at somehow integrating calls to this service: http://www.handsetdetection.com/features though that in itself may add too much overhead and delay.

Sorry for the necro, but they've finally released a decent logstash plugin for detailed handset data:

We're using it on all of our IIS logs, and now we can split the data out by device type, manufacturer, etc. Very, very rich logs!!