IIS Log Data - Aggregating multiple web requests up to individual page requests


(Patrick Stevens) #1

Hi all, I'm working with a web application where each page request by a user can result in 5+ web requests (each with its own line in an IIS log) to load all the components of the page. To get the actual page load times for the application, the individual web requests must be "grouped" and the 'time-taken' field aggregated so that it accurately represents the time taken from the user's perspective.

I understand Kibana and Elasticsearch do not support grouping in the classic SQL sense. How have you all gone about analyzing IIS log data like this? Page response times, number of hits, etc. all depend on accurately identifying true page requests. I'd like to group web requests into page requests by a combination of 'username', 'uri-stem', and tiny log datetime chops.

I am working with Kibana and Elasticsearch 5.5.1, and IIS W3C-formatted logs on Windows Server 2012.


(Spencer Alger) #2

Here is the approach I would take:

I’d like to group web requests into page requests by a combination of ‘username’, ‘uri-stem’, and tiny log datetime chops

Let's assume you can take these values from each request and concatenate/hash them somehow to produce a unique ID for each "Page Request" (rather than per web request); I'll call it prID.

I would start by creating a scripted field for prID. Ignore the fields I'm referencing, but you can see that I'm just concatenating the fields with a separator:
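Kibana scripted fields are written in Painless, but the idea is easy to sketch in Python. The field names (`username`, `uri-stem`) and the two-second window are assumptions for illustration; your "tiny datetime chops" may use a different granularity:

```python
import hashlib

def pr_id(username: str, uri_stem: str, epoch_seconds: int, window: int = 2) -> str:
    """Build a Page Request ID from user, page, and a truncated time window."""
    # Truncate the request time to a fixed window (here 2 s) so the burst
    # of web requests that makes up one page load shares the same ID.
    bucket = epoch_seconds // window
    raw = f"{username}|{uri_stem}|{bucket}"
    # Hashing is optional; it just keeps the ID short and evenly distributed.
    return hashlib.sha1(raw.encode("utf-8")).hexdigest()
```

Two requests from the same user for the same page within the window produce the same prID, so they fall into the same bucket in a terms aggregation.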

To make sure the script works as expected, use it with a terms aggregation in a data table:

Next I'll start a bar chart, put a date histogram on the x-axis, and use an average for my metric (I'm averaging bytes because I don't have a request-time field in my data).

Next I change the metric from just average to "Average Bucket".

With the config pictured above Elasticsearch is doing the following for each time bucket:

  1. create Page Request buckets by grouping web requests by prID
  2. sort the Page Request buckets lexicographically by prID
  3. for the first 100 Page Request buckets:
    1. sum the bytes of each web request inside the bucket
    2. average the sums to determine the Y value of this bar
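The steps above map onto a raw Elasticsearch request using a terms aggregation plus an `avg_bucket` pipeline aggregation. Here is a sketch of that request body as a Python dict; the `prID` and `bytes` field names and the one-hour interval are assumptions:

```python
# Request body for the "average of per-page sums" visualization (ES 5.x syntax).
avg_page_bytes = {
    "size": 0,
    "aggs": {
        "per_hour": {
            "date_histogram": {"field": "@timestamp", "interval": "1h"},
            "aggs": {
                "page_requests": {
                    # step 1/2: bucket web requests by prID, first 100 sorted by term
                    "terms": {"field": "prID", "size": 100, "order": {"_term": "asc"}},
                    # step 3.1: sum the bytes of the requests in each bucket
                    "aggs": {"total_bytes": {"sum": {"field": "bytes"}}},
                },
                # step 3.2: average those per-page sums to get the bar's Y value
                "avg_page": {"avg_bucket": {"buckets_path": "page_requests>total_bytes"}},
            },
        }
    },
}
```

This is what the "Average Bucket" metric builds for you behind the scenes.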

This is pretty much what you were looking for, right? One thing you'll definitely want to play with is the number of buckets that you average. This visualization looks at 100 within each time bucket, which might be sufficient.

Picking the top 100 Page Requests is another bit you should play with. If your prID is a hash then sorting lexicographically will probably give you a good sample (I'm not a mathematician). Alternatively you could change Order By: Term to use a custom metric, like the sum of response times in each bucket. Then the chart would be averaging the longest Page Requests... There are a lot of options in there.
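As a sketch of that last option, here is what the terms aggregation looks like when ordered by a custom metric instead of by term, so the chart averages the 100 slowest page loads. The `time-taken` field name is an assumption:

```python
# Order the prID buckets by total time-taken, descending, so the top 100
# buckets are the slowest Page Requests rather than a lexicographic sample.
slowest_pages = {
    "terms": {
        "field": "prID",
        "size": 100,
        # order by the sub-aggregation defined below
        "order": {"total_time": "desc"},
    },
    "aggs": {"total_time": {"sum": {"field": "time-taken"}}},
}
```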


If you want more complete numbers, there is another approach you could take. Along with indexing the web requests, index a summary document for each Page Request. Use the prID as the ID of the summary document and write the documents to Elasticsearch via the update API, taking advantage of its upsert behavior to either insert a new doc or increment the total request time of an existing summary doc. You should be able to do this with Logstash, but it is a trade-off: it requires more complex Logstash configs, more indexing capacity, and more storage.
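A sketch of the per-web-request update body that would maintain such a summary document, using the update API's upsert behavior. In ES 5.x the Painless script source goes under "inline"; the `total_time` and `request_count` field names are assumptions:

```python
def summary_update(time_taken_ms: int) -> dict:
    """Build an update-API request body for one web request's contribution."""
    return {
        "script": {
            "lang": "painless",
            # summary doc already exists: add this request's time and count it
            "inline": (
                "ctx._source.total_time += params.t; "
                "ctx._source.request_count += 1"
            ),
            "params": {"t": time_taken_ms},
        },
        # no summary doc yet: create one seeded with this request
        "upsert": {"total_time": time_taken_ms, "request_count": 1},
    }
```

Each log line then becomes one update call against the summary index with the prID as the document ID (for example via the Logstash elasticsearch output's update action).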


(Patrick Stevens) #3

Great explanation, thank you!


(system) #4

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.