Here is the approach I would take:
I'd like to group web requests into page requests by a combination of 'username', 'uri-stem', and the log datetime chopped into small time slices. Let's assume that you can take these values from each request and concatenate/hash them somehow to produce a unique ID for each "Page Request" (rather than each web request); I'll call it `prID`.
I would start by creating a scripted field for `prID`. Ignore the exact fields I'm referencing; you can see that I'm just concatenating them with a separator:
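For illustration, a Painless version of that kind of script could look like this (the field names are placeholders for your own index, and on newer Elasticsearch versions the date accessor is `.toInstant().toEpochMilli()` rather than `.millis`):

```painless
// Placeholder fields: 'username', 'uri-stem', '@timestamp' -- swap in your own.
// If 'username' or 'uri-stem' are analyzed text fields, use their keyword
// sub-fields instead (e.g. doc['uri-stem.keyword']).

// Chop the timestamp into one-minute slices so requests from the same user,
// for the same uri-stem, within the same minute share one prID.
long minuteSlice = doc['@timestamp'].value.millis / 60000L;
return doc['username'].value + '|' + doc['uri-stem'].value + '|' + minuteSlice;
```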
To make sure the script works as expected, use it with a terms aggregation in a data table:
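Kibana runs a scripted field by inlining the script into the aggregation it builds, so the data table boils down to something like this (the index name is illustrative, and older Elasticsearch versions spell the script key `inline` rather than `source`):

```json
GET web-logs-*/_search
{
  "size": 0,
  "aggs": {
    "page_requests": {
      "terms": {
        "script": {
          "lang": "painless",
          "source": "doc['username'].value + '|' + doc['uri-stem'].value + '|' + (doc['@timestamp'].value.millis / 60000L)"
        },
        "size": 20
      }
    }
  }
}
```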
Next I'll start a bar chart, put a date histogram (the timeline) on the x-axis, and use an average for my metric (I'm averaging `bytes` because I don't have a request-time field in my data).
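At this point the chart amounts to a date histogram with a plain average inside each time bucket, roughly like this (index name and interval are illustrative; newer Elasticsearch versions use `fixed_interval` or `calendar_interval` instead of `interval`):

```json
GET web-logs-*/_search
{
  "size": 0,
  "aggs": {
    "per_time": {
      "date_histogram": { "field": "@timestamp", "interval": "1h" },
      "aggs": {
        "avg_bytes": { "avg": { "field": "bytes" } }
      }
    }
  }
}
```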
Next I change the metric from just average to "Average Bucket".
With the config pictured above, Elasticsearch is doing the following for each time bucket (a rough query DSL equivalent follows the list):

- create Page Request buckets by grouping web requests by their `prID`
- sort the Page Request buckets by `prID` lexicographically
- within each of the first 100 Page Request buckets, sum the `bytes` of the web requests it contains
- average those sums to determine the Y value of this bar
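In query DSL terms that works out to roughly the following; the index name, interval, and field names are illustrative, the scripted `prID` is inlined into the terms aggregation, and older Elasticsearch versions order by `_term` rather than `_key`:

```json
GET web-logs-*/_search
{
  "size": 0,
  "aggs": {
    "per_time": {
      "date_histogram": { "field": "@timestamp", "interval": "1h" },
      "aggs": {
        "page_requests": {
          "terms": {
            "script": {
              "lang": "painless",
              "source": "doc['username'].value + '|' + doc['uri-stem'].value + '|' + (doc['@timestamp'].value.millis / 60000L)"
            },
            "size": 100,
            "order": { "_key": "asc" }
          },
          "aggs": {
            "total_bytes": { "sum": { "field": "bytes" } }
          }
        },
        "avg_page_bytes": {
          "avg_bucket": { "buckets_path": "page_requests>total_bytes" }
        }
      }
    }
  }
}
```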
This is pretty much what you were looking for, right? One thing you'll definitely want to play with is the number of Page Request buckets that you average. This visualization is looking at 100 within each time bucket, which might be sufficient.
Picking the top 100 Page Requests is another bit you should play with. If your `prID` is a hash then sorting lexicographically will probably give you a good sample (I'm not a mathematician). Alternatively, you could change "Order By" from "Term" to a custom metric, like the sum of response times in each bucket; then the chart would be averaging the longest Page Requests (sketched below). There are a lot of options in there.
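For example, assuming a hypothetical `time_taken` field (my data doesn't have one), ordering the Page Request buckets by their total response time and averaging that instead would look roughly like this:

```json
GET web-logs-*/_search
{
  "size": 0,
  "aggs": {
    "per_time": {
      "date_histogram": { "field": "@timestamp", "interval": "1h" },
      "aggs": {
        "page_requests": {
          "terms": {
            "script": {
              "lang": "painless",
              "source": "doc['username'].value + '|' + doc['uri-stem'].value + '|' + (doc['@timestamp'].value.millis / 60000L)"
            },
            "size": 100,
            "order": { "total_time": "desc" }
          },
          "aggs": {
            "total_time": { "sum": { "field": "time_taken" } }
          }
        },
        "avg_page_time": {
          "avg_bucket": { "buckets_path": "page_requests>total_time" }
        }
      }
    }
  }
}
```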
If you want more complete numbers, there is another approach you could take. Along with indexing the web requests, index a summary document for each Page Request. Use the `prID` as the id of the summary document and write the documents to Elasticsearch via the update API, taking advantage of its upsert behavior to either insert a new summary doc or increment the total request time of an existing one (a rough sketch of that call is below). You should be able to do this with Logstash, but it's a trade-off: it requires more complex Logstash configs, more indexing capacity, and more storage.
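Here's roughly what each per-request update would look like, assuming a hypothetical `page-request-summaries` index and a hypothetical `time_taken` field (the 123 stands in for that request's response time; the exact URL and script keys vary by Elasticsearch version, e.g. older releases use `inline` instead of `source` and include a type in the path):

```json
POST page-request-summaries/_update/<prID>
{
  "script": {
    "lang": "painless",
    "source": "ctx._source.total_time += params.time; ctx._source.request_count += 1",
    "params": { "time": 123 }
  },
  "upsert": {
    "prID": "<prID>",
    "total_time": 123,
    "request_count": 1
  }
}
```

In Logstash this should map onto the elasticsearch output's update mode (`action => "update"` with a `document_id` and a script), which is where the extra config complexity comes from.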