Count and group by the results of top_hits aggregation


(Tomás Law) #1

Hello,

I have logs with the fields:
. IP
. Timestamp
. URI
. SessionId

Each log records the time (Timestamp) at which the URI was visited. SessionId is a field I created by concatenating the IP with the Timestamp truncated to the hour (minutes and seconds dropped). That is, all logs from the same IP, day and hour share the same SessionId.

For example:

IP       Timestamp                  URI           SessionId
1.2.3.4  December25th2017T08:30:00  google.com    1.2.3.4_25-12-2017_8
1.2.3.4  December25th2017T08:45:00  youtube.com   1.2.3.4_25-12-2017_8
1.2.3.4  December25th2017T09:30:00  facebook.com  1.2.3.4_25-12-2017_9
1.2.3.4  December25th2017T09:58:00  youtube.com   1.2.3.4_25-12-2017_9
4.3.2.1  December30th2017T15:20:00  yahoo.com     4.3.2.1_30-12-2017_15
4.3.2.1  December30th2017T15:30:00  elastic.com   4.3.2.1_30-12-2017_15
4.3.2.1  December30th2017T15:58:00  youtube.com   4.3.2.1_30-12-2017_15
5.5.5.5  December30th2017T09:35:00  bing.com      5.5.5.5_30-12-2017_9
5.5.5.5  December30th2017T09:59:00  google.com    5.5.5.5_30-12-2017_9

Now, I need to find which URIs were most often the last one visited in a session.
That is, for the example above:
-> "youtube.com" was the last URI visited in the following sessions:
. 1.2.3.4_25-12-2017_8
. 1.2.3.4_25-12-2017_9
. 4.3.2.1_30-12-2017_15

-> "google.com" was the last URI visited in the following session:
. 5.5.5.5_30-12-2017_9

Therefore, I need a visualization/query that indicates something like:
URI          Count
youtube.com  3
google.com   1
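
To make the expected result concrete, here is a minimal sketch in plain Python (no Elasticsearch involved) that computes these counts from the sample rows above. The timestamps are rewritten in ISO 8601 form so they can be parsed, and the names `rows` and `last_uri_counts` are made up for this illustration.

```python
from collections import Counter
from datetime import datetime

# Sample rows from the example above: (SessionId, timestamp, URI).
# Timestamps are rewritten as ISO 8601 strings (an assumption for parsing).
rows = [
    ("1.2.3.4_25-12-2017_8", "2017-12-25T08:30:00", "google.com"),
    ("1.2.3.4_25-12-2017_8", "2017-12-25T08:45:00", "youtube.com"),
    ("1.2.3.4_25-12-2017_9", "2017-12-25T09:30:00", "facebook.com"),
    ("1.2.3.4_25-12-2017_9", "2017-12-25T09:58:00", "youtube.com"),
    ("4.3.2.1_30-12-2017_15", "2017-12-30T15:20:00", "yahoo.com"),
    ("4.3.2.1_30-12-2017_15", "2017-12-30T15:30:00", "elastic.com"),
    ("4.3.2.1_30-12-2017_15", "2017-12-30T15:58:00", "youtube.com"),
    ("5.5.5.5_30-12-2017_9", "2017-12-30T09:35:00", "bing.com"),
    ("5.5.5.5_30-12-2017_9", "2017-12-30T09:59:00", "google.com"),
]


def last_uri_counts(rows):
    """Count how often each URI is the last one visited in its session."""
    last = {}  # SessionId -> (timestamp, URI) of the latest row seen so far
    for session_id, ts, uri in rows:
        when = datetime.fromisoformat(ts)
        if session_id not in last or when > last[session_id][0]:
            last[session_id] = (when, uri)
    # Step 2: group the per-session "last URI" values and count them.
    return Counter(uri for _, uri in last.values())


counts = last_uri_counts(rows)
for uri, n in counts.most_common():
    print(uri, n)
```

The two steps (latest row per session, then a count per URI) are exactly what I would like Elasticsearch to do for me.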

So far, I've managed to get the last visited URL for each session by doing a terms aggregation on session_id with a top_hits sub-aggregation sorted by timestamp.
The problem is that I can't then group those results by URL (to compute the count for each URL), because the top_hits aggregation doesn't accept any sub-aggregations.
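
For reference, the approach described above could be written roughly as the request body below, shown as a Python dict. This is only a sketch: the field names (`session_id`, `timestamp`, `uri`), the aggregation names (`sessions`, `last_visit`) and the terms `size` of 10000 are assumptions, not taken from a real mapping. The helper `count_last_uris` is a hypothetical client-side fallback that tallies the URIs from the response, which is the step that cannot be expressed under top_hits.

```python
from collections import Counter

# Request body: one bucket per session, each holding its most recent hit.
# All field and aggregation names here are assumptions for illustration.
query_body = {
    "size": 0,  # only the aggregation results are needed
    "aggs": {
        "sessions": {
            "terms": {"field": "session_id", "size": 10000},
            "aggs": {
                "last_visit": {
                    "top_hits": {
                        "size": 1,
                        "sort": [{"timestamp": {"order": "desc"}}],
                        "_source": {"includes": ["uri"]},
                    }
                }
            },
        }
    },
}


def count_last_uris(response):
    """Tally the last-visited URI of every session bucket in a response."""
    counts = Counter()
    for bucket in response["aggregations"]["sessions"]["buckets"]:
        hits = bucket["last_visit"]["hits"]["hits"]
        if hits:  # skip a bucket that returned no hit
            counts[hits[0]["_source"]["uri"]] += 1
    return counts
```

Counting on the client like this only sees the sessions returned by the terms aggregation, so it would miss sessions beyond the chosen `size`.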

How can I solve this?

Thank you

