OK, this one looks a little crazy, but I just wanted to check whether there
is something along these lines.
We were exploring aggregations as we worked on extracting some usage
patterns from our IIS logs, and one of the things we felt we needed was a
count of the unique days on which the same user visited our site over a month.
We fed the data into Elasticsearch through Logstash and started playing
with it.
Approach 1: A date histogram aggregation lets us group a person's accesses
at different times of a day into one entry, by choosing the interval as
"day". If we use sub-aggregations, with a top-level terms aggregation on IP
addresses and, under that, a date histogram with the interval set to "day",
we can get, for every IP, how many times they accessed our site each day.
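
For illustration, here is a rough sketch of the request body Approach 1
describes, built as a plain Python dict. The field names "clientip" and
"@timestamp" and the aggregation names "by_ip" and "per_day" are assumptions
for this example, not taken from our actual mapping. It uses the "interval"
parameter as mentioned above; newer Elasticsearch releases name it
"calendar_interval".

```python
import json


def per_ip_daily_hits(ip_field="clientip", time_field="@timestamp", size=10):
    """Terms aggregation on IPs with a per-day date histogram nested inside.

    Field names are hypothetical; adjust them to the real index mapping.
    """
    return {
        "size": 0,  # return aggregation results only, no individual hits
        "aggs": {
            "by_ip": {
                "terms": {"field": ip_field, "size": size},
                "aggs": {
                    "per_day": {
                        "date_histogram": {
                            "field": time_field,
                            "interval": "day",
                        }
                    }
                },
            }
        },
    }


if __name__ == "__main__":
    # Print the JSON that would be sent as the body of a _search request.
    print(json.dumps(per_ip_daily_hits(), indent=2))
```

Note that the terms aggregation only returns the top "size" IPs, so this
covers the most active visitors rather than every visitor.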
Approach 2: A cardinality aggregation gives us a way to count the distinct
IP addresses that have hit our site on each day, as shown in the example
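
A request of the kind Approach 2 describes might look roughly like the
following, assuming the cardinality aggregation is nested under a daily date
histogram. Again, the field names "clientip" and "@timestamp" and the
aggregation names "per_day" and "unique_ips" are made up for this sketch.

```python
import json


def daily_unique_ips(ip_field="clientip", time_field="@timestamp"):
    """Per-day date histogram with a distinct-IP count inside each day.

    Field names are hypothetical; adjust them to the real index mapping.
    """
    return {
        "size": 0,  # return aggregation results only, no individual hits
        "aggs": {
            "per_day": {
                "date_histogram": {"field": time_field, "interval": "day"},
                "aggs": {
                    # cardinality gives an approximate distinct count
                    "unique_ips": {"cardinality": {"field": ip_field}}
                },
            }
        },
    }


if __name__ == "__main__":
    print(json.dumps(daily_unique_ips(), indent=2))
```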
But can we combine the two, and count the distinct days on which a person
came to our site? This would require us to first aggregate on people as
usual, then group by day, where the days themselves are derived from the
timestamps. Is it possible to have a cardinality aggregation on top of a
date histogram aggregation to achieve this? Are there other ways of looking
at this - like using "month" as the interval, but counting only distinct
days and not distinct entries? Or do you suggest dumping the data generated
by Approach 1 to another index and performing a cardinality aggregation on
that (if there are no other dynamic approaches available)?
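
To make the desired result concrete, here is a small client-side sketch that
counts the distinct days per IP from a response shaped like the output of
Approach 1. It is only a fallback done outside Elasticsearch, not the
server-side combination asked about above. The aggregation names "by_ip" and
"per_day" and the sample data are invented for illustration.

```python
def distinct_days_per_ip(response, ip_agg="by_ip", day_agg="per_day"):
    """Count, for each IP bucket, the day buckets that contain any hits."""
    counts = {}
    for ip_bucket in response["aggregations"][ip_agg]["buckets"]:
        day_buckets = ip_bucket[day_agg]["buckets"]
        # Skip empty days in case the histogram returns zero-count buckets.
        counts[ip_bucket["key"]] = sum(
            1 for bucket in day_buckets if bucket["doc_count"] > 0
        )
    return counts


# Made-up response fragment in the usual aggregation bucket layout.
sample_response = {
    "aggregations": {
        "by_ip": {
            "buckets": [
                {
                    "key": "10.0.0.1",
                    "doc_count": 4,
                    "per_day": {
                        "buckets": [
                            {"key": 1401580800000, "doc_count": 3},
                            {"key": 1401667200000, "doc_count": 0},
                            {"key": 1401753600000, "doc_count": 1},
                        ]
                    },
                },
                {
                    "key": "10.0.0.2",
                    "doc_count": 5,
                    "per_day": {
                        "buckets": [{"key": 1401580800000, "doc_count": 5}]
                    },
                },
            ]
        }
    }
}

if __name__ == "__main__":
    print(distinct_days_per_ip(sample_response))
```

Since the terms aggregation returns only its top buckets, this only covers
the IPs that appear in the response.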