Querying unique pageviews in log data


(Matthew Boynes) #1

Let's say that I'm using Elasticsearch to host and query log data, and I
want to do some basic analysis of it like I would in Google Analytics. One
of the most basic distinctions I'm having trouble with is pageviews vs.
unique pageviews. I'm sure there's a very simple solution, but for some
reason it's escaping me.

Here's a sample mapping:

{
  "log-data": {
    "properties": {
      "date": {
        "type": "date",
        "format": "dateOptionalTime"
      },
      "uid": {
        "type": "string"
      },
      "url": {
        "type": "string"
      }
    }
  }
}

"date" and "url" are pretty obvious; "uid" is a unique identifier for the visitor.

If I want to get the total pageviews for the top 10 URLs for the past week,
I can query it like so:

{
  "query": {
    "range": {
      "date": {
        "gte": "now-1w"
      }
    }
  },
  "facets": {
    "pageview": {
      "terms": {
        "field": "url",
        "size": 10
      }
    }
  },
  "size": 0
}

What I'm struggling with is bringing the "uid" into the query to get unique
pageviews. Any help would be greatly appreciated!

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/215897fd-4639-4b90-8308-80318fd2a283%40googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.


(Brian Yoder) #2

Matthew,

I don't know if this is simple (though it was easy enough for me in Java),
or even if it's exactly what you had in mind. But it sounds as if you are
asking for a hierarchical combination to include the top URLs by uid. Is
that correct?

If so, perhaps this (https://groups.google.com/d/msg/elasticsearch/_oMbAnpjSGg/II4Tzf6RoSwJ) will give you some ideas.

Hope this helps! Good luck!

Brian



(Matthew Boynes) #3

Hey Brian,
Thanks for taking the time to respond. I looked at the post you suggested
and I don't think that would give me what I need. I believe that what
you're suggesting would provide me with the most views of a single url by a
unique visitor. In other words, if a url has 1004 entries, where one person
viewed it a thousand times and 4 people each viewed it once, the crazy
person who kept refreshing would be at the top of the facet response. I
need to know that the url had 5 unique visitors. If there was only one url
in the index this would work in a roundabout way, because I could look at
the total number of terms returned in the facet. Unfortunately, that's just
not the case here.
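
To make the distinction concrete, here is a small Python sketch of the 1004-entry example above. The URL and the uid values are invented for illustration; only the counts come from the description. It shows why ranking by views per uid surfaces the heavy refresher, while the number I'm after is the count of distinct uids.

```python
from collections import Counter

# Hypothetical log entries for a single URL: one visitor with 1000 views
# plus four visitors with one view each (URL and uids are made up).
entries = [("/some-page", "refresher")] * 1000
entries += [("/some-page", f"visitor-{i}") for i in range(1, 5)]

# Pageviews: every log entry counts.
pageviews = len(entries)

# Views per visitor: this is what a breakdown by uid would rank.
views_per_uid = Counter(uid for _url, uid in entries)
top_uid, top_views = views_per_uid.most_common(1)[0]

# Unique pageviews: the number of distinct uids that hit the URL.
unique_pageviews = len(views_per_uid)

print(pageviews, top_uid, top_views, unique_pageviews)
```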

Of course, if I'm misunderstanding what your post suggests, or if I've
missed something, please let me know!

Thanks,
Matt

(In reply to InquiringMind's post of Friday, January 10, 2014 5:54:46 PM UTC-5)


(Ivan Brusic) #4

Sounds like what you are looking for is field collapsing, which is not yet
supported in Elasticsearch. The ETA is after the 1.0 release. Perhaps there
is a way with the new aggregations framework, but I have yet to try it out.
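
One possible shape for this with aggregations, as an untested sketch rather than a confirmed answer: a terms aggregation on "url" with a cardinality sub-aggregation on "uid". This assumes an Elasticsearch version that includes the cardinality aggregation (it was added in 1.1, so it is not in 1.0), and it assumes "url" and "uid" are indexed so that each value stays a single term (for example as not_analyzed strings). The aggregation names "top_urls" and "unique_visitors" and the helper function are made up. The Python below only builds and prints the request body; it does not talk to a cluster.

```python
import json


def unique_pageviews_query(since="now-1w", top_n=10):
    """Build a search body: top URLs by pageviews, each with a distinct-uid count."""
    return {
        "query": {"range": {"date": {"gte": since}}},
        "aggs": {
            # Hypothetical name; buckets are ordered by doc_count (pageviews).
            "top_urls": {
                "terms": {"field": "url", "size": top_n},
                "aggs": {
                    # Hypothetical name; approximate count of distinct uids.
                    "unique_visitors": {"cardinality": {"field": "uid"}}
                },
            }
        },
        "size": 0,
    }


body = unique_pageviews_query()
print(json.dumps(body, indent=2))
```

In the response, each URL bucket would carry its doc_count (pageviews) alongside the sub-aggregation's value (unique pageviews). The cardinality count is approximate by design, so it may drift from the exact number on large data sets.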

Cheers,

Ivan

(In reply to Matthew Boynes's post of Fri, Jan 10, 2014 at 3:13 PM)


(Matthew Boynes) #5

Hi Ivan, thanks for taking the time to respond. After reading up on this, I
believe you're correct: field collapsing would give me exactly what I want.
I also started reading about aggregations, and perhaps that will work as
well -- it seems like I could create a bucket for each uid and then count
the number of buckets. Since the docs are pretty scarce on aggregations
thus far, it's hard to say. When I have some free time, I'll check out the
1.0 beta and see if I can come up with something. Thanks again!
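
The bucket-counting idea might look like this on the client side, as a rough sketch. The response below is hand-written in the general shape that a terms aggregation nested inside another terms aggregation returns (buckets with "key" and "doc_count"); it is not real output, and the aggregation names "by_url" and "by_uid" and all values are invented. One caveat: a terms aggregation returns at most its "size" buckets, so the count is capped unless "size" is at least the number of distinct uids.

```python
# Hand-written stand-in for a search response (not real cluster output).
response = {
    "aggregations": {
        "by_url": {
            "buckets": [
                {
                    "key": "/some-page",
                    "doc_count": 1004,
                    "by_uid": {
                        "buckets": [
                            {"key": "refresher", "doc_count": 1000},
                            {"key": "visitor-1", "doc_count": 1},
                            {"key": "visitor-2", "doc_count": 1},
                            {"key": "visitor-3", "doc_count": 1},
                            {"key": "visitor-4", "doc_count": 1},
                        ]
                    },
                }
            ]
        }
    }
}


def count_unique_pageviews(resp):
    """Map each URL to (pageviews, number of uid buckets)."""
    return {
        bucket["key"]: (bucket["doc_count"], len(bucket["by_uid"]["buckets"]))
        for bucket in resp["aggregations"]["by_url"]["buckets"]
    }


counts = count_unique_pageviews(response)
print(counts)
```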

Matt

(In reply to Ivan Brusic's post of Saturday, January 11, 2014 3:28:53 PM UTC-5)

