Let's say I'm using Elasticsearch to host and query log data, and I want
to do some basic analysis of it, like I would in Google Analytics. One of
the most basic distinctions I'm having trouble with is pageviews vs.
unique pageviews. I'm sure there's a very simple solution, but for some
reason it's escaping me.
I don't know if this is simple (though it was easy enough for me in Java),
or even if it's exactly what you had in mind. But it sounds as if you are
asking for a hierarchical combination to include the top URLs by uid. Is
that correct?
Hey Brian,
Thanks for taking the time to respond. I looked at the post you suggested
and I don't think that would give me what I need. I believe that what
you're suggesting would give me the most views of a single URL by a single
visitor. In other words, if a URL has 1004 entries, where one person
viewed it a thousand times and 4 people each viewed it once, the crazy
person who kept refreshing would be at the top of the facet response. I
need to know that the URL had 5 unique visitors. If there were only one URL
in the index this would work in a roundabout way, because I could look at
the total number of terms returned in the facet. Unfortunately, that's just
not the case here.
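To make the numbers concrete, here is a minimal sketch in plain Python (not Elasticsearch) of the two counts I'm after. The function name `count_views`, the `(url, uid)` tuple layout, and the sample entries are all made up for illustration; only the 1004 / 5 figures come from the example above.

```python
from collections import defaultdict

def count_views(entries):
    """Return {url: (pageviews, unique_pageviews)} for (url, uid) log entries."""
    pageviews = defaultdict(int)   # every entry counts
    visitors = defaultdict(set)    # each uid counts once per url
    for url, uid in entries:
        pageviews[url] += 1
        visitors[url].add(uid)
    return {url: (pageviews[url], len(visitors[url])) for url in pageviews}

# Hypothetical log: one person refreshes /page 1000 times, 4 others view it once.
entries = [("/page", "refresher")] * 1000
entries += [("/page", f"visitor{i}") for i in range(4)]
# A second url, so "count all the terms in the index" no longer works.
entries.append(("/other", "refresher"))

print(count_views(entries))
# → {'/page': (1004, 5), '/other': (1, 1)}
```

The per-url set of uids is the part I can't get out of a single facet.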
Of course, if I'm misunderstanding what your post suggests, or if I've
missed something, please let me know!
Thanks,
Matt
On Friday, January 10, 2014 5:54:46 PM UTC-5, InquiringMind wrote:
Matthew,
I don't know if this is simple (though it was easy enough for me in Java),
or even if it's exactly what you had in mind. But it sounds as if you are
asking for a hierarchical combination to include the top URLs by uid. Is
that correct?
Sounds like what you are looking for is field collapsing, which is not yet
supported in Elasticsearch; the ETA is after the 1.0 release. Perhaps there
is a way with the new aggregations framework, but I have yet to try it out.
Hi Ivan, thanks for taking the time to respond. After reading up on this, I
believe you're correct: field collapsing would give me exactly what I want.
I also started reading about aggregations, and perhaps that will work as
well -- it seems like I could create a bucket for each uid and then count
the number of buckets. Since the docs are pretty scarce on aggregations
thus far, it's hard to say. When I have some free time, I'll check out the
1.0 beta and see if I can come up with something. Thanks again!
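For reference, here is a rough, untested sketch of the "bucket per uid, then count the buckets" idea, written as a Python dict that would be sent as the search body. The field names `url` and `uid`, the aggregation names `by_url` and `by_uid`, the size limits, and the sample response are all assumptions for illustration; nothing here has been run against a real cluster.

```python
import json

def build_request(url_field="url", uid_field="uid", max_urls=10, max_uids=1000):
    """Search body: one bucket per url, and inside each, one bucket per uid."""
    return {
        "size": 0,  # only the aggregation results are needed, not the hits
        "aggs": {
            "by_url": {
                "terms": {"field": url_field, "size": max_urls},
                "aggs": {
                    "by_uid": {"terms": {"field": uid_field, "size": max_uids}}
                },
            }
        },
    }

def summarize(response):
    """Return {url: (pageviews, unique_pageviews)} from an aggregation response."""
    result = {}
    for url_bucket in response["aggregations"]["by_url"]["buckets"]:
        uid_buckets = url_bucket["by_uid"]["buckets"]
        # doc_count is the pageview count; the number of uid buckets is the
        # unique count (only exact if the url has at most max_uids visitors).
        result[url_bucket["key"]] = (url_bucket["doc_count"], len(uid_buckets))
    return result

# Hand-written response in the shape the terms aggregation returns
# (hypothetical data, not real output).
sample_response = {
    "aggregations": {
        "by_url": {
            "buckets": [
                {
                    "key": "/page",
                    "doc_count": 1004,
                    "by_uid": {
                        "buckets": [
                            {"key": "refresher", "doc_count": 1000},
                            {"key": "visitor0", "doc_count": 1},
                            {"key": "visitor1", "doc_count": 1},
                            {"key": "visitor2", "doc_count": 1},
                            {"key": "visitor3", "doc_count": 1},
                        ]
                    },
                }
            ]
        }
    }
}

print(json.dumps(build_request(), indent=2))
print(summarize(sample_response))
```

One caveat with this approach: the terms aggregation only returns the top buckets up to its size, so counting buckets undercounts any url with more visitors than `max_uids`. As far as I know, later Elasticsearch releases added a `cardinality` aggregation that returns an approximate distinct count directly, which would avoid pulling back every uid bucket.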
Matt
On Saturday, January 11, 2014 3:28:53 PM UTC-5, Ivan Brusic wrote:
Sounds like what you are looking for is field collapsing which is not yet
supported in elasticsearch. ETA is post 1.0 release. Perhaps there is a way
with the new aggregations framework, but I have yet to try it out.