Determining unique number of entries in a termsFacet, passing that to date-histogram-facet


(Peter Dietz) #1

I'm trying to build a dashboard that shows the number of unique visitors
(unique ipAddress values) that we've gotten per day.

I'm using a date-histogram-facet to handle the dates/intervals. The
problem is that I can't just bucket the range by interval and take the total
count, because every hit has an IP address, so that gives me the number of
hits rather than the number of distinct IPs. I want the value_script (more or
less) to be able to determine the number of unique IP addresses within each
interval. Then all that data would be in the date-histogram-facet, and I'd be
happy.

If it were possible to put a facet inside of a facet, that would get me
there, though it would be somewhat messy. It would also solve this guy's
need: https://groups.google.com/forum/?fromgroups#!topic/elasticsearch/tSgZNU6b87o
Even with a termsFacet of all ipAddresses inside each of my date
intervals, I would get every IP address and its count for the
interval, when all I want is the termsFacetEntry.size(), which equals the
number of unique IP addresses.

{
    "time": 201200101,
    "terms": [
        { "term": "192.168.1.1", "count": 10 },
        { "term": "8.8.4.4", "count": 8 },
        { "term": "128.146.173.9", "count": 5 }
    ]
}

Which is why I'm thinking that if value_script could be more clever, and I
could tell it to use something like doc['IP'].uniques, that would solve my
problem. Otherwise, invent a new field that can take a
uniqueTerms().field("ip"). I would leave the actual workings of this to
people more versed in the conventions.

{
    "time": 201200101, "count": 23, "uniqueTerms['IP']": 3
}
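
For illustration, the output described above can be computed on the client side once the raw hits have been fetched. This is only a minimal sketch, not an Elasticsearch feature: the hit shape (a "time" field in epoch seconds and an "ip" field), the function name unique_ips_per_day, and the sample data are all assumptions made up for the example.

```python
from collections import defaultdict
from datetime import datetime, timezone


def unique_ips_per_day(hits):
    """Group hits by UTC day and report total hits and distinct IPs per day.

    Assumes each hit is a dict with "time" (epoch seconds) and "ip" keys;
    these field names are hypothetical.
    """
    counts = defaultdict(int)   # day -> number of hits
    ips = defaultdict(set)      # day -> set of distinct IP addresses
    for hit in hits:
        day = datetime.fromtimestamp(hit["time"], tz=timezone.utc).strftime("%Y-%m-%d")
        counts[day] += 1
        ips[day].add(hit["ip"])
    return [
        {"time": day, "count": counts[day], "unique_ips": len(ips[day])}
        for day in sorted(counts)
    ]


# Made-up sample mirroring the example: 10 + 8 + 5 hits from three addresses.
BASE = 1331856000  # 2012-03-16T00:00:00Z in epoch seconds
sample_hits = (
    [{"time": BASE + i, "ip": "192.168.1.1"} for i in range(10)]
    + [{"time": BASE + 100 + i, "ip": "8.8.4.4"} for i in range(8)]
    + [{"time": BASE + 200 + i, "ip": "128.146.173.9"} for i in range(5)]
)
result = unique_ips_per_day(sample_hits)
print(result)
```

This keeps one set of IPs per interval in memory, so it only suits data volumes where pulling the hits to the client is acceptable.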

I'm pretty happy with Elasticsearch, but there are a few of these edge
cases where we're having trouble swapping out one implementation for
another.


(Shay Banon) #2

Hi,

Yeah, you can't do it with elasticsearch. I would certainly love to have
it there (at some point), but it's very, very tricky to implement in a
distributed environment (unique counts)...
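
One way to see the difficulty, as a rough sketch rather than a description of how elasticsearch works internally: the same IP can show up on several shards, so per-shard unique counts cannot simply be added together. An exact answer needs the full sets of values merged in one place. The shard contents and function names below are made up for the example.

```python
# Hypothetical data: the distinct IPs each of three shards saw in one interval.
shard_ips = [
    {"192.168.1.1", "8.8.4.4"},
    {"8.8.4.4", "128.146.173.9"},
    {"192.168.1.1", "128.146.173.9"},
]


def naive_unique_count(shards):
    """Sum the per-shard unique counts; overcounts IPs seen on several shards."""
    return sum(len(ips) for ips in shards)


def exact_unique_count(shards):
    """Merge the full per-shard sets, then count; exact but ships every value."""
    merged = set()
    for ips in shards:
        merged |= ips
    return len(merged)


naive = naive_unique_count(shard_ips)
exact = exact_unique_count(shard_ips)
print(naive, exact)
```

The exact version has to move every distinct value from every shard to a single node for every interval, which is the part that gets expensive as the number of distinct values grows.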

On Fri, Mar 16, 2012 at 5:43 PM, Peter Dietz pdietz84@gmail.com wrote:

