I have an ES index of documents collected from products that phone home. Each product has a unique serial number, and each document records the timestamp of the report along with a bunch of other data that I'd like to report on:
The problem is that a product will sometimes phone home more than once a week, for example if it was rebooted.
My aim is to construct an ES query that returns a DateHistogramFacet of matches per month or per quarter, in which each serial number is counted at most once per week. If a particular product reports multiple times in a week, it should contribute only once to the total count. Essentially, I want to remove duplicate entries, where a duplicate is defined as the same serial number having two JSON documents within the same 7-day period.
Is there an easy way to accomplish this within ES?
Have you tried the distinct_date_histogram facet from the elasticsearch-timefacets-plugin?
(In reply to the message from ajses, 17/07/2013 15:05.)
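For reference, the counting rule from the question can be sketched client-side in Python. This is only an illustration of the intended semantics, not an ES query or the plugin's behaviour. It assumes that "the same 7 day period" means the same ISO calendar week (a rolling 7-day window would need different logic), and that each (serial number, week) pair is credited to the month of its earliest report. The function name and the sample serial numbers are made up for the example.

```python
from collections import Counter
from datetime import datetime


def weekly_deduped_counts(reports):
    """Count reports per (year, month), at most one per serial number per ISO week.

    `reports` is an iterable of (serial, timestamp) pairs. Both the input shape
    and the ISO-week interpretation are assumptions made for this sketch.
    """
    first_seen = {}
    for serial, ts in reports:
        iso = ts.isocalendar()
        key = (serial, iso[0], iso[1])  # (serial, ISO year, ISO week)
        # Keep only the earliest report for this serial within this week.
        if key not in first_seen or ts < first_seen[key]:
            first_seen[key] = ts

    counts = Counter()
    for ts in first_seen.values():
        # Credit the surviving report to the month in which it occurred.
        counts[(ts.year, ts.month)] += 1
    return counts


# Hypothetical sample data.
reports = [
    ("SN-1", datetime(2013, 7, 1, 8, 0)),
    ("SN-1", datetime(2013, 7, 3, 9, 30)),  # same week as above: ignored
    ("SN-1", datetime(2013, 7, 9, 8, 0)),   # following week: counted
    ("SN-2", datetime(2013, 7, 2, 12, 0)),
    ("SN-2", datetime(2013, 8, 6, 12, 0)),
]
monthly = weekly_deduped_counts(reports)
```

With this sample, July ends up with three counted reports and August with one, since the second SN-1 report falls in the same week as the first.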