Maybe I need more coffee, but I can't work this out...
I have, say, 10,000 documents in a time period. One of the fields in each document is 'type', and there are, say, 50 different values of 'type' across this document set.
I want to display a single metric visualization showing the average count across all the types, i.e. 10,000 / 50 = 200 in this example.
The only way I can see to do this is a data table with rows split by 'type' and the average displayed at the bottom, but that only works for the first N unique values of 'type', and I don't want to see all the unique values of 'type' anyway.
Bonus points for an example that uses Visual Builder.
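For reference, in raw Elasticsearch terms what I want is something like the following: wrap everything in a single-bucket filters aggregation, then divide the bucket's doc count by a cardinality sub-aggregation using a bucket_script pipeline. This is only a sketch; the index name `my-index` and the field mapping `type.keyword` are assumptions.

```json
POST /my-index/_search
{
  "size": 0,
  "aggs": {
    "all_docs": {
      "filters": { "filters": { "all": { "match_all": {} } } },
      "aggs": {
        "num_types": { "cardinality": { "field": "type.keyword" } },
        "avg_per_type": {
          "bucket_script": {
            "buckets_path": { "docs": "_count", "types": "num_types" },
            "script": "params.docs / params.types"
          }
        }
      }
    }
  }
}
```

The single `all` bucket exists only because bucket_script has to live under a multi-bucket aggregation; `avg_per_type` then comes back as 10,000 / 50 = 200 in my example.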
The Cardinality aggregation gets me the 50 from my original example, i.e. the number of different types; it does not get me the average per type, i.e. the 200.
However, I did spot the Math aggregation and was able to set two variables, one for the overall document count and one for the number of types (using the Cardinality aggregation), and was then able to divide the first by the second.
As I also wanted an hourly average, I scaled by the time interval so that the calculation remains valid when the time range changes, since that also changes the interval.
So my Math expression was:
(params.numDocs/params.numTypes) * 60 * 60 * 1000 / params._Interval
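To sanity-check the units, here is the same arithmetic in a few lines of Python (the values are illustrative; the interval the expression divides by is in milliseconds):

```python
MS_PER_HOUR = 60 * 60 * 1000  # the bucket interval is in milliseconds

def hourly_avg_per_type(num_docs, num_types, interval_ms):
    """Docs per type in one bucket, scaled to a per-hour rate."""
    return (num_docs / num_types) * MS_PER_HOUR / interval_ms

# A 30-minute bucket holding 10000 docs across 50 types:
print(hourly_avg_per_type(10000, 50, 30 * 60 * 1000))  # 400.0
```

The `MS_PER_HOUR / interval_ms` factor is what keeps the number comparable when changing the time range causes the interval to change.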
Does this all look reasonable or is there a simpler approach?
While it was working yesterday, today the Math expression generates a "Cannot divide by 0" error. I haven't changed the visualisation, so the only difference is that it is referencing a different set of data (my time filter is 'Last 24 hours'). In fact, if I change the filter to be between 2 days ago and 1 day ago, the expression works fine.
I simplified the expression to just params.numDocs/params.numDevices and still get the error.
If I change the expression to contain either of those terms on their own, I get correct, non-zero values, e.g.:
params.numDocs=8740
params.numDevices=1471
So it looks like the divide by zero error is a bug in the Math aggregation pipeline.
I am using the Amazon Elasticsearch V7.1 plus R20190808 (Service Software).
Any thoughts on how to diagnose this further or workaround it?
Ah, I think I know what is happening: the 'Last 24 hours' filter includes a period of about 20 minutes (in the last hour, actually) where numDocs is zero, and as numDevices is determined by the cardinality of one term in the documents, it must also be zero for those 20 minutes. (See the image I pasted below, which shows the document count over time.)
But that implies the Math expression is evaluated for every individual time bucket and then aggregated over the whole timescale, rather than the two terms being calculated over the whole timescale first and the Math expression applied once.
I can't avoid buckets with zero documents, so how can I write this panel and Math expression better to avoid the divide by zero error?
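The only fix I can think of so far is to guard the denominator myself. The per-bucket logic I want looks like this sketch (returning 0 for empty buckets is my assumption about the right display value):

```python
MS_PER_HOUR = 60 * 60 * 1000  # the bucket interval is in milliseconds

def safe_hourly_avg(num_docs, num_devices, interval_ms):
    """Per-bucket docs per device, scaled to a per-hour rate.

    Empty buckets (no docs, so a cardinality of 0) yield 0.0
    instead of raising a divide-by-zero error.
    """
    if num_devices == 0:
        return 0.0
    return (num_docs / num_devices) * MS_PER_HOUR / interval_ms

print(safe_hourly_avg(0, 0, 10 * 60 * 1000))  # 0.0 for the empty bucket
print(safe_hourly_avg(8740, 1471, MS_PER_HOUR))
```

In the Math expression itself, something like `max(params.numDevices, 1)` in the denominator might achieve the same guard if `max()` is available in this Kibana version, but I haven't verified that.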