How to calculate the average count per unique term

Maybe I need more coffee, but I can't work this out...

Say I have 10000 documents in a time period. One of the fields in each document is 'type', and there are, say, 50 different values for 'type' in this document set.

I want to display a single metric visualization that is the average count across all the types, i.e. 200 in this example.

I can't see how to do this except with a data table with rows split by 'type' and the average displayed at the bottom, but that only works for the first N unique values of 'type', and I don't want to see all the unique values of 'type' anyway.

Bonus points for an example that uses Visual Builder.

Thanks,
Andrew

Hi Andrew,

Here's a :coffee: for you. :slightly_smiling_face:

For a Metric in Visual Builder, you can use the Cardinality aggregation and select the appropriate field.

Compare this to a standard Metric visualization where you use the Unique Count aggregation.

They both send the same query to Elasticsearch.
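
If you want to double-check, the request both visualizations end up sending is roughly this (field name taken from your example; you may need type.keyword instead, depending on your mapping):

{
  "size": 0,
  "aggs": {
    "type_count": {
      "cardinality": { "field": "type" }
    }
  }
}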

Hi Nick,

The Cardinality aggregation gets me the 50 from my original example, i.e. the number of different types; it does not get me the average per type, i.e. the 200.

However, I did spot the Math aggregation and was able to set two variables, one for the overall document count and one for the number of types (using the Cardinality aggregation), and was then able to divide the first by the second.

As I wanted an hourly average, I also scaled by the time interval so that the calculation stays valid when the time range changes, since that causes the interval to change.

So my Math expression was:
(params.numDocs/params.numTypes) * 60 * 60 * 1000 / params._interval
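
As a sanity check with made-up numbers: if Kibana picks a 30-minute interval (1,800,000 ms) and a bucket holds 100 documents across the 50 types, that works out to

(100 / 50) * 60 * 60 * 1000 / 1800000 = 4 documents per type per hour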

Does this all look reasonable or is there a simpler approach?

regards,
Andrew

Hi Andrew,

I'm glad you figured that out. And yes, your Math expression looks right to me, since params._interval is in milliseconds.

Hi Nick,

While it was working yesterday, today the Math expression generates a "Cannot divide by 0" error. I haven't changed the visualisation, so the only difference is that it is referencing a different set of data (my time filter is 'Last 24 hours'). In fact, if I change the filter to be between 2 days ago and 1 day ago, the expression works fine.

I simplified the expression to just params.numDocs/params.numDevices and still get the error.
If I change the expression to contain either of those terms on their own, I get correct, non-zero values, e.g.:
params.numDocs=8740
params.numDevices=1471
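
Dividing those by hand gives 8740/1471 ≈ 5.9, so the division itself should clearly be fine with these overall values.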

So it looks like the divide by zero error is a bug in the Math aggregation pipeline.

I am using Amazon Elasticsearch V7.1 with R20190808 (service software).

Any thoughts on how to diagnose this further or workaround it?

regards,
Andrew

Ah, I think I know what is happening: the 'Last 24 hours' filter includes a period of about 20 minutes (in the last hour, actually) where numDocs is zero, and as numDevices is determined by the cardinality of one term in the documents, it must also be zero for those 20 minutes. See the image I pasted below (the document count over time).

But that implies the Math expression is calculated for every individual time bucket and then aggregated over the whole time range, rather than the two terms being calculated over the whole time range and the Math expression applied once.
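
To spell that out with the (hypothetical) values from one of those empty buckets:

params.numDocs = 0
params.numDevices = 0
params.numDocs/params.numDevices  ->  "Cannot divide by 0"

and presumably a single failing bucket is enough to break the whole panel.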

I can't avoid buckets with zero documents, so how can I write this panel and its Math expression to avoid the divide-by-zero error?

Thanks,
Andrew

Hmm, I wonder if you could use the clamp function to ensure your denominator is never zero? Maybe clamp the minimum to 1?

params.numDocs/clamp(params.numDevices,1)

Clamp is no use as it requires a min and a max. The following worked, though:

params.numDocs/max(params.numDevices,1)
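
Applying the same guard to my original hourly expression should presumably give:

(params.numDocs/max(params.numDevices,1)) * 60 * 60 * 1000 / params._interval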

However, this still feels like an Elasticsearch bug; I shouldn't have to explicitly handle the situation where a bucket has zero documents.

Andrew

Yes, I agree. It feels like a bug. Would you mind submitting a bug report on the Kibana repo? https://github.com/elastic/kibana/issues/new/choose
