Precision Threshold of Count in Canvas

Hi,

Is there a way to set the precision threshold when performing a unique/distinct count using Canvas?

I need the count to be as accurate as possible. All hints and tips welcome.

Hey there Amy,

This all depends on how you're doing your counting in the query or expression. Do you have an example query?

Hi poff,

Sure, I have been using the below:

However, it doesn't give me the expected value. I have also tried something like:


but this is even worse, with a value much lower than expected.

I do get the correct value in a Kibana visualization by using 'Unique Count' with the JSON input of {"precision_threshold" : 40000}

I have been reading that this may have something to do with the cardinality aggregation... do you think this is the reason here? If so, how do I make the output more accurate?

I just confirmed with the ES SQL team that aggregations like this are approximate, and unfortunately there isn't a way to tell the SQL endpoint to change the precision threshold. You might consider opening a ticket in the Elasticsearch GitHub repo with this feature request for the SQL team.
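
For reference, the regular Elasticsearch search API (outside of SQL) does accept precision_threshold on the cardinality aggregation, which is what the Kibana 'Unique Count' JSON input sets. Below is a minimal Python sketch of such a request body. The field name is taken from this thread; the helper name and the aggregation name "unique_count" are made up for illustration, and the body would still need to be sent to your own index's _search endpoint. Note that the count stays approximate above the threshold, and 40000 is the maximum value the setting supports.

```python
import json

# Maximum precision_threshold supported by the cardinality aggregation;
# larger values have the same effect as 40000.
MAX_PRECISION_THRESHOLD = 40000


def build_unique_count_query(field, precision_threshold=MAX_PRECISION_THRESHOLD):
    """Build a _search request body with a cardinality aggregation.

    Hypothetical helper for illustration; only the JSON structure it
    returns is part of the Elasticsearch API.
    """
    threshold = min(precision_threshold, MAX_PRECISION_THRESHOLD)
    return {
        "size": 0,  # no hits needed, only the aggregation result
        "aggs": {
            "unique_count": {  # arbitrary aggregation name (assumption)
                "cardinality": {
                    "field": field,
                    "precision_threshold": threshold,
                }
            }
        },
    }


body = build_unique_count_query("uniqueIdentifier")
print(json.dumps(body, indent=2))
```

The distinct count would come back under aggregations.unique_count.value in the search response.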

As for the second example that uses the math function to count unique values, I'm a little surprised this isn't working. Are there any filters being passed in that would hide some expected data?

If all else fails and you're able to create a visualization outside of Canvas that pulls the data you want, you can add that visualization directly to a workpad.

Hi poff,

Thanks, I think I will have to open the ticket as this is very important to us.

As you can see below, the correct value is in the Kibana visualization on the left. The top right is the output of count(distinct uniqueIdentifier), which overestimates by 61. The bottom right is unique(uniqueIdentifier), which vastly underestimates by 3991. No additional filters have been passed :(
image

I don't really want to add the visualization to the workpad unless there is a way to sync the time range of the Visualizations with the Canvas Time Filter. Is this possible?

Thanks,
Amy

Quoting Amy: "I don't really want to add the visualization to the workpad unless there is a way to sync the time range of the Visualizations with the Canvas Time Filter. Is this possible?"

Yes! You can do this two ways:

  1. Use a time filter on the workpad and pass the filters function into the expression for the element. One note: you'll notice that a default time range is specified for the visualization. If you want the workpad's time filter to control the data you're looking at, change the timerange on the visualization element to a very large time range, so that the range passed in by the time filter element is the one that takes effect.
  2. Use Canvas workpad variables to define the start and end time of the range, and use those variables in each expression by passing them to the timerange argument.

Also! I realized that the number is so low (990) in that one element because the essql function limits the number of rows it retrieves by default. You can change this by adding a count argument to the essql expression function call. Note: this might not perform well, since all the counting will happen in memory. Additionally, essql is capped at 10k records per request, so if you have more than that, you might not get all your data.
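
To illustrate why a row limit leads to undercounting, here is a small Python sketch with made-up data (it does not call Elasticsearch or Canvas). It assumes a default limit of 1000 rows and a hard cap of 10000; the ids and the helper name are invented for illustration. Counting unique values in memory can only see the rows that were actually returned, so the result is bounded by the limit.

```python
def distinct_in_first(rows, limit):
    """Count distinct values among only the first `limit` rows,
    mimicking an in-memory unique count over a truncated result set."""
    return len(set(rows[:limit]))


# Made-up data: 12000 rows containing 4000 distinct identifiers.
rows = [f"id-{i % 4000}" for i in range(12000)]

true_distinct = len(set(rows))                   # counts every row
with_default = distinct_in_first(rows, 1000)     # assumed default row limit
with_larger = distinct_in_first(rows, 10000)     # assumed maximum row cap

print(true_distinct, with_default, with_larger)
```

With the small limit the count can never exceed the number of rows fetched, which matches the kind of undercount seen in the element above. Raising the limit only helps while the full result set fits under the cap.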
