Hi,
Is there a way to set the precision threshold when performing a unique/distinct count using Canvas?
I need the count to be as accurate as possible. All hints and tips welcome.
Hey there Amy,
This all depends on how you're doing your counting in the query or expression. Do you have an example query?
Hi poff,
Sure, I have been using the below:
However, it doesn't give me the expected value. I have also tried something like:
I do get the correct value in a Kibana visualization by using 'Unique Count' with the JSON input of {"precision_threshold" : 40000}
I have been reading that this may have something to do with the cardinality aggregation. Do you think that is the reason here? If so, how do I make the output more accurate?
I just confirmed with the ES SQL team that aggregations like this are imprecise, and unfortunately there isn't a way to tell the SQL endpoint to change the precision threshold. You might consider opening a ticket in the Elasticsearch GitHub repo with this feature request for the SQL team.
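For reference, here is a rough sketch of what the equivalent request looks like outside of the SQL endpoint, using the cardinality aggregation in the regular search API, which does accept precision_threshold. It is written in Python only to build and print the request body. The helper name cardinality_request and the aggregation name unique_count are made up for illustration, and the field name is taken from later in this thread.

```python
import json


def cardinality_request(field, precision_threshold=40000):
    """Build a search body that asks for an approximate distinct count.

    `cardinality_request` and the aggregation name `unique_count` are
    hypothetical names used only for this sketch.
    """
    return {
        "size": 0,  # no documents needed, only the aggregation result
        "aggs": {
            "unique_count": {
                "cardinality": {
                    "field": field,
                    # Same setting as the JSON input used in the Kibana visualization
                    "precision_threshold": precision_threshold,
                }
            }
        },
    }


body = cardinality_request("uniqueIdentifier")
print(json.dumps(body, indent=2))
```

Even with the threshold raised, the cardinality aggregation is still an approximation, so treat this as the closest available option rather than a guaranteed exact count.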
As for the second example that uses the math function to count unique values, I'm a little surprised this isn't working. Are there any filters being passed in that would hide some expected data?
If all else fails and you were able to create a visualization outside of Canvas that pulls the data you want, you can add that visualization directly to a workpad.
Hi poff,
Thanks, I think I will have to open the ticket as this is very important to us.
As you can see below, the correct value is in the Kibana Visualization on the left. The top right is the output of count(distinct uniqueIdentifier), which overestimates by 61. The bottom right is unique(uniqueIdentifier), which vastly underestimates, by 3991. No additional filters have been passed.
I don't really want to add the visualization to the workpad unless there is a way to sync the time range of the Visualizations with the Canvas Time Filter. Is this possible?
Thanks,
Amy
I don't really want to add the visualization to the workpad unless there is a way to sync the time range of the Visualizations with the Canvas Time Filter. Is this possible?
Yes! You can do this two ways:
filters function in the expression for the element. One note: you'll notice a default time range is specified for the visualization. If you want to control the data you're looking at via the time filter on the workpad, change the timerange on the visualization element to a very large time range so that the range passed in by the time filter element is used.
Also! I realized that the reason the number is so low (990) in that one element is that the essql function limits the number of rows it retrieves by default. You can change this by adding a count argument to the essql expression function call. Note: this might not be very performant, since all the counting happens in memory. Additionally, essql is capped at 10k records that it can return at one time, so if you have more than that, you might not get all your data.
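To illustrate why a row limit makes an in-memory unique count come out low, here is a small Python sketch. It is not Canvas code. The data, the unique_count helper, and the limit of 1000 rows are all assumptions chosen for the example.

```python
def unique_count(rows, limit=None):
    """Count distinct values, optionally looking only at the first `limit` rows."""
    visible = rows if limit is None else rows[:limit]
    return len(set(visible))


# Hypothetical data: 5000 rows holding 4000 distinct identifiers
# (the last 1000 rows repeat identifiers that appeared earlier).
rows = [f"id-{i % 4000}" for i in range(5000)]

full = unique_count(rows)                   # counts over every row
truncated = unique_count(rows, limit=1000)  # counts over a capped row set

print(full, truncated)
```

The truncated count can never exceed the number of rows retrieved, so any unique count computed after a row cap is only a lower bound on the real value.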