Average calculations not consistent

Wpq · December 21, 2022, 9:01am

I have data of this form:

{
  "user": "john",
  "country": "japan",
  "applied": 1
},
{
  "user": "mary",
  "country": "japan",
  "applied": 1
},
{
  "user": "paul",
  "country": "japan",
  "applied": 0
},
{
  "user": "franck",
  "country": "france",
  "applied": 1
}

I would like to show a graph of countries with the percentage of applications.

To do so, I created a Lens visualization of type Pie, Sliced by the Top 50 values of country and Sized by the average of applied.
I used Top 50 even though I only have about 10 countries, in case new countries appear. Once the number is above 10, increasing it further does not change the result, as expected.

The pie looked fine until I took a closer look at the numbers. When I clicked the pie slice for japan, I saw that 67% of its users had applied. This is the correct result: (1 + 1 + 0) / 3 ≈ 0.67.
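For reference, the per-slice value the question expects can be reproduced outside Kibana. This is a minimal Python sketch, not Kibana's implementation: `average_by` is a hypothetical helper, and the documents are the four samples shown above.

```python
from collections import defaultdict

# The four sample documents from the question.
docs = [
    {"user": "john", "country": "japan", "applied": 1},
    {"user": "mary", "country": "japan", "applied": 1},
    {"user": "paul", "country": "japan", "applied": 0},
    {"user": "franck", "country": "france", "applied": 1},
]

def average_by(docs, group_field, value_field):
    """Return {group value: mean of value_field} over the given documents."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for doc in docs:
        key = doc[group_field]
        sums[key] += doc[value_field]
        counts[key] += 1
    return {key: sums[key] / counts[key] for key in sums}

averages = average_by(docs, "country", "applied")
print(averages)  # {'japan': 0.6666666666666666, 'france': 1.0}
```

Each group's average depends only on that group's own documents, which is why the value shown when clicking the japan slice is 0.67.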

The full pie, however, does not show 67% for japan but a smaller value: 25.08% with my data (there are indeed 3 entries for japan, but about 20k for other countries).

What is the exact calculation used in Lens of type Pie, Sliced by the Top N values of X, and Sized by the average of Y?

Marco_Liberati · December 21, 2022, 9:42am

Hi @Wpq

Welcome to the Kibana community.

Do you have any advanced configuration enabled in the Top values settings?

If the Japan percentage is lower than expected, could the Other bucket (if enabled) be taking up most of the pie?

Wpq · December 21, 2022, 9:46am

Thank you for your answer.

I did have "Group ... as Other" enabled, but it was redundant: there is no "Other" slice, because my Top 50 values already cover all 10 possible values I have now. I tried switching it on and off, as well as the "Accuracy" option, but neither changed the problem.

Wpq · December 21, 2022, 9:55am

Hmm, following up on your comment about Other, I picked a field with many distinct values (say, age, which is not in the example data above but has many different values).

I chose the Top 5 values of age, again sized by the average of applied (which means "what percentage of people aged X applied"). It gave a number.

I then increased the number of Top values, and the percentages of the existing slices changed. I do not understand this at all: how can changing the number of displayed values change the ratio of applied (effectively, the percentage) for a given value? It should always be the average of applied for that value, independent of anything else.

Marco_Liberati · December 21, 2022, 10:25am

Do you mean that, with the Other option enabled, increasing the number of Top values makes an existing slice shrink?
Or does it also happen without the Other option?

Wpq · December 21, 2022, 11:50am

Both.

If I do not have Other and start at, say, Top 5, I have 5 slices in the pie. As I increase the number of slices, the percentage decreases (again, just for clarity: this is the average of a 0 or 1 value, so it should be independent of anything else, including the number of displayed elements).

The same happens with Other: I just start with 5 + 1 slices (5 for the top values, plus Other), and changing the number of Top values changes the percentage.

Note: when you say "reduce its size", I assume you mean what I described above, that is, that the displayed relative percentage changes. I am not talking about the "visual size" of a slice (its angle, if you wish). Sorry if this is obvious; I just wanted to make sure we are talking about the same thing.

Marco_Liberati · December 21, 2022, 1:45pm

Can you post some pictures with the problem?
I think I'm missing something obvious here, but maybe looking at it will make it clearer.

Wpq · December 21, 2022, 2:11pm

Sure. Below is the general pie with the top countries, sized by the average of the field clicked, which is 0 or 1.

When looking at the details of "Kazakhstan", I see three entries:

As you can see, there are two entries with 1 and one with 0, so the average is 0.67 (or 67%).

Marco_Liberati · December 21, 2022, 2:51pm

I think there's a misunderstanding of the pie visualization here: a pie visualization represents portions (slices) of a whole, so the slices of a pie chart always add up to 100%.
Even if 67% of the documents with the Kazakhstan value have applied: 1, in the pie chart that value is normalized against the sum of all the slices shown.
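A minimal Python sketch of this normalization, under stated assumptions: `pie_shares` is a hypothetical helper, slices are assumed to be ranked by their metric value (highest first), no Other bucket is involved, and the per-age averages are made-up numbers. It shows that each slice's underlying average never changes, while its share of the pie shrinks as more slices are added, which matches the behaviour described earlier in the thread.

```python
def pie_shares(averages, top_n):
    """Share of the pie for each of the top_n slices.

    Assumptions (not taken from Kibana's source): slices are ranked by
    their metric value, highest first, and there is no Other bucket.
    """
    ranked = sorted(averages.items(), key=lambda item: item[1], reverse=True)[:top_n]
    total = sum(value for _, value in ranked)
    # Each slice is its own value divided by the sum of all displayed slices.
    return {key: value / total for key, value in ranked}

# Made-up per-age averages of a 0/1 field; these are not real data.
averages = {"20": 0.9, "30": 0.7, "40": 0.5, "50": 0.4, "60": 0.3, "70": 0.2}

for n in (3, 6):
    shares = pie_shares(averages, n)
    print(n, {age: round(share, 3) for age, share in shares.items()})
```

With these numbers, the "20" slice falls from about 43% of the pie at Top 3 (0.9 / 2.1) to about 30% at Top 6 (0.9 / 3.0), although its average stays at 0.9 the whole time.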

It sounds like you are looking for something like a bar chart here to see how often applied: 1 is set per country.
I've reproduced something similar with the Flight sample dataset, where for each carrier I show the % of cancelled flights:

As the metric I used a formula to compute the percentage; in your case it would be count(kql="applied: 1") / count(). As the value format I chose the percentage one.
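The suggested formula can be mirrored in a short Python sketch (an illustration of the arithmetic, not how Kibana evaluates formulas): for each country it divides the number of documents with applied == 1 by the total number of documents, using the four sample documents from the question. `applied_ratio` is a hypothetical helper. Because applied only takes the values 0 and 1, this ratio equals the average of applied, and each bar is computed independently of the other countries.

```python
from collections import Counter

# The four sample documents from the question.
docs = [
    {"user": "john", "country": "japan", "applied": 1},
    {"user": "mary", "country": "japan", "applied": 1},
    {"user": "paul", "country": "japan", "applied": 0},
    {"user": "franck", "country": "france", "applied": 1},
]

def applied_ratio(docs, group_field):
    """Per group: count(applied == 1) / count(), the same arithmetic as
    the Lens formula count(kql="applied: 1") / count()."""
    total = Counter(doc[group_field] for doc in docs)
    applied = Counter(doc[group_field] for doc in docs if doc["applied"] == 1)
    # Counter returns 0 for missing keys, so a group with no applied: 1 gets 0.0.
    return {key: applied[key] / total[key] for key in total}

ratios = applied_ratio(docs, "country")
print(ratios)  # {'japan': 0.6666666666666666, 'france': 1.0}
```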
Would this work in your case?

Wpq · December 21, 2022, 2:58pm

Thank you!

I actually did not want to use the pie visualization, which is usually the worst choice anyway. I tried "Bar vertical percentages", but I always got 100% and gave up.
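One likely explanation for the constant 100%, assuming "Bar vertical percentages" stacks the series within each bar and scales every bar to 100%: with a single metric and no breakdown, each bar contains only one segment, which then always fills the whole bar. A minimal Python sketch of that per-bar normalization (`percentage_stacked` is a hypothetical helper; the values come from the japan and france sample data):

```python
def percentage_stacked(bars):
    """Scale the segments of each bar so that every bar sums to 1 (100%).

    bars maps an x-axis value to {series name: value}; each bar is assumed
    to have a non-zero total.
    """
    result = {}
    for x, segments in bars.items():
        total = sum(segments.values())
        result[x] = {series: value / total for series, value in segments.items()}
    return result

# One metric (average of applied) and no breakdown: one segment per bar.
single = percentage_stacked({"japan": {"avg applied": 2 / 3}, "france": {"avg applied": 1.0}})
print(single)  # both bars come out as 1.0, i.e. 100%

# With two segments per bar (here the japan counts of applied: 1 vs applied: 0),
# the normalization produces a meaningful split instead.
split = percentage_stacked({"japan": {"applied: 1": 2, "applied: 0": 1}})
print(split)
```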

Following your example, I used the plain "Vertical bar" chart, set the vertical axis to display %, and it worked. Thanks again.

In retrospect, the pie visualization made no sense in my case, since the percentages do not add up to 100%. I was kind of hoping that the angle of each slice would directly reflect the value of the average.

system · January 18, 2023, 2:58pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.