How to display multi-select variables on bar charts as a percentage of records

I want to show responses to a multi-select question as a percentage of response IDs.

In a survey we used, we asked respondents to select all the options that apply to them. This means that we have more responses than response IDs, which is correct. When I show the data in a horizontal bar chart, it gives the correct count for each answer option (the sum of which is correctly greater than the sum of response IDs).
The formula I use to display the correct count of the multi-select:

count('Response ID')

However, when I use the formula to convert the data labels to percentage values, it incorrectly displays it as a percentage of the sum of responses (as opposed to response IDs).

The formula I use for converting counts to percentages is:

count() / overall_sum(count())

Or specifically:

count('Response ID') / overall_sum(count('Response ID'))

Hey Michal,

Depending on how your data is structured, you can try this formula for the y-axis, assuming the x-axis has the count for each answer option.

unique_count(ResponseID) / number of participants
1 Like

You can track this issue for the enhancement request of a total_count() feature in formula: [Lens] make total document count available in formula · Issue #160562 · elastic/kibana · GitHub

Thanks for your reply, JD_Armada.

Unfortunately, this does not solve my problem. I need the denominator to be dynamic. For example, if I filter by gender, I want my denominator to decrease so that it reflects the smaller number of filtered participants.

Thanks for your reply, Marco_Liberati.

I don't think that this exactly addresses my problem.

If I can explain by using an example:
Suppose I have 5,000 participants in a survey. One of the questions in the survey is a multi-select, to which I receive 11,000 responses, because participants can select more than one option in that question.

Normally I am able to display answers as percentages on bar charts by using:

count('Response ID') / overall_sum(count('Response ID'))

In the case of the multi-select question, however, the denominator incorrectly (or undesirably) sums to 11,000. I believe that this is because Kibana is executing the formula per bar and then forgets that it has already counted a participant. When I use "unique_count" (instead of "count") in the denominator, I get the same incorrect output.

I do not want to enter the total number of participant (5,000) in the denominator, because the denominator needs to be dynamic for when I use filters (which will decrease the size of the denominator).

Even if Kibana does not have a native function, is there perhaps a workaround to get my desired output? That is, a formula that counts responses of particular options in a multi-select question, and divides it by number of respondents (subject to filters selected).

Maybe there is, but kibana is querying/aggregating from elasticsearch, and elasticsearch is based around sets of documents, organized into indices, not "multi-select questions". I simply don't know how your data is being represented in documents. Maybe others have figured it out from what you've written, but you have restricted the pool of people who can offer help/suggestions.

If you share some sample documents (obfuscate what you need to) it might help widen the pool.

Sample 1: document from a respondent who selected three options:

"How was your accommodation funded in 2024? (Please select ALL that apply).keyword": [
"Student Loan",
"Parents",
"I personally pay (self-funded)"
],

Sample 2: document from a respondent who selected one option:

"How was your accommodation funded in 2024? (Please select ALL that apply).keyword": [
"Bursary"
],

OK, this helps. Thank you. But I'd think this is a pretty unusual way to represent this sort of data.

I'm just going to write what I've understood here ... might be wrong.

You have documents with a (mapped to keyword) field called

"How was your accommodation funded in 2024? (Please select ALL that apply)"

and its value is an array like

[ "Student Loan", "Parents", "I personally pay (self-funded)" ]
or
["Parents"]
or
["Bursary"]
or
["Bursary", "Student Loan"]

Obviously you have thousands of those documents, but I'll imagine we have just these 4.

Each of these documents is considered a single responseId (maybe thats another field?). And in the example above there are 3 responses for the first doc, and 1 response for the second doc, 1 for the third doc, and 2 for the fourth doc.

And you want (eg) to see what percentage chose "Bursary", for which answer is 50% (2/4). And what percentage chose "Student Loan", for which answer is also 50% (2/4). But also more complex Qs like, if I restrict to those with "Bursary", how many also had "Student Loans", then answer is 50% (1/2).

Am I on right track?

Yes, that's the mapping that I'm trying to display. Ultimately, I want to display these responses as a horizontal bar chart.

See Chart A below for an Excel version based on your example. You will notice that the sum of the bars is greater than 100% (which is correct and what I want to show in Kibana as well). However, in Kibana I can't seem to generate a bar chart where the sum of the bars is greater than 100%.

Chart B (below) is an Excel version of your example that is similar to the incorrect Kibana output that I'm getting at the moment.

The usual formula* for converting bar charts to percentage bars appears to calculate the denominator per bar, and "forgets" that it has already counted a responseID. As you've correctly assumed, responseID is a different field in the respective documents denoting a unique person's response to the survey.

*The formula I use for converting to percentages:
(count('responseID') / overall_sum(count('responseID'))

An incorrect way to achieve my desired output would be to use a constant denominator that equals the total number of responseIDs. This is incorrect because the denominator needs to change dynamically when I filter my data in my Kibana dashboard (say, by only selecting female respondents).

Is there perhaps a way to get my desired output (Chart A)? That is, a bar chart that shows percentages based on a formula that counts responses of particular options in a multi-select question (as you've correctly described in your comment) and divides it by number of responseIDs (subject to filters selected).

I'm not sure it helps you to characterizing things around your specific questions and multi-select answers. It might as well be recipes and arrays of ingredients - what percentage of recipes included celery? The important part is how the data is structured within Elasticsearch - which is at some level just a (searchable) general purpose document store. Whatever the source/story of your data its stored in elasticsearch as json documents, field names & field values.

If I have time I'll try knock up some sample documents. It would be way easier if you just shared some, i.e. those you see when doing a HTTP GET on /your_index_name/_search

My hunch is what you are trying to do is not hard at all but the confusing terminology makes it quite hard to follow.

In elasticsearch "mapping" has a specific meaning:

It would help to know the mapping, in the sense described in that documentation, of your index.