Counting the duplicates and non-duplicates of a count aggregation

benji87 · August 5, 2020, 6:34pm

Hello all,

I'm hoping someone can help me with my issue, as I feel it's a simple problem to solve and it's just me missing something! I'm fairly new to ELK (3 months in), so hopefully this isn't too much of a newbie question.

I've attached a screenshot of a simple vertical bar chart to visualise what I'm trying to achieve. In the 'red' group, you have IDs that only have a single count, 75 in total. In the 'blue' group, you have IDs that have more than 1 count, 16 in total.

What I would like to do is create a simple metric to show the 75 in the red group as 'Single entry IDs', and another simple metric to show the 16 in the blue group as 'Multiple entry IDs'.

I've tried reading up on pipeline aggregations, but if I'm honest it's tricky to follow as I'm still early on in my learning curve.

I've also tried the curl request from this post: https://stackoverflow.com/questions/53359102/elasticsearch-count-duplicated-and-unique-values

However, that doesn't seem to work, as it doesn't count the non-duplicate IDs properly (I get 0). It also only runs in the console, and I want this to be shown in Kibana!

Really hope that all makes sense, I've tried to explain it as best I can. Happy to assist with further info if needed.

Many thanks!

bhavyarm · August 5, 2020, 8:12pm

Hello,

I ingested some sample data to simulate your scenario. The documents looked like this:


PUT /test_count/_doc/1
{
  "id": 1
}

PUT /test_count/_doc/3
{
  "id": 2
}

Please note that the total number of documents I ingested is 6.

Then I created a pie chart using a terms aggregation on id.

Does that help? Note that id is mapped as a numeric field here.
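
For reference, the pie chart boils down to a terms aggregation on `id`. Below is a minimal sketch of an equivalent search body, built as a Python dict so it can be printed and pasted into Dev Tools. The aggregation name `ids` and the bucket `size` of 10 are my own choices for this sketch, and the request Kibana actually sends may differ in its details:

```python
import json

# Search body for a terms aggregation on the numeric `id` field.
# "ids" is an arbitrary aggregation name chosen for this sketch.
terms_request = {
    "size": 0,  # return no hits, only the aggregation buckets
    "aggs": {
        "ids": {
            "terms": {"field": "id", "size": 10}
        }
    },
}

# Print the JSON to use as the body of GET /test_count/_search
print(json.dumps(terms_request, indent=2))
```

Each bucket in the response carries a `key` (the id) and a `doc_count` (the number of documents with that id), which is what each pie slice represents.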

Thanks,
Bhavya

benji87 · August 6, 2020, 8:08am

Hi Bhavya,

Thanks for coming back to me! Appreciate your example, however it's not quite what I'm after.

Building on your example, let's say you ingested the following documents:

3 x documents with id 26
6 x documents with id 12
1 x document with id 4
1 x document with id 21
1 x document with id 7

Two of those IDs (26 and 12) are duplicates (i.e. they appear in 2 or more documents).

Three of those IDs (4, 21, 7) are non-duplicates (they appear in exactly 1 document).

The pie chart I need is made up of the count of the duplicates, and the count of the non-duplicates.

Count of duplicates = 2 (40%)
Count of non-duplicates = 3 (60%)
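
To make the target numbers concrete, here is a small Python sketch of the calculation, run over the example IDs above. It is plain Python rather than an Elasticsearch query, and the variable names are just for illustration:

```python
from collections import Counter

# One list entry per document, holding that document's id
doc_ids = [26] * 3 + [12] * 6 + [4, 21, 7]

# Documents per id, i.e. the doc_count a terms aggregation on id would give
docs_per_id = Counter(doc_ids)

# An id is a duplicate when it appears in 2 or more documents
duplicates = sum(1 for n in docs_per_id.values() if n >= 2)
non_duplicates = sum(1 for n in docs_per_id.values() if n == 1)

total = duplicates + non_duplicates
print(f"Count of duplicates = {duplicates} ({duplicates / total:.0%})")
print(f"Count of non-duplicates = {non_duplicates} ({non_duplicates / total:.0%})")
```

In other words, it is a count of the terms buckets grouped by whether their document count is 1 or greater than 1, which is what I'd like Kibana to show directly.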

I really hope there's a way to do this without having to perform post-processing of my data and re-ingesting it!

