Counting the duplicates and non-duplicates of a count aggregation

Hello all,

I'm hoping someone can help me with my issue, as I feel it's a simple problem to solve and it's just me missing something! I'm fairly new to ELK (3 months in), so hopefully this isn't too much of a newbie question :)

I've attached a screenshot of a simple vertical bar chart to visualise what I'm trying to achieve. In the 'red' group, you have IDs that only have a single count, 75 in total. In the 'blue' group, you have IDs that have more than 1 count, 16 in total.

What I would like to do is create a simple metric to show the 75 in the red group as 'Single entry IDs', and another simple metric to show the 16 in the blue group as 'Multiple entry IDs'.

I've tried reading up on pipeline aggregations, but if I'm honest it's tricky to follow as I'm still early on in my learning curve.

I've also tried the curl request from this post - https://stackoverflow.com/questions/53359102/elasticsearch-count-duplicated-and-unique-values

However, that doesn't seem to work as it doesn't count the non-duplicate IDs properly (I get 0). It's also all in the console and I want this to be shown in Kibana!

Really hope that all makes sense, I've tried to explain it as best I can. Happy to assist with further info if needed.

Many thanks!

Hello,

I ingested some sample data to simulate your scenario. The documents looked like this:


PUT /test_count/_doc/1
{
  "id": 1
}

PUT /test_count/_doc/3
{
  "id": 2
}

Please note that the total number of documents I ingested is 6.

Then I created a pie chart using a terms aggregation on id:

Does that help? Here, id is mapped as a number field.

Thanks,
Bhavya

Hi Bhavya,

Thanks for coming back to me! Appreciate your example, however it's not quite what I'm after.

Building on your example, let's say you ingested the following documents -

3 x documents with id 26
6 x documents with id 12
1 x document with id 4
1 x document with id 21
1 x document with id 7

Two of those IDs (26 and 12) are duplicates (i.e. they appear in 2 or more documents).

Three of those IDs (4, 21, 7) are non-duplicates (they appear in exactly 1 document).

The pie chart I need is made up of the count of the duplicates, and the count of the non-duplicates.

Count of duplicates = 2 (40%)
Count of non-duplicates = 3 (60%)
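
To make the target numbers concrete, here is a small Python sketch of the counting I mean, run on the example above. It is plain local Python, not an Elasticsearch query, and the names `doc_ids` and `split_ids` are just made up for illustration.

```python
from collections import Counter

# Example data from this post: one entry per document, holding that document's id.
doc_ids = [26] * 3 + [12] * 6 + [4, 21, 7]


def split_ids(ids):
    """Return (single, multiple): ids seen in exactly one document vs. in two or more."""
    counts = Counter(ids)
    single = sorted(i for i, n in counts.items() if n == 1)
    multiple = sorted(i for i, n in counts.items() if n > 1)
    return single, multiple


single, multiple = split_ids(doc_ids)
total = len(single) + len(multiple)  # number of distinct ids, not documents

print(f"Count of duplicates = {len(multiple)} ({100 * len(multiple) / total:.0f}%)")
print(f"Count of non-duplicates = {len(single)} ({100 * len(single) / total:.0f}%)")
# prints:
# Count of duplicates = 2 (40%)
# Count of non-duplicates = 3 (60%)
```

Note that the percentages are taken over the 5 distinct IDs, not over the 12 documents.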

I really hope there's a way to do this without having to perform post-processing of my data and re-ingesting it!
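
For reference, this is roughly the kind of query-side approach I have been experimenting with, written as a Python sketch that only builds the request body. It is an untested assumption, not a confirmed answer: it combines a `terms` aggregation with a `bucket_selector` on `_count`, then a `stats_bucket` whose `count` should be the number of buckets that survive. The field name `id`, the aggregation names, and the `max_ids` value are placeholders. One possible reason for getting 0 non-duplicates is that `terms` returns only the top 10 buckets by default, so single-count IDs may never reach the selector unless `size` is raised.

```python
import json


def build_count_body(field="id", max_ids=10000):
    """Build a search body that counts single-entry and multiple-entry ids.

    Assumption: `max_ids` is at least the number of distinct ids, otherwise
    the terms aggregation truncates buckets and the counts are too low.
    """

    def terms_filtered_by(script):
        # One bucket per id; bucket_selector drops buckets failing the script.
        return {
            "terms": {"field": field, "size": max_ids},
            "aggs": {
                "keep": {
                    "bucket_selector": {
                        "buckets_path": {"n": "_count"},
                        "script": script,
                    }
                }
            },
        }

    return {
        "size": 0,  # only aggregation results are needed, no hits
        "aggs": {
            "single_ids": terms_filtered_by("params.n == 1"),
            "multiple_ids": terms_filtered_by("params.n > 1"),
            # stats_bucket reports a `count` of the buckets it was given.
            "single_id_stats": {"stats_bucket": {"buckets_path": "single_ids>_count"}},
            "multiple_id_stats": {"stats_bucket": {"buckets_path": "multiple_ids>_count"}},
        },
    }


body = build_count_body()
print(json.dumps(body, indent=2))  # paste after `GET /test_count/_search` in the console
```

Even if this returns the right numbers in the console, it still does not give me a Kibana metric or pie chart directly, which is the part I am stuck on.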
