Kibana alternate way to remove duplicates or get precise Unique Count

groverjatin17 · March 28, 2019, 9:30am

Hi All,

I have 60,000 documents in an index. Many of these documents have the same value for the field "BookId".

NOTE: There are 14,000 unique BookIds. The duplicates exist because documents that share a BookId differ in other fields, which brings the index to 60,000 total hits.

I am creating a bar chart visualization with "Category" on the X-axis and the Unique Count of "BookId" as the Y-axis metric, but it produces the wrong unique count. After some searching, I found that Unique Count is an approximation and that setting the JSON input to {"precision_threshold" : 40000} should fix it. It is still missing thousands of values, though.

If this is an approximate value, how can I get the exact unique count, or remove the duplicates, in my bar chart?

Also, can I filter down to the unique documents in the Discover tab so that it shows the right count in hits?

christophilus · March 28, 2019, 2:48pm

Above 40K, the results are still fuzzy, unfortunately, as documented here. Since you have 60K records, I think you're still going to see fuzzy cardinality results.
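For reference, here is a rough sketch of the kind of request body the bar chart corresponds to: a terms aggregation on "Category" with a cardinality sub-aggregation on "BookId". The field names come from the question. The helper name buildUniqueCountBody, the aggregation names, and the bucket size of 100 are made up for illustration, and your fields may need a .keyword suffix depending on the mapping. Elasticsearch caps precision_threshold at 40000, and the threshold is compared against the number of distinct values, not the number of documents.

```javascript
// Sketch only: builds a search body for "unique BookId per Category".
// buildUniqueCountBody and the aggregation names are hypothetical.
function buildUniqueCountBody(groupField, countField, precisionThreshold) {
  return {
    size: 0, // only the aggregation results are needed, not the hits
    aggs: {
      groups: {
        // Bucket size of 100 is an assumption; raise it if you have more categories.
        terms: { field: groupField, size: 100 },
        aggs: {
          unique_count: {
            cardinality: {
              field: countField,
              // 40000 is the maximum supported value; clamp anything larger.
              precision_threshold: Math.min(precisionThreshold, 40000),
            },
          },
        },
      },
    },
  };
}

const body = buildUniqueCountBody('Category', 'BookId', 40000);
console.log(JSON.stringify(body, null, 2));
```

The counts this returns are still estimates, so it is only a way to check what the visualization is asking Elasticsearch for.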

We have a client-side scripting language (Kibana expressions), which you could probably put to use here, although it's not quite ready to go for the bar chart.

If you really want precision, you may need to write a plugin that tallies things. Here's an example of getting a distinct count of "category" per "city" from a "pets" index, using JavaScript. You can test this locally by running Kibana like this: yarn start --repl. After Kibana boots, you can paste this code into the REPL (in your terminal), then enter clientDistinct(kbnServer), and you should see an accurate distinct count. You'll want to modify the query to select the fields and index you actually want.

async function clientDistinct(kbnServer) {
  const callCluster = kbnServer.server.plugins.elasticsearch.getCluster('admin').callWithInternalUser;
  const result = {};
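  let scrollId;

  while (true) {
    // callCluster is a function which calls Elasticsearch, and may
    // not be exactly what you'd use...
    // Page with the scroll API instead of from/size: from + size cannot
    // exceed index.max_result_window (10,000 by default).
    const response = scrollId
      ? await callCluster('scroll', { scrollId, scroll: '1m' })
      : await callCluster('search', {
        index: 'pets',
        scroll: '1m',
        size: 1000,
        body: {
          "_source" : {
            "includes" : [
              "category",
              "city"
            ],
            "excludes" : [ ]
          },
          "sort" : [
            {
              "_doc" : {
                "order" : "asc"
              }
            }
          ]
        }
      });
    const { hits } = response;

    if (!hits || !hits.hits.length) {
      break;
    }

    scrollId = response._scroll_id;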

    // This does a distinct count of categories grouped by city
    hits.hits.forEach(({ _source }) => {
      const set = result[_source.city] || new Set();
      result[_source.city] = set;
      set.add(_source.category);
    });
  }

  // Returns something like: { newyork: 3, seattle: 55 }
  return Object.keys(result).reduce((acc, k) => {
    acc[k] = result[k].size;
    return acc;
  }, {});
}
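To see the tallying step on its own, without a running cluster, here is a minimal sketch of the same Set-per-group idea applied to the fields from the question ("Category" and "BookId"). The function name distinctCountBy and the sample documents are invented for illustration; in practice the documents would come from the search loop above.

```javascript
// Count distinct values of `valueField` within each `groupField` bucket.
// distinctCountBy is a hypothetical helper, not part of Kibana.
function distinctCountBy(docs, groupField, valueField) {
  const sets = new Map();
  for (const doc of docs) {
    const key = doc[groupField];
    if (!sets.has(key)) {
      sets.set(key, new Set());
    }
    // A Set ignores repeated values, which is what removes the duplicates.
    sets.get(key).add(doc[valueField]);
  }

  const counts = {};
  for (const [key, set] of sets) {
    counts[key] = set.size;
  }
  return counts;
}

// Made-up sample documents: BookId 1 appears twice in "fiction".
const sampleDocs = [
  { Category: 'fiction', BookId: 1, Edition: 'a' },
  { Category: 'fiction', BookId: 1, Edition: 'b' },
  { Category: 'fiction', BookId: 2, Edition: 'a' },
  { Category: 'science', BookId: 3, Edition: 'a' },
];

console.log(distinctCountBy(sampleDocs, 'Category', 'BookId'));
// → { fiction: 2, science: 1 }
```

Because every value is held in memory, this is exact but only practical when the number of distinct values is modest, which 14,000 BookIds should be.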

christophilus · March 28, 2019, 2:57pm

I should note that this is fairly trivial to do in Canvas:

essql query="SELECT city, category FROM pets"
| ply by=city fn={math "unique(category)"}

You'll need to modify that to query your own index, and change by=city and unique(category) to whatever columns you're working with.

system · April 25, 2019, 2:57pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
