Strange behaviour when counting events in bar chart

kruelah · June 1, 2016, 9:44am

Hi
I want to create a vertical bar chart visualisation that counts the number of events by event timestamp, and I notice a behaviour that I do not understand.

My event time field is called ACT_HEURE_ACTIVITE.
My time range runs from the 2nd of May 00:00:00 to the 3rd of May 00:00:00.

When I set my Y-axis aggregation to Count, Kibana returns 102 events. I get the same result if I change it to Unique Count on the INC_ID field (which is a unique ID).

When I leave my Y-axis aggregation on Count and set my X-axis to Date Histogram on ACT_HEURE_ACTIVITE with a daily interval, Kibana still returns 102 events.

But when I change my Y-axis aggregation to Unique Count on the INC_ID field, the system returns 100 events...

And when I change the Y-axis field to ACT_HEURE_ACTIVITE, Kibana returns 103 events!

Could anyone explain this behaviour?

LeeDr · June 2, 2016, 10:56pm

What count does the Discover tab show for that time range? I think that would be the most accurate count of docs since it isn't doing any aggregations.

kruelah · June 3, 2016, 7:37am

102 events

LeeDr · June 3, 2016, 2:09pm

So it seems you must have 102 documents in that time range.

But it seems that either 3 documents have the same INC_ID, or 2 pairs of documents share an INC_ID, such that there are only 100 unique INC_IDs. On the Discover tab you could add the INC_ID field to the table and sort by it to try to see the duplicates.
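The same duplicate check can also be done offline on exported rows. This is only a minimal sketch: the rows below are made-up sample data, and `find_duplicates` is a hypothetical helper, not anything Kibana provides.

```python
from collections import Counter


def find_duplicates(rows, key="INC_ID"):
    """Return {value: occurrences} for every key value seen more than once."""
    counts = Counter(row[key] for row in rows)
    return {value: n for value, n in counts.items() if n > 1}


# Made-up rows standing in for documents exported from the Discover tab.
rows = [
    {"INC_ID": "A1"},
    {"INC_ID": "A2"},
    {"INC_ID": "A2"},
    {"INC_ID": "A3"},
]

duplicates = find_duplicates(rows)
unique_ids = {row["INC_ID"] for row in rows}

print(len(rows), "documents,", len(unique_ids), "unique IDs")
print(duplicates)
```

If the returned dictionary is empty, every INC_ID in the exported rows is distinct and the document count should equal the exact unique count.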

And when I change the Y-axis field to ACT_HEURE_ACTIVITE, Kibana returns 103 events!

You don't show a screen shot of this last combination. I don't know why you would see a count of 103. But I can't think of any reason to use the timestamp on both the X and Y Axis.

Another way to find the duplicates might be to follow the example in this screen shot and put JSON in the advanced JSON Input field (except you would use { "min_doc_count":2}), and that should show you only the duplicate INC_IDs.

kruelah · June 3, 2016, 3:22pm

I added the INC_ID field in the Discover tab and exported the data to Excel (copy-paste values). My sheet contains 102 lines. When I remove duplicates on the INC_ID column I also obtain 102 distinct values.

This is the result:

Indeed there is no reason to use the timestamp on the Y-axis as well. My goal was to use another unique field and check the aggregation function.

I also tried to follow your example, but I think I am doing something wrong because I don't obtain the expected result.

Any idea?

LeeDr · June 3, 2016, 4:19pm

I think on the min_doc_count part you should try changing "Date Histogram" to "Terms" and changing "Field" to INC_ID.
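For reference, a Terms aggregation with min_doc_count set to 2 corresponds roughly to the Elasticsearch request body sketched below. This is an assumption-laden sketch rather than the exact request Kibana sends: the aggregation name `duplicate_ids`, the helper names, the `max_terms` value, and the year in the example time range are all hypothetical.

```python
import json


def duplicate_ids_body(id_field, time_field, start, end, max_terms=500):
    """Build a search body whose terms buckets only include values seen 2+ times."""
    return {
        "size": 0,  # no hits needed, only the aggregation result
        "query": {"range": {time_field: {"gte": start, "lt": end}}},
        "aggs": {
            "duplicate_ids": {
                "terms": {
                    "field": id_field,
                    "min_doc_count": 2,  # drop values that occur only once
                    "size": max_terms,   # assumed upper bound on returned buckets
                }
            }
        },
    }


def duplicate_ids_from_response(response):
    """Map each duplicated value to its document count."""
    buckets = response["aggregations"]["duplicate_ids"]["buckets"]
    return {bucket["key"]: bucket["doc_count"] for bucket in buckets}


# Field names come from this thread; the year 2016 is an assumption.
body = duplicate_ids_body(
    "INC_ID", "ACT_HEURE_ACTIVITE", "2016-05-02T00:00:00", "2016-05-03T00:00:00"
)
print(json.dumps(body, indent=2))
```

An empty bucket list in the response would mean that no INC_ID value occurs in two or more documents within the time range.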

kruelah · June 6, 2016, 7:22am

I set my visualization as said and obtain a "No results found" message.

kruelah · June 6, 2016, 1:10pm

I set up a new basic testing environment, using the following scripts to create the index and mapping and to import the data.
https://www.wetransfer.com/downloads/0e2cb9930e37a56a7504ba70160e295d20160606123941/9e3c84
The data file contains 324 events, all of which occurred on the 2nd of May 2016.

As you can see my new index contains 324 documents.

The mapping has following structure:

Events are displayed in Discover tab:

I obtain 324 results when I aggregate my data by date histogram on the ACT_ACTIVITY_TIME field.

But if I set my metric aggregation type to Unique Count, I obtain 335 results...

Note that I obtain different results depending on the field I choose.

LeeDr · June 6, 2016, 2:17pm

Your last screenshot has a time range of 'Last 60 days'. I think you meant for that to be only the 2nd of May 2016. Do you get 324 if you set it to only that day?

kruelah · June 13, 2016, 1:21pm

I still obtain 335 results if I set the time range from the 2nd to the 3rd of May and choose a daily custom interval.

In my previous post I didn't set the time range because the Elasticsearch index only contains events that occurred on the 2nd of May.

kruelah · July 5, 2016, 7:21am

Any idea?

LeeDr · July 21, 2016, 5:41pm

I started looking into this again. One thing to look at is this:

https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-metrics-cardinality-aggregation.html#_counts_are_approximate

I don't see precision_threshold specified in the request for a visualization that uses unique count, so I guess it uses the default. The default value is described here, but the description doesn't make complete sense to me:

Default value depends on the number of parent aggregations that multiple create buckets (such as terms or histograms).
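If the approximation is the suspect, one thing to try is setting precision_threshold explicitly on the cardinality aggregation. The sketch below builds such a request body; the aggregation names `per_day` and `unique_ids` and the helper name are made up for illustration, and 40000 is the maximum value the cardinality documentation lists as supported. It uses the `interval` parameter of the date histogram as it existed in 2016; newer Elasticsearch versions use `calendar_interval` instead.

```python
import json


def unique_count_body(id_field, time_field, precision_threshold=40000):
    """Daily date histogram with a cardinality sub-aggregation on id_field."""
    return {
        "size": 0,  # only the aggregation result is of interest
        "aggs": {
            "per_day": {
                # "interval" matches 2016-era Elasticsearch; this is an assumption
                # about the cluster version in use.
                "date_histogram": {"field": time_field, "interval": "day"},
                "aggs": {
                    "unique_ids": {
                        "cardinality": {
                            "field": id_field,
                            # Higher values trade memory for accuracy.
                            "precision_threshold": precision_threshold,
                        }
                    }
                },
            }
        },
    }


# Field names are the ones used earlier in this thread.
body = unique_count_body("INC_ID", "ACT_HEURE_ACTIVITE")
print(json.dumps(body, indent=2))
```

In Kibana, the same setting could presumably be supplied through the metric's JSON Input field as { "precision_threshold": 40000 }. If the unique count still differs from the document count with a threshold far above the number of documents, the approximation is unlikely to be the cause.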

I also tried a test with test data I had, and when I charted unique count of a time field I also got many more results than I expected. But then I found that the time field represented an array and some docs had multiple time values in them.
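To illustrate the array case, here is a small simulation with made-up documents (the field names `id` and `time` are hypothetical). When one document holds several time values, the number of distinct values can exceed the number of documents even with an exact count:

```python
# Three made-up documents; the second one stores two timestamps in its time field.
docs = [
    {"id": 1, "time": ["2016-05-02T08:00:00"]},
    {"id": 2, "time": ["2016-05-02T09:00:00", "2016-05-02T09:30:00"]},
    {"id": 3, "time": ["2016-05-02T10:00:00"]},
]

# "Count" counts documents.
doc_count = len(docs)

# "Unique count" counts distinct field values across all documents.
unique_times = {value for doc in docs for value in doc["time"]}

print("documents:", doc_count)
print("unique time values:", len(unique_times))
```

Here the document count is 3 while the unique count of the time field is 4, so a unique count above the document count does not by itself indicate an aggregation bug.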
