Strange (partly wrong?) numbers in results


(Chris Pete) #1

Hello there!

I've been using Kibana for some time now, testing a few of its analysis features in our business environment, which revolves largely around logistics and parcels. With that in mind, I tried to get an overview of shipped packages (packages map 1:1 to docs) based on a so-called "shipment_guid". I used a whole month as the search period and got 58 docs as a result, which is correct for the city (where the shipment was sent). BUT if I add the next split, "shipment type" (UPS, TNT, etc.), it still shows 58 docs but displays "54" out of nowhere.

I have already compared all results (in Discover) using different approaches, e.g. keeping only unique results based on the timestamp, the shipment_guid or other values, but the doc count has never been 54.

Here's the request:

{
  "query": {
    "filtered": {
      "query": {
        "query_string": {
          "analyze_wildcard": true,
          "query": "+receiver_street:*packstation* +contract:*ups*"
        }
      },
      "filter": {
        "bool": {
          "must": [
            {
              "$state": { "store": "globalState" },
              "query": {
                "match": {
                  "receiver_country_iso_alpha2": { "query": "DE", "type": "phrase" }
                }
              }
            },
            {
              "query": {
                "match": {
                  "location.raw": { "query": "Frankfurt", "type": "phrase" }
                }
              },
              "$state": { "store": "globalState" }
            },
            {
              "range": {
                "shipment_createdate_utc": {
                  "gte": 1448928000000,
                  "lte": 1451466944585,
                  "format": "epoch_millis"
                }
              }
            }
          ],
          "must_not": [
            {
              "$state": { "store": "globalState" },
              "query": {
                "match": {
                  "custship_method_id.raw": { "query": "96", "type": "phrase" }
                }
              }
            }
          ]
        }
      }
    }
  },
  "size": 0,
  "aggs": {
    "4": {
      "terms": { "field": "location.raw", "size": 3, "order": { "1": "desc" } },
      "aggs": {
        "1": { "cardinality": { "field": "shipment_guid" } },
        "2": {
          "terms": { "field": "contract.raw", "size": 3, "order": { "1": "desc" } },
          "aggs": {
            "1": { "cardinality": { "field": "shipment_guid" } },
            "3": {
              "terms": { "field": "shipment_definition.raw", "size": 5, "order": { "1": "desc" } },
              "aggs": {
                "1": { "cardinality": { "field": "shipment_guid" } }
              }
            }
          }
        }
      }
    }
  }
}

and here's the response:

{
  "took": 27,
  "timed_out": false,
  "_shards": { "total": 4, "successful": 4, "failed": 0 },
  "hits": { "total": 58, "max_score": 0, "hits": [] },
  "aggregations": {
    "4": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "1": { "value": 58 },
          "2": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": [
              {
                "1": { "value": 54 },
                "3": {
                  "doc_count_error_upper_bound": 0,
                  "sum_other_doc_count": 0,
                  "buckets": [
                    {
                      "1": { "value": 54 },
                      "key": "UPS Express Saver 12:00",
                      "doc_count": 58
                    }
                  ]
                },
                "key": "UPS Express",
                "doc_count": 58
              }
            ]
          },
          "key": "Frankfurt",
          "doc_count": 58
        }
      ]
    }
  }
}

I'm a bit confused, since it still says "doc_count": 58 but displays "value": 54, at least in the nested sub-results.


(Spencer Alger) #2

From what I can tell, you are right and the correct cardinality is 58. I would bet this is caused by the implementation of the cardinality aggregation, which makes it efficient but also makes it an approximation (as stated in the docs):

A single-value metrics aggregation that calculates an approximate count of distinct values.
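If the approximation is what skews the nested counts, the cardinality aggregation accepts a precision_threshold option that trades memory for accuracy: distinct counts below the threshold are expected to be close to accurate, and the documented maximum is 40000. Older Elasticsearch releases documented the default threshold as depending on how many bucketing parent aggregations (such as terms) the cardinality aggregation is nested under, which would fit the top-level value being right while the nested ones are off; please check the docs for your version. Below is a sketch of the "2" sub-aggregation from the request above with an explicit threshold. The value 1000 is an arbitrary illustrative choice, not a recommendation, and the filtered query is omitted for brevity (reuse the one from the original request).

```json
{
  "size": 0,
  "aggs": {
    "2": {
      "terms": { "field": "contract.raw", "size": 3, "order": { "1": "desc" } },
      "aggs": {
        "1": {
          "cardinality": {
            "field": "shipment_guid",
            "precision_threshold": 1000
          }
        }
      }
    }
  }
}
```

In Kibana, the same option can likely be passed through the aggregation's advanced "JSON Input" field as { "precision_threshold": 1000 }, assuming your Kibana version offers that field.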

