Pipeline aggregation with Date histogram doesn't return expected result

trickygiang · March 10, 2019, 11:46am

I'm facing an issue when using a pipeline aggregation with a date histogram.
I need to filter the data from "2019-03-08T06:00:00Z" to "2019-03-09T10:00:00Z", run a date histogram aggregation on the result, count the distinct values in each bucket with a cardinality aggregation, and then calculate the average of those counts.

{
    "size": 0,
  "query": {
        "bool" : {
            "filter": {
                "range" : {
                    "recordTime" : {
                        "gte" : "2019-03-08T06:00:00Z",
                        "lte" : "2019-03-09T10:00:00Z"
                    }
                }
            }
        }
    }, 
    "aggs" : {
        "events_per_bucket" : {
            "date_histogram" : {
                "field" : "eventTime",
                "interval" : "1h"
            },
            "aggs": {
                "cards_per_bucket": {
                    "cardinality": {
                        "field": "KANBAN_PKKEY.keyword"
                    }
                }
            }
        },
        "avg_cards_per_bucket": {
            "avg_bucket": {
                "buckets_path": "events_per_bucket>cards_per_bucket.value"
            }
        }
    }
}

Result:

{
    "took": 4,
    "timed_out": false,
    "_shards": {
        "total": 5,
        "successful": 5,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": 2,
        "max_score": 0,
        "hits": []
    },
    "aggregations": {
        "events_per_bucket": {
            "buckets": [
                {
                    "key_as_string": "2019-03-08T06:00:00.000Z",
                    "key": 1552024800000,
                    "doc_count": 1,
                    "cards_per_bucket": {
                        "value": 1
                    }
                },
                {
                    "key_as_string": "2019-03-08T07:00:00.000Z",
                    "key": 1552028400000,
                    "doc_count": 0,
                    "cards_per_bucket": {
                        "value": 0
                    }
                },
                {
                    "key_as_string": "2019-03-08T08:00:00.000Z",
                    "key": 1552032000000,
                    "doc_count": 1,
                    "cards_per_bucket": {
                        "value": 1
                    }
                }
            ]
        },
        "avg_cards_per_bucket": {
            "value": 1
        }
    }
}

The problem is: why is the avg value 1? It should be (1 + 0 + 1) / 3 = 2/3 ≈ 0.667.
Why is the bucket whose cardinality is 0 ignored?
If I remove the cardinality agg and average doc_count instead (events_per_bucket>_count), it works fine.
The same thing happens with MAX, MIN and SUM as well.
Any help would be appreciated!
Thank you.
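A small Python sketch can reproduce both numbers from the three buckets in the response above. It models how a sibling pipeline aggregation such as avg_bucket reads one value per bucket. The helper names (resolve_bucket_value, avg_bucket) are hypothetical, and the rule that an empty bucket (doc_count 0) counts as a gap for every path except _count is an assumption modelled on Elasticsearch's documented gap handling, not its actual code.

```python
import math
from statistics import mean

# The three hourly buckets from the response above:
# doc_count plus the cards_per_bucket (cardinality) value.
BUCKETS = [
    {"doc_count": 1, "cards_per_bucket": 1},  # 06:00
    {"doc_count": 0, "cards_per_bucket": 0},  # 07:00 (empty)
    {"doc_count": 1, "cards_per_bucket": 1},  # 08:00
]


def resolve_bucket_value(bucket, path, gap_policy):
    """Return the value a sibling pipeline agg would read, or None if skipped.

    Assumed rule (hypothetical model): _count is never a gap; for any other
    path, an empty bucket (doc_count == 0) or a NaN value is a gap.
    """
    if path == "_count":
        return float(bucket["doc_count"])
    value = float(bucket[path])
    if bucket["doc_count"] == 0 or math.isnan(value):
        # insert_zeros substitutes 0; skip drops the bucket entirely
        return 0.0 if gap_policy == "insert_zeros" else None
    return value


def avg_bucket(buckets, path, gap_policy="skip"):
    """Average the resolved values, ignoring buckets that were skipped."""
    values = [resolve_bucket_value(b, path, gap_policy) for b in buckets]
    kept = [v for v in values if v is not None]
    return mean(kept) if kept else None


print(avg_bucket(BUCKETS, "cards_per_bucket"))                  # 1.0
print(avg_bucket(BUCKETS, "cards_per_bucket", "insert_zeros"))  # 0.6666666666666666
print(avg_bucket(BUCKETS, "_count"))                            # 0.6666666666666666
```

The third call mirrors the observation that averaging events_per_bucket>_count already gives 2/3: under this model doc_count is never treated as a gap, so the empty bucket is kept.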

trickygiang · March 11, 2019, 1:12pm

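The issue can be fixed by adding "gap_policy": "insert_zeros" to the avg_bucket aggregation.
By default, bucket pipeline aggregations use "gap_policy": "skip", and a bucket with no documents (doc_count 0) is treated as a gap for any metric other than _count. So the empty 07:00 bucket was skipped and the average became (1 + 1) / 2 = 1. With insert_zeros the empty bucket contributes 0, which gives 2/3.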
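For reference, here is a sketch of the request from the question with the fix applied, written as a Python dict so it can be serialized with json.dumps and sent to the _search endpoint by whatever client you use. The only change from the original request is the gap_policy entry inside avg_bucket; the variable name body is just illustrative.

```python
import json

# Same request as in the question; the only change is "gap_policy" on avg_bucket.
body = {
    "size": 0,
    "query": {
        "bool": {
            "filter": {
                "range": {
                    "recordTime": {
                        "gte": "2019-03-08T06:00:00Z",
                        "lte": "2019-03-09T10:00:00Z",
                    }
                }
            }
        }
    },
    "aggs": {
        "events_per_bucket": {
            "date_histogram": {"field": "eventTime", "interval": "1h"},
            "aggs": {
                "cards_per_bucket": {
                    "cardinality": {"field": "KANBAN_PKKEY.keyword"}
                }
            },
        },
        "avg_cards_per_bucket": {
            "avg_bucket": {
                "buckets_path": "events_per_bucket>cards_per_bucket.value",
                # Count empty hourly buckets as 0 instead of skipping them
                "gap_policy": "insert_zeros",
            }
        },
    },
}

print(json.dumps(body, indent=2))
```

On Elasticsearch 7.2 and later, the date_histogram interval parameter is deprecated in favour of fixed_interval (for example "fixed_interval": "1h").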

system · April 8, 2019, 1:12pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.

Related topics

Topic	Category	Replies	Views	Last activity
Date histogram aggregation issue for arrays fields	Elasticsearch	2	275	February 26, 2022
Pipeline aggregations: apply Histogram over Terms results	Elasticsearch	1	402	July 5, 2017
Use Date histogram interval in calculation?	Elasticsearch	2	963	July 6, 2017
Bucket Selector Aggregation on Date Histogram _key	Elasticsearch	5	2222	May 1, 2017
How to do date histogram again after date histogram aggregation?	Elasticsearch	1	363	April 28, 2020