Mean aggregation over cardinality aggregation, over previous composite aggregation

Silver137 · January 14, 2022, 7:31am

My use case requires averaging the results of a cardinality aggregation. The cardinality aggregation runs over the buckets generated by a composite aggregation with a date histogram source ("calendar histogram") that splits the timeline into calendar days, because I'm working over a long time span.

I don't know which Elasticsearch feature fits this case best: a sub-aggregation, a pipeline aggregation, or something else. The cardinality value is computed per calendar bucket using a sub-aggregation, and I would like to take the average of that output.

"aggregations": {
    "daily_product_names_for_specific_shop": {
        "composite": {
            "size": 10000,
            "sources": [
              {
                "date": {
                  "date_histogram": {
                    "field": "@timestamp",
                    "calendar_interval": "1d",
                    "format": "iso8601"
                  }
                }
              }
            ]
        },
        # Sub-aggregation over composite buckets
        "aggregations": {
           "different_products_amount": {
               "cardinality": {
                    "field": "ticket.p_name"
                }
           }
        # Maybe some stuff
        }
    }
}

Tomo_M · January 14, 2022, 9:49am

I think the average bucket aggregation (avg_bucket), which is a pipeline aggregation, fits your purpose. If it doesn't work, please ask again.

Silver137 · January 17, 2022, 7:49am

I have tried:

"aggregations": {
    "daily_product_names_for_specific_shop": {
        "composite": {
            "size": 10000,
            "sources": [
              {
                "date": {
                  "date_histogram": {
                    "field": "@timestamp",
                    "calendar_interval": "1d",
                    "format": "iso8601"
                  }
                }
              }
            ]
        },
        # Sub-aggregation over composite buckets
        "aggregations": {
           "different_products_amount": {
               "cardinality": {
                    "field": "ticket.p_name"
                }
           },
           # The avg
           "mean": {
                "avg_bucket": {
                    "buckets_path": "different_products_amount",
                    "gap_policy": "skip",
                    "format": "#,##0.00;(#,##0.00)"
                }
           }
        }
    }
}

But I get:

Validation Failed: 1: The first aggregation in buckets_path must be a multi-bucket aggregation for aggregation [mean] found :org.elasticsearch.search.aggregations.metrics.CardinalityAggregationBuilder for buckets path: different_products_amount;

I have also tried:

"aggregations": {
    "daily_product_names_for_specific_shop": {
        "composite": {
            "size": 10000,
            "sources": [
              {
                "date": {
                  "date_histogram": {
                    "field": "@timestamp",
                    "calendar_interval": "1d",
                    "format": "iso8601"
                  }
                }
              }
            ]
        },
        # Sub-aggregation over composite buckets
        "aggregations": {
           "different_products_amount": {
               "cardinality": {
                    "field": "ticket.p_name"
                }
           }
        }
    },
    "mean": {
        "avg_bucket": {
            "buckets_path": "daily_product_names_for_specific_shop>different_products_amount",
            "gap_policy": "skip",
            "format": "#,##0.00;(#,##0.00)"
        }
    }
}

But then the composite aggregation itself is rejected:

Validation Failed: 1: The first aggregation in buckets_path must be a multi-bucket aggregation for aggregation [mean] found :org.elasticsearch.search.aggregations.bucket.composite.CompositeAggregationBuilder for buckets path: daily_product_names_for_specific_shop>different_products_amount;
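Since the validation rejects the composite aggregation as the first element of buckets_path, one possible workaround is to keep the composite aggregation and compute the mean on the client while paging through the buckets. The following is only a minimal sketch under assumptions: the function name `mean_of_bucket_metric` and the `sample_pages` data are hypothetical, and each page is assumed to be a search response already parsed into a dict (for example, successive responses obtained by passing the previous `after_key` as `after` in the composite aggregation).

```python
from typing import Any, Dict, Iterable, Optional


def mean_of_bucket_metric(
    pages: Iterable[Dict[str, Any]],
    agg_name: str,
    metric_name: str,
) -> Optional[float]:
    """Average a per-bucket metric over every bucket of every response page.

    Buckets whose metric value is null are skipped, similar in spirit to
    gap_policy "skip". Returns None when there is nothing to average.
    """
    total = 0.0
    count = 0
    for page in pages:
        for bucket in page["aggregations"][agg_name]["buckets"]:
            value = bucket[metric_name]["value"]
            if value is None:
                continue
            total += value
            count += 1
    return total / count if count else None


# Hypothetical, hand-written pages shaped like composite aggregation responses.
sample_pages = [
    {
        "aggregations": {
            "daily_product_names_for_specific_shop": {
                "after_key": {"date": "2022-01-02T00:00:00.000Z"},
                "buckets": [
                    {"key": {"date": "2022-01-01T00:00:00.000Z"}, "doc_count": 7,
                     "different_products_amount": {"value": 3}},
                    {"key": {"date": "2022-01-02T00:00:00.000Z"}, "doc_count": 9,
                     "different_products_amount": {"value": 5}},
                ],
            }
        }
    },
    {
        "aggregations": {
            "daily_product_names_for_specific_shop": {
                "buckets": [
                    {"key": {"date": "2022-01-03T00:00:00.000Z"}, "doc_count": 4,
                     "different_products_amount": {"value": 4}},
                ],
            }
        }
    },
]

daily_mean = mean_of_bucket_metric(
    sample_pages,
    "daily_product_names_for_specific_shop",
    "different_products_amount",
)
print(daily_mean)
```

The running sum and count mean that no page has to be kept in memory, which suits a long time span, at the cost of one request per page.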

Tomo_M · January 17, 2022, 10:52am

The latter query looks grammatically correct. Sorry, I didn't know that pipeline aggregations cannot be used with a composite aggregation.

I found an issue explaining the reason.

github.com/elastic/elasticsearch

Pipeline metrics aggregations do not recognize composite aggregations as multi-bucket

opened 10:55PM - 07 Aug 18 UTC

ghost

>enhancement :Analytics/Aggregations Team:Analytics

**Elasticsearch version** (`bin/elasticsearch --version`): 6.3.0 official docker image

**Plugins installed**: []

**JVM version** (`java -version`): 10.0.1

**OS version** (`uname -a` if on a Unix-like system): Linux 389f11186e5b 4.9.93-linuxkit-aufs #1 SMP Wed Jun 6 16:55:56 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux

**Description of the problem including expected versus actual behavior**:

Haven't seen the Git issue posted for this [thread](https://discuss.elastic.co/t/elasticsearch-6-2-composite-bucket-aggregation-with-sum-bucket-aggregation-and-pipeline-aggregations/142658), so I'm posting it to get the ball rolling since my team has encountered it as well.

Pipeline metrics aggregations do not recognize composite aggregations as multi-bucket. However, composite aggregations are multi-bucket, so this should work.

**Steps to reproduce**:

1. Template mapping

```
PUT _template/template_default
{
  "mappings": {
    "_doc": {
      "_all": { "enabled": false },
      "dynamic": "strict",
      "properties": {
        "itemId": { "type": "keyword", "norms": false },
        "inputQty": { "type": "integer", "index": false },
        "orderQty": { "type": "integer", "index": false },
        "centerId": { "type": "keyword", "eager_global_ordinals": true, "norms": false },
        "submittedQty": { "type": "integer", "index": false },
        "confirmedQty": { "type": "integer", "index": false }
      }
    }
  }
}
```

2. REST call

```
POST items-0*/_search?ignore_unavailable=true
{
  "size": 0,
  "track_total_hits": false,
  "aggs": {
    "myBuckets": {
      "composite": {
        "size": 100000,
        "sources": [
          { "center_name": { "terms": { "field": "centerId" } } }
        ]
      },
      "aggs": {
        "requested_units": { "sum": { "field": "inputQty" } },
        "approved_units": { "sum": { "field": "orderQty" } },
        "submitted_quantity": { "sum": { "field": "submittedQty" } },
        "confirmed_quantity": { "sum": { "field": "confirmedQty" } }
      }
    },
    "check_pipeline_agg": {
      "sum_bucket": { "buckets_path": "fc_buckets>requested_units" }
    }
  }
}
```

**Provide logs (if relevant)**:

The error that comes back will be similar to:

```json
{
  "error": {
    "root_cause": [
      {
        "type": "illegal_argument_exception",
        "reason": "The first aggregation in buckets_path must be a multi-bucket aggregation for aggregation [avg_cardinality] found :org.elasticsearch.search.aggregations.bucket.composite.CompositeAggregationBuilder for buckets path: composite_buckets>cardinality_some_field"
      }
    ],
    "type": "search_phase_execution_exception",
    "reason": "all shards failed",
    "phase": "query",
    "grouped": true,
    "failed_shards": [
      {
        "shard": 0,
        "index": "my_index",
        "node": "1_8dwXRuT565uQg11iZ_SA",
        "reason": {
          "type": "illegal_argument_exception",
          "reason": "The first aggregation in buckets_path must be a multi-bucket aggregation for aggregation [avg_cardinality] found :org.elasticsearch.search.aggregations.bucket.composite.CompositeAggregationBuilder for buckets path: composite_buckets>cardinality_some_field"
        }
      }
    ],
    "caused_by": {
      "type": "illegal_argument_exception",
      "reason": "The first aggregation in buckets_path must be a multi-bucket aggregation for aggregation [avg_cardinality] found :org.elasticsearch.search.aggregations.bucket.composite.CompositeAggregationBuilder for buckets path: composite_buckets>cardinality_some_field",
      "caused_by": {
        "type": "illegal_argument_exception",
        "reason": "The first aggregation in buckets_path must be a multi-bucket aggregation for aggregation [avg_cardinality] found :org.elasticsearch.search.aggregations.bucket.composite.CompositeAggregationBuilder for buckets path: composite_buckets>cardinality_some_field"
      }
    }
  },
  "status": 400
}
```

Would there be any serious problem with using a plain date histogram aggregation instead of the composite aggregation, and discarding the output you don't need?
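To make that suggestion concrete, here is a hedged sketch of what the request body might look like with a plain date_histogram as the parent and avg_bucket as a sibling pipeline aggregation. It reuses the field and aggregation names from the queries above; the helper name `build_daily_mean_query`, the `"size": 0` setting, and `"min_doc_count": 1` (so that only days containing documents produce buckets) are assumptions added for illustration, and the body has not been run against a cluster here.

```python
import json
from typing import Any, Dict


def build_daily_mean_query(
    timestamp_field: str = "@timestamp",
    product_field: str = "ticket.p_name",
) -> Dict[str, Any]:
    """Build a search body: daily cardinality plus its average across days."""
    histogram_name = "daily_product_names_for_specific_shop"
    metric_name = "different_products_amount"
    return {
        "size": 0,  # assumption: only the aggregation results are needed
        "aggregations": {
            histogram_name: {
                "date_histogram": {
                    "field": timestamp_field,
                    "calendar_interval": "1d",
                    "format": "iso8601",
                    "min_doc_count": 1,  # assumption: ignore days without documents
                },
                # Sub-aggregation over the daily buckets
                "aggregations": {
                    metric_name: {"cardinality": {"field": product_field}}
                },
            },
            # Sibling pipeline aggregation: its buckets_path starts at the
            # date_histogram, which is a multi-bucket aggregation.
            "mean": {
                "avg_bucket": {
                    "buckets_path": f"{histogram_name}>{metric_name}",
                    "gap_policy": "skip",
                    "format": "#,##0.00;(#,##0.00)",
                }
            },
        },
    }


body = build_daily_mean_query()
print(json.dumps(body, indent=2))
```

The trade-off is that a date_histogram cannot be paged the way a composite aggregation can: all daily buckets come back in a single response, so a very long time span may run into the cluster's search.max_buckets limit.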

system · February 14, 2022, 10:52am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
