Stats_bucket the correct query here?

Question:

  • How do I get the average sales the clerk's are doing in each region and store?
  • How do I not include clerkSums in the responses? I only care about salesStats and costStats?

Imagine I have a single index with 2 number columns, sales and cost. And 3 string columns, region, store, clerk.

How would I get the average and max for both number values, sales and cost, for each clerk? How do I get those values by region? Or region and store?

For clerk stats:

It seems like I need to create a terms bucket for the clerks with their sales and cost sums. Then do a stats_bucket query over that bucket for each of the sales and cost numbers.

  "aggregations": {
    "clerkStats": {
      "terms": { "field": "clerk" },
      "aggs": {
        "sales": { "sum": { "field": "sales" } },
        "cost": { "sum": { "field": "cost" } }
      }
    },
    "salesStats": {
      "stats_bucket": { "buckets_path": "clerkStats>sales" }
    },
    "costStats": {
      "stats_bucket": { "buckets_path": "clerkStats>cost" }
    }
  }

This gives the following useful properties

  • salesStats.avg would have the average sales for all clerks
  • salesStats.max would have the max sales for all clerks
  • costStats.avg would have the average costs for all clerks
  • costStats.max would have the max costs for all clerks

To wrap that by region it would be

  "aggregations": {
    "region": {
      "terms": { "field": "region" },
      "aggs": {
        "aggs": {
          "clerkSums": {
            "terms": { "field": "clerk" },
            "aggs": {
              "sales": { "sum": { "field": "sales" } },
              "cost": { "sum": { "field": "cost" } }
            }
          },
          "salesStats": {
            "stats_bucket": { "buckets_path": "clerkSums>sales" }
          },
          "costStats": {
            "stats_bucket": { "buckets_path": "clerkSums>cost" }
          }
        }
      }
    }
  }

This gives the following useful properties in region buckets

  • salesStats.avg being the average sales for all clerks in that region
  • salesStats.max being the max sales for all clerks in that region
  • costStats.avg being the average cost for all clerks in that region
  • costStats.max being the max cost for all clerks in that region

For region and store:

Same thing as the region alone query, but in a composite query with both region and store?

  "aggregations": {
    "combo": {
      "composite": {
        "sources": [
          { "region": { "terms": { "field": "region" } } },
          { "store": { "terms": { "field": "store" } } }
        ]
      },
      "aggs": {
        "aggs": {
          "clerkSums": {
            "terms": { "field": "clerk" },
            "aggs": {
              "sales": { "sum": { "field": "sales" } },
              "cost": { "sum": { "field": "cost" } }
            }
          },
          "salesStats": {
            "stats_bucket": { "buckets_path": "clerkSums>sales" }
          },
          "costStats": {
            "stats_bucket": { "buckets_path": "clerkSums>cost" }
          }
        }
      }
    }
  }

This does not work. You cannot include the clerkSums inside the composite aggregation.

The only workaround I know of is to do 2 terms aggregations to nest region and store.

Question:

  • How do I get the average sales the clerk's are doing in each region and store?
  • How do I not include clerkSums in the responses? I only care about salesStats and costStats?

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.