Range on Sum Aggregation


(Vijay Arora) #1

I am trying to write a query which would give me the count of Ids where sum total of transactions fall into a certain range.

In terms of SQL query something like:

Select  ProductId   from SaleTransactions where SalesQuarter='Q1' group by ProductId having Sum(SalesAmount) > 100 and Sum(SalesAmount) <1000 having SalesQuarter =Q1

The index mapping is like:

"mappings" : {
	"product" : {
	"properties": {
		"ProductId: : { "type" : "keyword"},
		"SaleTransactions" : {
			"type": "nested",
			"include_in_parent" : true,
			"id": {"type": "keyword"},
			"Product_Id": {"type": "keyword"},
			"sale_amount": {"type": "double"},
			"SalesQuarter": {"type": "keyword" } 
		}
	}
	}
}

I tried applying the Sum Aggregation and then using a bucket_selector to get the range, and that gives me multiple buckets, one for each id. I am not sure if this is the best way, especially when we have a large transnational volume, is there a better way to do it by either applying a value_count on it or mixing the range aggregation on sum aggregation.
Here is my sample aggs query:

"aggs":{
  "aggs1":{
    "terms":{
      "field":"SaleTransactions.Product_Id"
    },
    "aggs":{
      "filtered-entities":{
        "filter":{
          "terms":{
            "SaleTransactions.SalesQuarter":[
              "Q1"
            ]
          }
        },
        "aggs":{
          "sum_Total":{
            "sum":{
              "field":"SaleTransactions.sale_amount"
            }
          }
        }
      },
      "sumTotal-filter":{
        "bucket_selector":{
          "buckets_path":{
            "sumTotal":"filtered-entities>sum_Total"
          },
          "script":"params.sumTotal > 500 && params.sumTotal <= 5000"
        }
      }
    }
  }
}

Thanks


(Zachary Tong) #2

Your agg is good... this is the way to do it. You have to first aggregate together all the docs and sum up individual sale_amount's for each ID. Only after all the values have been summed can you filter the range, via the bucket_selector. Which leaves you the IDs that have sale amounts inside the range, which you can count up (either client-side or by another pipeline).

The concern over large number of transactions is valid though. The terms aggregation is optimized for "top-n" scenarios. If you need to return a very large number of terms, or all of them, you should instead checkout using the newish composite aggregation: https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-bucket-composite-aggregation.html

The composite agg is designed to paginate/scroll over the buckets of an aggregation, allowing you to get all the results in a manner that won't destroy the cluster. :slight_smile:


(Vijay Arora) #3

Thanks for pointing out to composite aggregation.


(system) #4

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.