Accessing the value of a Cardinality Subaggregation in a Painless script to filter by the value

AlexDiru · January 25, 2019, 1:05pm

Problem: I want to use a Script Query in a Filter Aggregation to create a condition based on the values of subaggregations. I have no idea how to access the values of the subaggregations in a Painless script.

A few curl requests to create the data. The data represents items sold in a shop with the name of the shop and the name of the item stored, as well as a boolean flag as to whether the item is original to that shop.

PUT shop-item
PUT shop-item/_mapping/_doc
{
  "properties": {
    "isOriginalItem": {
      "type": "boolean"
    },
    "shopName": {
      "type": "keyword"
    },
    "itemName": {
      "type": "keyword"
    }
  }
}
POST _bulk
{"index":{"_index":"shop-item","_type":"_doc"}}
{"isOriginalItem":true,"shopName":"Sainsburys","itemName":"Meat"}
{"index":{"_index":"shop-item","_type":"_doc"}}
{"isOriginalItem":true,"shopName":"Lidl","itemName":"Meat"}
{"index":{"_index":"shop-item","_type":"_doc"}}
{"isOriginalItem" : true,"shopName" : "Aldi","itemName" : "Meat"}
{"index":{"_index":"shop-item","_type":"_doc"}}
{"isOriginalItem" : false,"shopName" : "Tesco","itemName" : "Meat"}
{"index":{"_index":"shop-item","_type":"_doc"}}
{"isOriginalItem" : false,"shopName" : "Sainsburys","itemName" : "Cabbage"}
{"index":{"_index":"shop-item","_type":"_doc"}}
{"isOriginalItem" : false,"shopName" : "Lidl","itemName" : "Cabbage"}
{"index":{"_index":"shop-item","_type":"_doc"}}
{"isOriginalItem" : true,"shopName" : "Aldi","itemName" : "Cabbage"}
{"index":{"_index":"shop-item","_type":"_doc"}}
{"isOriginalItem" : false,"shopName" : "Tesco","itemName" : "Cabbage"}
{"index":{"_index":"shop-item","_type":"_doc"}}
{"isOriginalItem" : true,"shopName" : "Morrisons","itemName" : "Cat Food" }
{"index":{"_index":"shop-item","_type":"_doc"}}
{"isOriginalItem" : false,"shopName" : "Lidl","itemName" : "Cat Food"}
{"index":{"_index":"shop-item","_type":"_doc"}}
{"isOriginalItem" : true,"shopName" : "Co-op","itemName" : "Cat Food"}
{"index":{"_index":"shop-item","_type":"_doc"}}
{"isOriginalItem" : true,"shopName" : "Tesco","itemName" : "Cat Food"}

The following aggregation shows the number of unique shops and the total number of shops that an original item is sold in:

GET shop-item/_search
{
  "size": 0,
  "aggregations": {
    "original_items": {
      "filter": {
        "term": {
          "isOriginalItem": {
            "value": true
          }
        }
      },
      "aggs": {
        "group_by_item_name": {
          "terms": {
            "field": "itemName"
          },
          "aggs": {
            "number_of_unique_shops_sold_in": {
              "cardinality": {
                "field": "shopName"
              }
            },
            "total_number_of_shops_sold_in": {
              "value_count" : {
                "field": "shopName"
              }
            }
          }
        }
      }
    }
  }
}

Where I'm stuck is I want to only create buckets in group_by_item_name if the following condition is met:

number_of_unique_shops_sold_in.value > (0.95 * total_number_of_shops_sold_in)

In which case it will show the bucket aggregations for Cat Food and Cabbage

I've tried using Painless for this adding a filter with a Script Query but I have no idea how to access the values of subaggregations, Google has not yielded any results nor has the various combinations of things I've tried

If people are interested I can post some of the attempts I've made

I want to add a filter which only creates buckets in group_by_item_name aggregation if the value of the number_of_unique_shops_sold_in is met:

number_of_unique_shops_sold_in.value > (0.95 * total_number_of_shops_sold_in)

In which case it will show the bucket aggregations for Cat Food and Cabbage

I've tried using Painless for this adding a filter with a Script Query but I have no idea how to access the values of subaggregations, Google has not yielded any results nor has the various combinations of things I've tried

If people are interested I can post some of the (syntax error) attempts I've made

Mark_Harwood · January 25, 2019, 1:14pm

Maybe the example is over-simplified but you only look to have one record per unique item+shop name. In which case the doc count of an item == cardinality of shop names? In which case you can more simply trim buckets by min_doc_count?

AlexDiru · January 25, 2019, 2:05pm

Yep you are correct, the example is oversimplified sorry about that, even so I would still need to access the value of the number_of_unique_shops_sold_in an aggregation in the Painless script to compare it against the condition

Here is my attempt at the filtering, it won't compile because of the script but hopefully it's more clear what I'm trying to do

GET shop-item/_search
{
  "size": 0,
  "aggregations": {
    "original_items": {
      "filter": {
        "term": {
          "isOriginalItem": {
            "value": true
          }
        }
      },
      "aggs": {
        "group_by_item_name": {
          "terms": {
            "field": "itemName"
          },
          "aggs": {
            "filter_using_shop_threshold": {
              "filter": {
                "script": {
                  "script": {
                    "source": "return number_of_unique_shops_sold_in._value > 0.95 * total_number_of_shops_sold_in._value;"
                  }
                }
              },
              "aggs": {
                "number_of_unique_shops_sold_in": {
                  "cardinality": {
                    "field": "shopName"
                  }
                },
                "total_number_of_shops_sold_in": {
                  "value_count" : {
                    "field": "shopName"
                  }
                }
              }
            }
          }
        }
      }
    }
  }
}

Mark_Harwood · January 25, 2019, 3:01pm

This example script has a misconception about the time and the place in which the script runs.

Scripts inside the aggregation tree are given a single document to evaluate on a single shard. The scripts can inspect the fields of that doc but that's the scope of the context it works in. It works in the data-gathering phase on data nodes.

In contrast, the "cardinality" value is a final result value only available after all matching docs have been examined on all shards and returned to a central coordinating node.

To perform any reasoning about final aggregation results on the coordinating node you need to use a pipeline aggregation such as thebucket selector

AlexDiru · January 25, 2019, 3:09pm

Thank you so much

Here's the solution to my question:

GET shop-item/_search
{
  "size": 0,
  "aggregations": {
	"original_items": {
	  "filter": {
		"term": {
		  "isOriginalItem": {
			"value": true
		  }
		}
	  },
	  "aggs": {
		"group_by_item_name": {
		  "terms": {
			"field": "itemName"
		  },
		  "aggs": {
			"number_of_unique_shops_sold_in": {
			  "cardinality": {
				"field": "shopName"
			  }
			},
			"total_number_of_shops_sold_in": {
			  "value_count" : {
				"field": "shopName"
			  }
			},
			"bucket_filter": {
			  "bucket_selector": {
				"buckets_path": {
				  "uniqueShops": "number_of_unique_shops_sold_in",
				  "totalShops": "total_number_of_shops_sold_in"
				},
				"script": "params.uniqueShops > 0.95 * params.totalShops"
			  }
			}
		  }
		}
	  }
	}
  }
}

system · February 22, 2019, 3:09pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.

Topic		Replies	Views
Filter array values using painless script Elasticsearch painless	1	759	October 20, 2020
Regex and Aggregations using Painless Elasticsearch	6	2695	January 9, 2018
Aggs with painless script on an object containing an array Elasticsearch	4	1841	December 26, 2016
Painless scripted value versus field cardinality discrepancy [ES 7.0] Elasticsearch	1	547	June 10, 2019
Accessing query from within painless script? Elasticsearch	3	574	May 11, 2017

Accessing the value of a Cardinality Subaggregation in a Painless script to filter by the value

Related topics