Transforms - using painless for terms aggs on nested objects

I understand that Transforms does not support nested aggregations, but what is the alternative ?

here is the mappings:

"properties" : {
  "clientId" : {
"type" : "integer"
  },
  "clientIdAsString" : {
"type" : "keyword"
  },
  "client_ip" : {
"type" : "ip"
  },
  "country" : {
"type" : "keyword"
  },
  "currency" : {
"type" : "keyword"
  },
  "dataLayer" : {
"type" : "nested",
"dynamic" : "true",
"properties" : {
  "name" : {
    "type" : "keyword"
  },
  "products" : {
    "type" : "nested",
    "dynamic" : "true",
    "properties" : {
      "id" : {
        "type" : "keyword"
      },
      "name" : {
        "type" : "keyword"
      },
      "price" : {
        "type" : "float"
      },
      "quantity" : {
        "type" : "long"
      },
      "total_price" : {
        "type" : "float"
      }
    }
  },
  "quantity" : {
    "type" : "long"
  },
  "total_price" : {
    "type" : "float"
  }
}
  },
  "date_time" : {
"type" : "date"
  },
  "wz_session" : {
"type" : "keyword"
  }
}

and here is the query:

POST _transform/_preview
{
  "source": {
    "index": [
      "user_events*"
    ],
    "query": {
      "bool": {
        "filter": [
          {
            "term": {
              "clientIdAsString": "645"
            }
          },
          {
            "term": {
              "domainsGroup": "google.com"
            }
          },
          {
            "exists": {
              "field": "dataLayer.products"
            }
          }
        ]
      }
    }
  },
  "pivot": {
    "group_by": {
      "wz_session": {
        "terms": {
          "field": "wz_session"
        }
      },
      "clientIdAsString": {
        "terms": {
          "field": "clientIdAsString"
        }
      },
      "domainsGroup": {
        "terms": {
          "field": "domainsGroup"
        }
      },
      "productId": {
        "terms": {
          "field": "dataLayer.products.id"
        }
      },
      "event_key": {
        "terms": {
          "field": "event_key"
        }
      }
    },
    "aggregations": {
      "minDate": {
        "min": {
          "field": "date_time"
        }
      },
      "maxDate": {
        "max": {
          "field": "date_time"
        }
      },
      "pageviews": {
        "sum": {
          "field": "is_checkout_view"
        }
      },
      "totalPrice": {
        "sum": {
          "field": "dataLayer.products.total_price"
        }
      },
      "quantity": {
        "sum": {
          "field": "dataLayer.products.quantity"
        }
      }
    }
  }
}

what I'm trying to do is basically terms aggs on dataLayer.products.id and for each product id, I need to calculate the sum of the quantity, total price.

an example of a source data:

...
"dataLayer" : {
            "total_price" : 34.95,
            "currency" : "EUR",
            "id" : "6055",
            "products" : [
              {
                "quantity" : 1,
                "total_price" : 34.95,
                "price" : 6.24,
                "sku" : "10101117",
                "name" : "Product Name 1"
              },
              {
                "quantity" : 1,
                "total_price" : 34.95,
                "price" : 6.24,
                "sku" : "10101261",
                "name" : "Product Name 2"
              },
              {
                "quantity" : 2,
                "total_price" : 34.95,
                "price" : 6.24,
                "sku" : "10101155",
                "name" : "Product Name 3"
              },
              {
                "quantity" : 1,
                "total_price" : 34.95,
                "price" : 4.16,
                "sku" : "10115924",
                "name" : "Product Name 4 "
              }
            ]
          }
...

an example of a result:

{
      "clientIdAsString" : "123",
      "minDate" : "2021-04-30T18:00:25.000Z",
      "pageviews" : 1.0,
      "quantity" : 16.0,
      "productId" : "1031",
      "domainsGroup" : "google.com",
      "totalPrice" : 440.70000171661377,
      "wz_session" : "f8561ry8092Z1n67h74r1tCN0",
      "maxDate" : "2021-04-30T18:00:25.000Z",
      "event_key" : "checkout_view_1"
    },

this is the a part of of the above transform, the problem is, because there is no use of nested aggs, the 'quantity', and 'totalPrice' are actually the sum of all the products in the dataLayer.products array (which is of type nested).

how could I achieve that in transforms ?

This is a limitation of the underlying composite aggregation. It's possible to access the nested buckets in the doc, however without knowing which group by bucket you are in, I don't see a way to workaround this with painless, I am sorry.

The only other way I see is not using the nested data type in your data source.

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.