Intersection aggregation

Hi,

I've been struggling to write a query to do the following, any help would be greatly appreciated.

My index contains one document per transaction. I'm looking to find:

  • Number of customers that have purchased product A AND product B
  • Number of customers that have purchased A OR purchased product B

The latter is easy, using a terms agg (below), but I cant figure out how to correctly identify an intersection. Is this possible?

GET trans/_search
{
  "size": 0,
  "query": {
    "bool": {
      "should": [
        {
          "match": {
            "product.ds_product_key": "88300"
          }
        },
        {
          "match": {
            "product.ds_product_key": "421345"
          }
        }
      ]
    }
  },
  "aggs": {
    "by_product": {
      "terms": {
        "field": "product.ds_product_key"
      },
      "aggs": {
        "total_customers": {
          "cardinality": {
            "field": "customer.identifiers.customer_key"
          }
        }
      }
    },
    "total_customers": {
          "cardinality": {
            "field": "customer.identifiers.customer_key"
          }
        }
  }
}

Thanks,
Martin

There is an agg specifically designed for examining intersections - the adjacency_matrix.

However, the usual rules apply about attempting user-behaviour analytics on an index storing only event logs. See the guidance on creating an entity-centric-index with each doc representing a customer. Then use the adjacency_matrix on a multi-valued field that lists each customer's purchases.

Thanks Mark, I had been looking at this type of aggregation. I'll dig a little deeper into this after I've created my new index.

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.