Elasticsearch similar elements count

James_wood · February 17, 2021, 8:46pm

I have the following product order list and need to find out what products are ordered together:

  [
      {
        "order_id": 1,
        "products": ["Coke", "Fanta"],
        "created_on": "2016-12-26T03:41:46+00:00"
    },
      {
        "order_id": 2,
        "products": ["Coke", "Pepsi", "Sprite"],
        "created_on": "2016-12-26T03:42:46+00:00"
      },
      {
        "order_id": 3,
        "products": ["Coke", "Sprite"],
        "created_on": "2016-12-26T03:45:46+00:00"
      }
    ]

As per the above example, the correct aggregation output should be "Coke" and "Sprite" since they were ordered 3 times.

Expected output:

{
  "products_duo_of_the_day": 1,
  "buckets": [
    {
      "products": [
        "Coke",
        "Sprite"
      ],
      "doc_count": 2
    }
  ]
}

Can someone please help me with building the Elastisearch query for getting this example output?

Mark_Harwood · February 17, 2021, 11:07pm

Check out the significant terms aggregation. This video describes using it for a similar scenario. Substitute band names for your products and you’re there.

Mark_Harwood · February 18, 2021, 4:41pm

A few things jump out

The result doesn't look like what I'd expect - where's the score value?
Significant_terms not the significant_text agg should be used on keyword fields.
I previously assumed the question was "what goes with coke?" but it looks like the question you're trying to ask is potentially something to do with "what's special about pairings on given days?"

Can you say more about the business problem you're trying to solve?

Mark_Harwood · February 18, 2021, 5:31pm

Is this about ranking straight popularity or particular significance on each day?
The top answer to the former is likely to be "bread and milk" for every single day.
The top answer to the latter could be unusual changes e.g pancake ingredients last Tuesday.

One way of optimising the index for this analysis is to also index SKU pairs. So given ["milk", "bread"] you'd also index them as an alphabetically sorted pair i.e. the pair token "bread_milk". Problem is that is an n-squared indexing strategy so doesn't work well when there are many unique products and many items purchased per basket.

Mark_Harwood · February 18, 2021, 6:26pm

Ok - that’s just a ‘terms’ aggregation on a SKU-pair field like I described in my last post. The problem is when you have baskets with more than bread and milk in them. Your application code would have to generate all the pairs for a basket with bread, cheese, milk, eggs, coke etc. That’s a lot of SKU pairs and probably too expensive to generate if baskets are big.
Another approach is to have a two terms aggs on the “SKU” field - one nested under the other. This will give you the top 10 products on a day, and for each of those, the top 10 companion products also bought with those. It’s a slightly different analysis though eg in a worst case scenario a number one product might be selected that is only ever bought on its own (a ticket to park the car on arrival?) and has no pairings. So it’s really the most popular products and what they’re paired with rather than the most popular product pairs. Probably good enough though.

Mark_Harwood · February 18, 2021, 7:12pm

Won’t the 2 terms aggs on the SKU field work for you?

So you’d have:

Date agg
    Terms agg on SKU 
         Terms agg on SKU

Mark_Harwood · February 18, 2021, 7:29pm

With complex script that is very likely to blow up memory. What is about the 2 terms aggs approach that doesn’t work for you?

system · March 18, 2021, 7:30pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.

Topic		Replies	Views
Making order count Elasticsearch	6	305	July 6, 2017
Working with arrays Elasticsearch	2	442	August 20, 2017
Recommendations returning the matched item Elasticsearch	5	404	May 22, 2018
"Often buyed together" using aggregations? Elasticsearch	5	2029	December 30, 2017
How execute AND query in elasticsearch and aggregate Elasticsearch	2	469	August 30, 2019

Elasticsearch similar elements count

Related topics