Check out the significant terms aggregation. This video describes using it for a similar scenario. Swap the band names for your products and you're there.
The result doesn't look like what I'd expect - where's the score value?
On keyword fields you should use the significant_terms agg, not the significant_text agg.
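For reference, here is a minimal sketch of a significant_terms request body, built as a plain Python dict so it can be passed to whatever client you use. The field name `sku` and the anchor product `coke` are assumptions for illustration, not names from your mapping. It assumes one document per basket with a multi-valued keyword field holding the products.

```python
import json


def significant_companions_body(product, sku_field="sku", size=10):
    """Build a search body asking which SKUs are unusually common in
    baskets that contain `product`.

    `sku_field` is a hypothetical keyword field name; adjust to your mapping.
    """
    return {
        "size": 0,  # only the aggregation is wanted, not the hits
        "query": {"term": {sku_field: product}},
        "aggs": {
            "goes_with": {
                "significant_terms": {"field": sku_field, "size": size}
            }
        },
    }


body = significant_companions_body("coke")
print(json.dumps(body, indent=2))
```

Each bucket in a significant_terms response normally carries a `score` alongside `doc_count` and `bg_count`, so if the score is missing it is worth checking which aggregation type actually ran.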
I previously assumed the question was "what goes with coke?" but it looks like the question you're trying to ask is potentially something to do with "what's special about pairings on given days?"
Can you say more about the business problem you're trying to solve?
Is this about ranking straight popularity or particular significance on each day?
The top answer to the former is likely to be "bread and milk" for every single day.
The top answer to the latter could be unusual changes, e.g. pancake ingredients last Tuesday.
One way of optimising the index for this analysis is to also index SKU pairs. So given ["milk", "bread"] you'd also index them as an alphabetically sorted pair, i.e. the pair token "bread_milk". The problem is that this is an n-squared indexing strategy (a basket of n distinct products yields n(n-1)/2 pairs), so it doesn't work well when there are many unique products and many items purchased per basket.
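A rough sketch of the pair generation in application code, to show how quickly it grows. The function name `sku_pairs` and the underscore delimiter are assumptions; pick a delimiter that cannot occur inside your real SKUs.

```python
from itertools import combinations


def sku_pairs(basket, delimiter="_"):
    """Return alphabetically sorted pair tokens for one basket.

    Duplicates in the basket are collapsed first, so buying two
    cartons of milk does not produce a "milk_milk" token.
    """
    unique = sorted(set(basket))
    # combinations() over a sorted list yields each pair once, with a < b
    return [f"{a}{delimiter}{b}" for a, b in combinations(unique, 2)]


print(sku_pairs(["milk", "bread"]))  # ['bread_milk']

# Pair count grows as n * (n - 1) / 2
for n in (5, 20, 50):
    basket = [f"sku{i:03d}" for i in range(n)]
    print(n, "items ->", len(sku_pairs(basket)), "pairs")
```

A 5-item basket gives 10 pair tokens and a 50-item basket gives 1225, which is where the indexing cost comes from.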
Ok - that's just a "terms" aggregation on a SKU-pair field like I described in my last post. The problem is when you have baskets with more than bread and milk in them. Your application code would have to generate all the pairs for a basket with bread, cheese, milk, eggs, coke etc. That's a lot of SKU pairs and probably too expensive to generate if baskets are big.
Another approach is to have two terms aggs on the "SKU" field - one nested under the other. This will give you the top 10 products on a day and, for each of those, the top 10 companion products bought with them. It's a slightly different analysis though, e.g. in a worst-case scenario a number one product might be selected that is only ever bought on its own (a ticket to park the car on arrival?) and has no pairings. So it's really the most popular products and what they're paired with rather than the most popular product pairs. Probably good enough though.
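Roughly, the nested version could look like the sketch below, again as a plain Python dict. The field names `sku` and `purchase_date`, the agg names, and the helper `drop_self` are all assumptions for illustration, and it assumes one document per basket. One thing to watch: every basket containing a product also contains that product, so the parent SKU will usually show up as its own top companion. The sketch asks for one extra bucket and filters the parent out client-side.

```python
def top_products_with_companions(day, sku_field="sku",
                                 date_field="purchase_date", top_n=10):
    """Build a search body: top products on `day`, and for each one
    the products most often found in the same baskets.

    Field names are hypothetical; `day` is a date string like "2024-01-02".
    """
    return {
        "size": 0,
        # date math: from the start of `day` up to (not including) the next day
        "query": {"range": {date_field: {"gte": day, "lt": f"{day}||+1d"}}},
        "aggs": {
            "top_products": {
                "terms": {"field": sku_field, "size": top_n},
                "aggs": {
                    # one extra bucket, since the parent SKU matches itself
                    "companions": {
                        "terms": {"field": sku_field, "size": top_n + 1}
                    }
                },
            }
        },
    }


def drop_self(product_bucket):
    """Remove the parent product from its own companion buckets."""
    key = product_bucket["key"]
    return [b for b in product_bucket["companions"]["buckets"]
            if b["key"] != key]
```

A product that is only ever bought alone would come back from `drop_self` with an empty list, which is the car-park-ticket case described above.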