Let's say I am trying to get the top 10 items with the top sales across a time period. I want to just show it on a UI with the item id, item name, item store id, item section id, item brand and item sales amount in period. All items for a given item id have the same name, item store id, item section id, and item brand. What is the right way to do this? How do I get performance while doing this?
At first, I thought I could just use the hits array to create a lookup map, but the ES window limit is 10,000. That worked, but a problem occurred when a store had more than 10,000 hits with a given period (see query below).
{
"size" : 10000,
"timeout" : 1500,
"query" : {
"bool" : {
"must" : [ {
"term" : {
"itemStoreId_string" : "12342323232"
}
}, {
"range" : {
"orderDate_timestamp" : {
"from" : "2019-01-01T00:00:00.000Z",
"to" : "2019-06-01T00:00:00.000Z",
"include_lower" : true,
"include_upper" : true
}
}
} ]
}
},
"aggregations" : {
"byItemTopSales" : {
"terms" : {
"field" : "itemId_string",
"size" : 10,
"order" : {
"aggTotalSales" : "desc"
}
},
"aggregations" : {
"aggTotalSales" : {
"sum" : {
"field" : "totalSales_double_not_indexed"
}
}
}
}
}
}
}
Then, I tried aggregating more fields so I wouldn't have to rely on a lookup map and wouldn't run into the 10k limit on window size. I noticed that this query (below) is significantly slower. Is this expected? How can I make this faster?
{
"size" : 0,
"timeout" : 1500,
"query" : {
"bool" : {
"must" : [ {
"term" : {
"itemStoreId_string" : "12342323232"
}
}, {
"range" : {
"orderDate_timestamp" : {
"from" : "2019-01-01T00:00:00.000Z",
"to" : "2019-06-01T00:00:00.000Z",
"include_lower" : true,
"include_upper" : true
}
}
} ]
}
},
"aggregations" : {
"byItemTopSales" : {
"terms" : {
"field" : "itemId_string",
"size" : 10,
"order" : {
"aggTotalSales" : "desc"
}
},
"aggregations" : {
"aggTotalSales" : {
"sum" : {
"field" : "totalSales_double_not_indexed"
}
},
"itemName" : {
"terms" : {
"field" : "itemName_string_not_indexed"
}
},
"itemId" : {
"terms" : {
"field" : "menuItemId_string_not_indexed"
}
},
"itemBrandName" : {
"terms" : {
"field" : "itemBrandName_string_not_indexed"
}
}
}
}
}
}
Again, all items for a given item id have the same name, item store id, item section, and item brand.
How do I change my query to have it just take the first matching name, item store id, .etc.? Any tips would be much appreciated. Thanks!