Sorting and filtering top hits docs

Hi, is it possible to filter and sort the docs returned from a top hits aggregation?

Let's say I have a simple product index with docs like this:

{
   "product_name": "some_product",
   "category": "some_category",
   "price": "200",
   "sold_times": "5",
   "store": "store1"
}

I want to get the most expensive product in each category per store, keep only those that have been sold less than 3 times, and have the results ordered by store, category and price.

I can use two terms aggregations and top hits aggregation (with size=1) to get the most expensive products in their category per store, but how can I sort and filter these top hits results? I really need to filter the results after the top hits agg is performed, so the filter query is not the solution. How can I do this? Thx

EDIT:
The query to get the most expensive products in their category per store is as follows:

POST /products/_search?size=0
{
    "aggs": {
        "by_store": {
            "terms": {
                "field": "store"
            },
            "aggs": {
                "by_category": {
                    "terms": {
                        "field": "category"
                    },
                    "aggs": {
                        "most_expensive": {
                            "top_hits": {
                                "sort": [
                                    {
                                        "price": {
                                            "order": "desc"
                                        }
                                    }
                                ],
                                "size" : 1
                            }
                        }
                    }
                }
            }
        }
    }
}

This will return docs (as part of the aggregation results), and I want to filter them by sold_times < 3 and sort them by store, category and price. So let's say that the most expensive product in category "accessories" is "charger", but this product was sold 5 times. This most expensive product must then be filtered out of the results.

Can you share your query so it is possible to understand better?

What about using post_filter?

Hi, thx for your reply. I added the query to the question. If I understand post_filter correctly, it is applied to the original data set (the one used for computing the aggregations) after the aggs have been computed. But that is not what I need. I need to perform the aggs and then filter the docs returned by the top_hits agg (not the original data set).

I understand your issue now that you have posted the query.
I think you can solve the issue by filtering on the range you need: run a range query to select only the products that have sold_times < 3, and then run the aggregation.

Something like the following query:

{
    "query": {
        "range" : {
            "sold_times" : {
                "gte" : 0,
                "lt" : 3
            }
        }
    },
    "aggs": {
        "by_store": {
            "terms": {
                "field": "store"
            },
            "aggs": {
                "by_category": {
                    "terms": {
                        "field": "category"
                    },
                    "aggs": {
                        "most_expensive": {
                            "top_hits": {
                                "sort": [
                                    {
                                        "price": {
                                            "order": "desc"
                                        }
                                    }
                                ],
                                "size" : 1
                            }
                        }
                    }
                }
            }
        }
    }
}

It will first remove the products that fall outside the sold_times range, then run the aggregations.

Thanks for your reply. Unfortunately this is not what I need. I really need to perform the filter after the aggs.
Let's say I have two products in category accessories:

{
   "product_name": "charger",
   "category": "accessories",
   "price": "200",
   "sold_times": "6",
   "store": "store1"
}
{
   "product_name": "headphones",
   "category": "accessories",
   "price": "150",
   "sold_times": "2",
   "store": "store1"
}

Your approach would remove the charger and keep the headphones. The aggregation would then return the headphones as the most expensive product in category accessories, which is not true. In this case I need to get no result for category accessories, because its most expensive product was sold 6 times, so it is not a product I'm looking for. I'm not looking for cheaper products that have been sold a few times. I'm really looking for the most expensive ones that have been sold < 3 times.
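To make the difference concrete, here is a small sketch in plain Python (no Elasticsearch involved) that mimics both orders of operations on the two sample docs. The function names and the max_sold parameter are made up for illustration, and the numeric fields are written as numbers rather than strings to keep it short.

```python
# Sample docs based on the two products above (numbers instead of strings).
products = [
    {"product_name": "charger", "category": "accessories",
     "price": 200, "sold_times": 6, "store": "store1"},
    {"product_name": "headphones", "category": "accessories",
     "price": 150, "sold_times": 2, "store": "store1"},
]


def most_expensive(docs):
    """Return the highest-priced doc, or None if the list is empty."""
    return max(docs, key=lambda d: d["price"], default=None)


def filter_then_aggregate(docs, max_sold=3):
    """Range filter first, then pick the top hit (the suggested query)."""
    return most_expensive([d for d in docs if d["sold_times"] < max_sold])


def aggregate_then_filter(docs, max_sold=3):
    """Pick the top hit first, then filter it (the behaviour asked for)."""
    top = most_expensive(docs)
    if top is not None and top["sold_times"] < max_sold:
        return top
    return None


pre = filter_then_aggregate(products)
post = aggregate_then_filter(products)
print(pre["product_name"])  # headphones
print(post)                 # None
```

Filtering first promotes the headphones to "most expensive", while filtering afterwards leaves the category empty, which is the result I need.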

I believe it's a complex problem to solve. I am actually not sure whether it can be solved on the Elasticsearch side.

The best I think you can do is order the categories by the max sold_times and then, in your application logic, remove the ones whose sold_times is too high. You can use a max aggregation together with the top_hits aggregation.
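The application-side step could look roughly like the sketch below. It assumes the response has the usual shape for the by_store / by_category / most_expensive aggregations from the question; filter_top_hits, max_sold and the hand-written sample_response (including the "usb_cable" and "case" products) are hypothetical and only there for illustration.

```python
def filter_top_hits(response, max_sold=3):
    """Collect each bucket's top hit, keep those sold fewer than max_sold
    times, and sort by store, category, then price (highest first)."""
    kept = []
    for store_bucket in response["aggregations"]["by_store"]["buckets"]:
        for category_bucket in store_bucket["by_category"]["buckets"]:
            for hit in category_bucket["most_expensive"]["hits"]["hits"]:
                doc = hit["_source"]
                # The sample docs store numbers as strings, so cast them.
                if int(doc["sold_times"]) < max_sold:
                    kept.append(doc)
    return sorted(
        kept,
        key=lambda d: (d["store"], d["category"], -float(d["price"])),
    )


def _hit(name, category, price, sold, store):
    """Build one trimmed-down top_hits entry for the sample response."""
    return {"_source": {"product_name": name, "category": category,
                        "price": price, "sold_times": sold, "store": store}}


# Hand-written, trimmed-down response; the data is made up.
sample_response = {"aggregations": {"by_store": {"buckets": [
    {"key": "store2", "by_category": {"buckets": [
        {"key": "accessories", "most_expensive": {"hits": {"hits": [
            _hit("case", "accessories", "30", "0", "store2")]}}},
    ]}},
    {"key": "store1", "by_category": {"buckets": [
        {"key": "accessories", "most_expensive": {"hits": {"hits": [
            _hit("charger", "accessories", "200", "6", "store1")]}}},
        {"key": "cables", "most_expensive": {"hits": {"hits": [
            _hit("usb_cable", "cables", "20", "1", "store1")]}}},
    ]}},
]}}}

result = filter_top_hits(sample_response)
print([d["product_name"] for d in result])  # ['usb_cable', 'case']
```

The charger is dropped because it was sold 6 times, and the remaining top hits come back ordered by store and category. Keep in mind that a terms aggregation only returns a limited number of buckets by default, so the size on by_store and by_category may need raising if every store and category has to be covered.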
