Optimizing agg on collapse

I have an index with "availability" of rooms in a set of hotels. Each record has among other things:

  • HotelCode
  • RoomType
  • arrival Date
  • Departure Date
  • Price for stay
  • Bunch of room features to filter on

The nr of records is currently around 1.5 million, and will only grow.

When people search they can search either from "hotel" or "room type".

My query itself is fine, I'm doing something like and it's fast, gives the right data I want:

  "collapse": {
    "field": "RoomType",
    "inner_hits": {
      "name": "hits",
      "size": 1,
      "sort": [{ "price": { "order": "asc" } }]
    }
  },
  "from": 0,
  "query": {
    "bool": {
      "filter": [
        {
          "range": {
            "arrivalDate": {
              "gte": "2021-11-19T00:00:00",
              "lt": "2022-05-19T00:00:00"
            }
          }
        },
        {
          "range": {
            "departureDate": {
              "gte": "2021-11-26T00:00:00",
              "lt": "2022-05-26T00:00:00"
            }
          }
        },
        { "range": { "duration": { "gte": 1, "lte": 7 } } },
      ]
    }
  },
  "size": 20,

My problem now is to run aggregations on top of the result of a collapse. I know collapse itself runs after aggregations, but I need the same kind of result.

Something like the top 1 cheapest price for unique RoomType with a specific filter (date, extra filters), and then create terms buckets from that.

I tried a few things, like doing a terms query on RoomType, within that a size 1 terms on price and doing aggregates on top of that, however that generates a lot of buckets and isn't all that fast ( > 700msec in my case). what I'm really looking for is a way to first have the filtered list of top 1 cheapest rooms grouped by RoomTypes, which will result in max 1500 records, then doing aggregations on that. Is that somehow possible?

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.