Can I improve this aggregation query?

Hello,

I'm quite new to Elasticsearch, but have made some progress with a query, and I'd really appreciate your feedback/ideas about how I've approached my requirement.

Introduction/Mappings

I'm writing a query to determine room availability for a number of guests over a date period. My mappings look like this:

"properties": {
    "basic_type": {
        "type": "keyword"
    },
    "room_id": {
        "type": "keyword"
    },
    "active": {
        "type": "boolean"
    },
    "is_private": {
        "type": "boolean"
    },
    "date": {
        "type": "date"
    },
    "capacity": {
        "type": "integer"
    },
    "property_id": {
        "type": "integer"
    }
}

Query

My 'steps' for the query are somewhat complicated:

  1. For each room (identified by room_id), get the minimum capacity over the date range.
  2. Then, sum these minimum capacities and bucket them based on the is_private field.
  3. [Not implemented] Get the largest of these 'bucketed sums'.

Here's what I have, with a filter based on property_id and date range (note that the room can be inactive, so I'm also filtering on the active field too):

GET rooms_test/_search?size=0
{
  "query": {
    "bool": {
      "filter": [
        {
          "terms": {
            "property_id": [
              "1",
              "2"
            ]
          }
        },
        {
          "term": {
            "active": {
              "value": true
            }
          }
        },
        {
          "range": {
            "date": {
              "gte": "2022-06-01",
              "lte": "2022-06-02"
            }
          }
        }
      ]
    }
  },
  "aggs": {
    "agg_by_property": {
      "terms": {
        "field": "property_id"
      },
      "aggs": {
        "agg_by_room_type": {
          "terms": {
            "field": "is_private"
          },
          "aggs": {
            "agg_capacity_by_room_type": {
              "sum_bucket": {
                "buckets_path": "agg_by_room>agg_min_space_over_period"
              }
            },
            "agg_by_room": {
              "terms": {
                "field": "room_id"
              },
              "aggs": {
                "agg_min_space_over_period": {
                  "min": {
                    "field": "capacity"
                  }
                }
              }
            }
          }
        }
      }
    }
  }
}

Results

For brevity, I've just pasted the first bucket, relating to property ID: 1:

{
  "key": 1,
  "doc_count": 8,
  "agg_by_room_type": {
    "doc_count_error_upper_bound": 0,
    "sum_other_doc_count": 0,
    "buckets": [
      {
        "key": 0,
        "key_as_string": "false",
        "doc_count": 4,
        "agg_by_room": {
          "doc_count_error_upper_bound": 0,
          "sum_other_doc_count": 0,
          "buckets": [
            {
              "key": "MD_1",
              "doc_count": 2,
              "agg_min_space_over_period": {
                "value": 5.0
              }
            },
            {
              "key": "MD_2",
              "doc_count": 2,
              "agg_min_space_over_period": {
                "value": 10.0
              }
            }
          ]
        },
        "agg_capacity_by_room_type": {
          "value": 15.0
        }
      },
      {
        "key": 1,
        "key_as_string": "true",
        "doc_count": 4,
        "agg_by_room": {
          "doc_count_error_upper_bound": 0,
          "sum_other_doc_count": 0,
          "buckets": [
            {
              "key": "DR_1",
              "doc_count": 2,
              "agg_min_space_over_period": {
                "value": 2.0
              }
            },
            {
              "key": "SR_1",
              "doc_count": 2,
              "agg_min_space_over_period": {
                "value": 3.0
              }
            }
          ]
        },
        "agg_capacity_by_room_type": {
          "value": 5.0
        }
      }
    ]
  }
}

Can I improve what I have?

  1. I have those agg_capacity_by_room_type, but ideally need to get the largest of the two per property (isPrivate - true|false) and return in the result.
  2. Is there a way to reduce the output that comes back? Ultimately, I don't need the agg_min_space_over_period to come back for every room, and for some properties, this could amount to a lot of data coming over the wire.

Thanks for looking at my post. :slight_smile:

1 Like

I figured it out:

  1. I added an additional max_bucket aggregation (agg_property_capacity):
"aggs": {
  "agg_by_property": {
    "terms": {
      "field": "property_id"
    },
    "aggs": {
      "agg_by_room_type": {
        "terms": {
          "field": "is_private"
        },
        "aggs": {
          "agg_capacity_by_room_type": {
            "sum_bucket": {
              "buckets_path": "agg_by_room>agg_min_space_over_period"
            }
          },
          "agg_by_room": {
            "terms": {
              "field": "room_id"
            },
            "aggs": {
              "agg_min_space_over_period": {
                "min": {
                  "field": "capacity"
                }
              }
            }
          }
        }
      },
      "agg_property_capacity": {
        "max_bucket": {
          "buckets_path": "agg_by_room_type>agg_capacity_by_room_type"
        }
      }
    }
  }
}
  1. I reduced the amount of data returned by specifying a filter_path value:

filter_path=aggregations.agg_by_property.buckets.key,aggregations.agg_by_property.buckets.agg_property_capacity.value

this gave me the much smaller response of:

{
  "aggregations" : {
    "agg_by_property" : {
      "buckets" : [
        {
          "key" : 1,
          "agg_property_capacity" : {
            "value" : 15.0
          }
        },
        {
          "key" : 2,
          "agg_property_capacity" : {
            "value" : 7.0
          }
        }
      ]
    }
  }
}

1 Like

Nice progress! . You can use the "size" property to also reduce the amount of buckets returned in case you need only 1.

Terms aggregation | Elasticsearch Guide [7.16] | Elastic

Thanks @Gustavo_Llermaly - thanks for the idea, but in this case, I'm looking to return information against the property_id terms, so not looking to limit it.

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.