Hello,
I'm quite new to Elasticsearch, but have made some progress with a query, and I'd really appreciate your feedback/ideas about how I've approached my requirement.
Introduction/Mappings
I'm writing a query to determine room availability for a number of guests over a date period. My mappings look like this:
"properties": {
"basic_type": {
"type": "keyword"
},
"room_id": {
"type": "keyword"
},
"active": {
"type": "boolean"
},
"is_private": {
"type": "boolean"
},
"date": {
"type": "date"
},
"capacity": {
"type": "integer"
},
"property_id": {
"type": "integer"
}
}
Query
My 'steps' for the query are somewhat complicated:
- For each room (identified by
room_id
), get the minimum capacity over the date range. - Then, sum these minimum capacities and bucket them based on the
is_private
field. - [Not implemented] Get the largest of these 'bucketed sums'.
Here's what I have, with a filter based on property_id
and date range
(note that the room can be inactive, so I'm also filtering on the active
field too):
GET rooms_test/_search?size=0
{
"query": {
"bool": {
"filter": [
{
"terms": {
"property_id": [
"1",
"2"
]
}
},
{
"term": {
"active": {
"value": true
}
}
},
{
"range": {
"date": {
"gte": "2022-06-01",
"lte": "2022-06-02"
}
}
}
]
}
},
"aggs": {
"agg_by_property": {
"terms": {
"field": "property_id"
},
"aggs": {
"agg_by_room_type": {
"terms": {
"field": "is_private"
},
"aggs": {
"agg_capacity_by_room_type": {
"sum_bucket": {
"buckets_path": "agg_by_room>agg_min_space_over_period"
}
},
"agg_by_room": {
"terms": {
"field": "room_id"
},
"aggs": {
"agg_min_space_over_period": {
"min": {
"field": "capacity"
}
}
}
}
}
}
}
}
}
}
Results
For brevity, I've just pasted the first bucket, relating to property ID: 1:
{
"key": 1,
"doc_count": 8,
"agg_by_room_type": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": 0,
"key_as_string": "false",
"doc_count": 4,
"agg_by_room": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "MD_1",
"doc_count": 2,
"agg_min_space_over_period": {
"value": 5.0
}
},
{
"key": "MD_2",
"doc_count": 2,
"agg_min_space_over_period": {
"value": 10.0
}
}
]
},
"agg_capacity_by_room_type": {
"value": 15.0
}
},
{
"key": 1,
"key_as_string": "true",
"doc_count": 4,
"agg_by_room": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "DR_1",
"doc_count": 2,
"agg_min_space_over_period": {
"value": 2.0
}
},
{
"key": "SR_1",
"doc_count": 2,
"agg_min_space_over_period": {
"value": 3.0
}
}
]
},
"agg_capacity_by_room_type": {
"value": 5.0
}
}
]
}
}
Can I improve what I have?
- I have those
agg_capacity_by_room_type
, but ideally need to get the largest of the two per property (isPrivate
-true|false
) and return in the result. - Is there a way to reduce the output that comes back? Ultimately, I don't need the
agg_min_space_over_period
to come back for every room, and for some properties, this could amount to a lot of data coming over the wire.
Thanks for looking at my post.