How to index / search hotel date availability

Hi All,

Relatively new to Elastic - so far loving it. Finding it challenging breaking away from relational databases and denormalizing data in a way that I can access quickly.

We run a hotel search system and are trying to build the most ideal way to store the hotel documents and search them to only return results that are available for a specific stay.

I've searched this form and SO and found a few things, but they didn't really solve what we need.

Currently we have the hotel document - we refer to a hotel as a unit (using PHP array to depict structure) - examples simplified:


$params = [
	'index' => 'units',
	'body' => [
	    'mappings' => [
	        '_doc' => [
	            'properties' => [
                    'name' => [
                        'type' => 'keyword',
                    ],
                    'occupancy' => [
                        'type' => 'integer',
                    ],
	                'location' => [
	                    'type' => 'geo_point',
	                ],
                ],
            ],
        ],
    ],
];

Add a record:

$params[] = [
    'index' => [
        '_index' => 'units',
        '_type' => '_doc',
        '_id' => 777777
    ]
];
$params[] = [
    'name' => 'The Blooming Onion Motel',
    'occupancy' => 4,
    'location' => [
        'lat' => -123.215338,
        'lon' => 49.853735,
    ],
];

With this data, we are able to do geo searches on bounding boxes, occupancy, etc. without issue. We now want to layer in our availability and restrictions. My immediate thought was to create a separate index for availability:


$params = [
    'index' => 'availability',
    'body' => [
        'mappings' => [
            '_doc' => [
                'properties' => [
                    'unit_id' => [
                        'type' => 'integer',
                    ],
                    'date' => [
                        'type' => 'date',
                        'format' => 'yyyy-MM-dd',
                    ],
                    'available' => [
                        'type' => 'boolean',
                    ],
                ],
            ],
        ],
    ],
];



Add many records (One entry per day per unit from now() until now() + 2 years):

$params[] = [
    'index' => [
        '_index' => 'availability',
        '_type' => '_doc',
        '_id' => 777777_20181023,
    ]
];
$params[] = [
    'unit_id' => 777777,
    'date' => '2018-10-23',
    'available' => true,
];

$params[] = [
    'index' => [
        '_index' => 'availability',
        '_type' => '_doc',
        '_id' => 777777_20181024,
    ]
];
$params[] = [
    'unit_id' => 777777,
    'date' => '2018-10-24',
    'available' => true,
];

$params[] = [
    'index' => [
        '_index' => 'availability',
        '_type' => '_doc',
        '_id' => 777777_20181025,
    ]
];
$params[] = [
    'unit_id' => 777777,
    'date' => '2018-10-25',
    'available' => false,
];

$params[] = [
    'index' => [
        '_index' => 'availability',
        '_type' => '_doc',
        '_id' => 777777_20181026,
    ]
];
$params[] = [
    'unit_id' => 777777,
    'date' => '2018-10-26',
    'available' => true,
];


First off, is this the correct way to go about indexing the data? I need to create a query that can give me the hotel units that are available for the users specified trip.

If a user searched for: 2018-10-23 to 2018-10-24, it should return /units/_doc/777777 as all of the dates in their stay dates are available=true;

If a user searched for: 2018-10-23 to 2018-10-26, it should not return anything, as the entry for /availability/_docs/777777_20181025 has available=false;

--

I have attempted to just work from the availability index first, and cant quite seem to get things working:

{
	"query": {
		"bool": {
			"minimum_should_match": 7,
      		"should" : [
				{ "term" : { "date" : "2018-10-23" } },
				{ "term" : { "date" : "2018-10-24" } },
				{ "term" : { "date" : "2018-10-25" } },
				{ "term" : { "date" : "2018-10-26" } },
				{ "term" : { "date" : "2018-10-27" } },
				{ "term" : { "date" : "2018-10-28" } },
				{ "term" : { "available" : true } }
			]
		}
	}
}

This isn't particularly useful as the available doesn't line up with a particular day... I imagine a bunch of separate bool queries for each day? Tried a few with not much luck...


{
	"query": {
		"bool": {
			"must": [
				{"bool": {

					"should": [
						{"term": {"date": "2018-10-25"}},
						{"term": {"available": true}}
					]
				}},
				{"bool": {
					"should": [
						{"term": {"date": "2018-10-26"}},
						{"term": {"available": true}}
					]
				}}
			]
		}
	}
}


{
	"query": {
		"bool": {
			"must": [
				{"bool": {
					"minimum_should_match": 2,
					"should": [
						{"term": {"date": "2018-10-25"}},
						{"term": {"available": true}}
					]
				}},
				{"bool": {
					"minimum_should_match": 2,
					"should": [
						{"term": {"date": "2018-10-26"}},
						{"term": {"available": true}}
					]
				}}
			]
		}
	}
}

--

Need some direction on where to go next:

  • I could loop through all days of the proposed stay and add a condition for that day?
  • Should I be denormalizing this further to make use of the parent/child with a custom _type on the same index? We have about 50,000 hotels with 2 years of dates to index... about 37 million availability documents.
  • Something I am missing completely?

Thanks for the feedback and suggestions... if you require any more info to assist in solving this, please let me know.

1 Like

First, consider the use of both the date_range field and range query in your solution. They might not fit, but there's a good chance it will make things easier.

What about turning things around and instead of indexing positive availability, can you index the bookings (negative availability) instead? Then you can do bool must_not range queries, I'd think.

You might be able to reduce the number of documents you need to index/query, in this fashion. Additionally, you could have an index per month to store these documents (eg. availability-201809, availability-201810, etc) which could help with future scaling/querying throughput.

What about a hotel document structure like:

  • hotel_id
  • hotel_name
  • geo_point
  • ... etc
  • booked nested doc
    • room type
    • number of beds
    • ... etc
    • booked_daterange (start, end)

Then you could query something like

{
    "query": {
        "nested": {
            "path": "booked_nested_doc",
            "query": {
                "bool": {
                    "must": [{
                        "term": {
                            "booked_nested_doc.number_of_beds": 2
                        }
                    }],
                    "must_not": [{
                        "range": {
                            "booked_nested_doc.booked_daterange": {
                                "gte": <your start date>,
                                "lte": <your end date>,
                                "relation": "within"
                            }
                        }
                    }]
                }
            }
        }
    }
}

Obviously, I'm not as close to the problem as you are, so what I've mentioned might be way out there or not applicable. Just some thoughts - hope they help!

2 Likes

Thank you very much for the reply - this thought process helps some, yes.

There are quite a few more variables that will come into play that would not fit - I just wanted to get a head start with a simple available true/false, but in reality, there are many more dimensions.

We also have a daily value for if the day supports check in only, checkout out only, etc. Another for the minimum stay days, and maximum stay days for each date.

I will play around some more in the mean time.

I am struggling to find how to query a search where I need a term match on 2 values from the same nested document - for example I need to find documents that have both date=xxxx AND available=true within the same node.

Re: Querying nested docs, my previous reply has an example query. To query nested 'subdocs' within your document, you need to use the identically-named nested query. Perhaps I'm misunderstanding.

That is a a nice approach. What you can also do is using parent-child relations for the availabilty for the room/persons/number of beds.

It will be a bit of trial and error for some use-cases, what fits you best.

I am now testing one out like this setup but instead of a nested approach, using the parent-child relation. The key for me is that with parent-child I can remove/add childs for the availabilty without updating the full hotel information. With nested you have to update alle the parts of the nested subset with each change,

Thanks for the messages. I have changed the structure to include parent and child documents on the same index. I am able to query everything individually successfully. In order to create a query for the parent units that are available for every day of a trip, I am now attempting a "has_child" but am having some unexpected results.

The mapping for /units/:

{
	"units": {
		"mappings": {
			"_doc": {
				"properties": {
					"available": {
						"type": "boolean"
					},
					"bathrooms": {
						"type": "integer"
					},
					"bedrooms": {
						"type": "integer"
					},
					"coordinates": {
						"type": "geo_point"
					},
					"date": {
						"type": "date",
						"format": "yyyy-MM-dd"
					},
					"join_field": {
						"type": "join",
						"eager_global_ordinals": true,
						"relations": {
							"unit": "availability"
						}
					}
				}
			}
		}
	}
}
id					join_field		parent_id	date		available
1					unit		
20190109-1			availability	1			2019-01-09	true
20190110-1			availability	1			2019-01-10	true
20190111-1			availability	1			2019-01-11	false
20190112-1			availability	1			2019-01-12	true

And the query (this query does not work when there is more than 1 date... 2 are shown below). In pseudo SQL, I am attempting the following:

{
   "query":{
      "bool":{
         "must":[
            {
               "has_child":{
                  "type":"availability",
                  "query":{
                     "bool":{
                        "must":[
                           {
                              "bool":{
                                 "must":[
                                    {
                                       "term":{
                                          "date":{
                                             "value":"2019-01-09"
                                          }
                                       }
                                    },
                                    {
                                       "term":{
                                          "available":{
                                             "value":true
                                          }
                                       }
                                    }
                                 ]
                              }
                           },
                           {
                              "bool":{
                                 "must":[
                                    {
                                       "term":{
                                          "date":{
                                             "value":"2019-01-10"
                                          }
                                       }
                                    },
                                    {
                                       "term":{
                                          "available":{
                                             "value":true
                                          }
                                       }
                                    }
                                 ]
                              }
                           }
                        ]
                     }
                  },
                  "ignore_unmapped":true
               }
            }
         ],
         "filter":[
            {
               "geo_bounding_box":{
                  "coordinates":{
                     "bottom":49.853735,
                     "left":-123.215338,
                     "top":50.340893,
                     "right":-122.747045
                  }
               }
            }
         ]
      }
   }
}

The logic should be similar to:

SELECT parentUnit
WHERE (childAvailability.date = '2019-01-09' AND childAvailability.available = true) AND (childAvailability.date = '2019-01-10' AND childAvailability.available = true)
AND parentUnit.geo ...

I am looping through the days of the stay, and adding a term component for each day. The above is NOT working, however, when I only include a single day, such as below, it does work... I can confirm both days are in the index, and are both available=true as they can be queried individually.

{
   "query":{
      "bool":{
         "must":[
            {
               "has_child":{
                  "type":"availability",
                  "query":{
                     "bool":{
                        "must":[
                           {
                              "bool":{
                                 "must":[
                                    {
                                       "term":{
                                          "date":{
                                             "value":"2019-01-09"
                                          }
                                       }
                                    },
                                    {
                                       "term":{
                                          "available":{
                                             "value":true
                                          }
                                       }
                                    }
                                 ]
                              }
                           }
                        ]
                     }
                  },
                  "ignore_unmapped":true
               }
            }
         ],
         "filter":[
            {
               "geo_bounding_box":{
                  "coordinates":{
                     "bottom":49.853735,
                     "left":-123.215338,
                     "top":50.340893,
                     "right":-122.747045
                  }
               }
            }
         ]
      }
   }
}

Am I formulating this the correct way to loop through days? The reason I need to do it this way is there are many other variables besides "available" that I will be adding checks for - this is just a simple example.

Any help would be greatly appreciated, thanks!

I don't know about es, but postgresql date ranges are usually used in hotel bookings (with joins & transactions but slower search-speed than es)

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.