Hi All,
Relatively new to Elastic - so far loving it. Finding it challenging breaking away from relational databases and denormalizing data in a way that I can access quickly.
We run a hotel search system and are trying to build the most ideal way to store the hotel documents and search them to only return results that are available for a specific stay.
I've searched this form and SO and found a few things, but they didn't really solve what we need.
Currently we have the hotel document - we refer to a hotel as a unit (using PHP array to depict structure) - examples simplified:
$params = [
'index' => 'units',
'body' => [
'mappings' => [
'_doc' => [
'properties' => [
'name' => [
'type' => 'keyword',
],
'occupancy' => [
'type' => 'integer',
],
'location' => [
'type' => 'geo_point',
],
],
],
],
],
];
Add a record:
$params[] = [
'index' => [
'_index' => 'units',
'_type' => '_doc',
'_id' => 777777
]
];
$params[] = [
'name' => 'The Blooming Onion Motel',
'occupancy' => 4,
'location' => [
'lat' => -123.215338,
'lon' => 49.853735,
],
];
With this data, we are able to do geo searches on bounding boxes, occupancy, etc. without issue. We now want to layer in our availability and restrictions. My immediate thought was to create a separate index for availability:
$params = [
'index' => 'availability',
'body' => [
'mappings' => [
'_doc' => [
'properties' => [
'unit_id' => [
'type' => 'integer',
],
'date' => [
'type' => 'date',
'format' => 'yyyy-MM-dd',
],
'available' => [
'type' => 'boolean',
],
],
],
],
],
];
Add many records (One entry per day per unit from now() until now() + 2 years):
$params[] = [
'index' => [
'_index' => 'availability',
'_type' => '_doc',
'_id' => 777777_20181023,
]
];
$params[] = [
'unit_id' => 777777,
'date' => '2018-10-23',
'available' => true,
];
$params[] = [
'index' => [
'_index' => 'availability',
'_type' => '_doc',
'_id' => 777777_20181024,
]
];
$params[] = [
'unit_id' => 777777,
'date' => '2018-10-24',
'available' => true,
];
$params[] = [
'index' => [
'_index' => 'availability',
'_type' => '_doc',
'_id' => 777777_20181025,
]
];
$params[] = [
'unit_id' => 777777,
'date' => '2018-10-25',
'available' => false,
];
$params[] = [
'index' => [
'_index' => 'availability',
'_type' => '_doc',
'_id' => 777777_20181026,
]
];
$params[] = [
'unit_id' => 777777,
'date' => '2018-10-26',
'available' => true,
];
First off, is this the correct way to go about indexing the data? I need to create a query that can give me the hotel units that are available for the users specified trip.
If a user searched for: 2018-10-23 to 2018-10-24, it should return /units/_doc/777777
as all of the dates in their stay dates are available=true
;
If a user searched for: 2018-10-23 to 2018-10-26, it should not return anything, as the entry for /availability/_docs/777777_20181025
has available=false
;
--
I have attempted to just work from the availability index first, and cant quite seem to get things working:
{
"query": {
"bool": {
"minimum_should_match": 7,
"should" : [
{ "term" : { "date" : "2018-10-23" } },
{ "term" : { "date" : "2018-10-24" } },
{ "term" : { "date" : "2018-10-25" } },
{ "term" : { "date" : "2018-10-26" } },
{ "term" : { "date" : "2018-10-27" } },
{ "term" : { "date" : "2018-10-28" } },
{ "term" : { "available" : true } }
]
}
}
}
This isn't particularly useful as the available doesn't line up with a particular day... I imagine a bunch of separate bool queries for each day? Tried a few with not much luck...
{
"query": {
"bool": {
"must": [
{"bool": {
"should": [
{"term": {"date": "2018-10-25"}},
{"term": {"available": true}}
]
}},
{"bool": {
"should": [
{"term": {"date": "2018-10-26"}},
{"term": {"available": true}}
]
}}
]
}
}
}
{
"query": {
"bool": {
"must": [
{"bool": {
"minimum_should_match": 2,
"should": [
{"term": {"date": "2018-10-25"}},
{"term": {"available": true}}
]
}},
{"bool": {
"minimum_should_match": 2,
"should": [
{"term": {"date": "2018-10-26"}},
{"term": {"available": true}}
]
}}
]
}
}
}
--
Need some direction on where to go next:
- I could loop through all days of the proposed stay and add a condition for that day?
- Should I be denormalizing this further to make use of the parent/child with a custom
_type
on the same index? We have about 50,000 hotels with 2 years of dates to index... about 37 million availability documents. - Something I am missing completely?
Thanks for the feedback and suggestions... if you require any more info to assist in solving this, please let me know.